# Reinforcement-Learning-Study-Note

**Repository Path**: CTC_Gitee/Reinforcement-Learning-Study-Note

## Basic Information

- **Project Name**: Reinforcement-Learning-Study-Note
- **Description**: From https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note.git
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-09
- **Last Updated**: 2026-02-09

## Categories & Tags

- **Categories**: Uncategorized
- **Tags**: None

## README

These are my reinforcement learning study notes.

My Bachelor's and Master's degrees are both in non-CS fields, and I am teaching myself reinforcement learning. Comments and discussion are welcome, and I appreciate any criticism and corrections.

## References

| No. | Resource | Link |
|---|---|---|
| 1 | Andrew Ng, Machine Learning (吴恩达机器学习) | [Video](https://www.bilibili.com/video/BV1owrpYKEtP/?spm_id_from=333.337.search-card.all.click&vd_source=fea2c3c140631e78a73b6d714dcf9f71) |
| 2 | Shiyu Zhao (Westlake University), Mathematical Foundations of Reinforcement Learning (《强化学习的数学原理》) | [Video](https://www.bilibili.com/video/BV1sd4y167NS/?spm_id_from=333.337.search-card.all.click&vd_source=fea2c3c140631e78a73b6d714dcf9f71) [Materials](https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning) |
| 3 | Weinan Zhang (Shanghai Jiao Tong University), Hands-on Reinforcement Learning (《动手学强化学习》) | [Materials](https://hrl.boyuai.com/chapter/intro) |
| 4 | Easy RL, the "Mushroom Book" (蘑菇书强化学习) | [Materials](https://datawhalechina.github.io/easy-rl/#/) |

## Contents

| Chapter | Topic | Reference | Link |
| ---- | ---- | ---- | ---- |
| Chapter 1 | Basics | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter01) |
| Chapter 2 | Dynamic Programming | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter02Dynamic-Programming) |
| Chapter 3 | Monte Carlo | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter03Monte-Carlo) |
| Chapter 4 | Stochastic Approximation | | |
| Chapter 5 | Temporal Difference | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter05Temporal-Difference) |
| Chapter 6 | Dyna-Q | | |
| Chapter 7 | DQN | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter07DQN) |
| Chapter 8 | Improved DQN | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter08Improved-DQN) |
| Chapter 9 | Policy Gradient | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter09Policy-Gradient) |
| Chapter 10 | Actor-Critic | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter10Actor-Critic) |
| Chapter 11 | Trust Region Policy Optimization | | |
| Chapter 12 | Proximal Policy Optimization | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter12PPO) |
| Chapter 13 | Deep Deterministic Policy Gradient | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter13DDPG) |
| Chapter 14 | Soft Actor-Critic | | |
| Chapter 15 | Imitation Learning | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter15Imitation%20Learning) |
| Chapter 16 | Model Predictive Control | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter16MPC) |