# Reinforcement-Learning-Study-Note

**Repository Path**: CTC_Gitee/Reinforcement-Learning-Study-Note

## Basic Information

- **Project Name**: Reinforcement-Learning-Study-Note
- **Description**: Mirrored from https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note.git
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-09
- **Last Updated**: 2026-02-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Reinforcement-Learning-Study-Note
This is my reinforcement learning study note.
<br>Neither my Bachelor's nor my Master's degree is in computer science, and I am learning reinforcement learning on my own. Comments and discussion are welcome, and I am grateful for any criticism and corrections!
## References
| No. | Resource | Links |
|---|---|---|
| 1 | Andrew Ng, *Machine Learning* | [Video](https://www.bilibili.com/video/BV1owrpYKEtP/?spm_id_from=333.337.search-card.all.click&vd_source=fea2c3c140631e78a73b6d714dcf9f71) |
| 2 | Shiyu Zhao (Westlake University), *Mathematical Foundations of Reinforcement Learning* | [Video](https://www.bilibili.com/video/BV1sd4y167NS/?spm_id_from=333.337.search-card.all.click&vd_source=fea2c3c140631e78a73b6d714dcf9f71) [Materials](https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning) |
| 3 | Weinan Zhang (Shanghai Jiao Tong University), *Hands-on Reinforcement Learning* | [Materials](https://hrl.boyuai.com/chapter/intro) |
| 4 | The "Mushroom Book" of reinforcement learning (EasyRL, Datawhale) | [Materials](https://datawhalechina.github.io/easy-rl/#/) |
## Contents
| Chapter | Topic | References | Link |
| ---- | ---- | ---- | ---- |
| Chapter 1 | Basics | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter01) |
| Chapter 2 | Dynamic Programming |  | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter02Dynamic-Programming) |
| Chapter 3 | Monte Carlo |  | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter03Monte-Carlo) |
| Chapter 4 | Stochastic Approximation |  |  |
| Chapter 5 | Temporal Difference |  | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter05Temporal-Difference) |
| Chapter 6 | Dyna-Q |  |  |
| Chapter 7 | DQN | | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter07DQN) |
| Chapter 8 | Improved DQN |  | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter08Improved-DQN) |
| Chapter 9 | Policy Gradient |  | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter09Policy-Gradient) |
| Chapter 10 | Actor-Critic |  | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter10Actor-Critic) |
| Chapter 11 | Trust Region Policy Optimization |  |  |
| Chapter 12 | Proximal Policy Optimization |  | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter12PPO) |
| Chapter 13 | Deep Deterministic Policy Gradient |  | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter13DDPG) |
| Chapter 14 | Soft Actor-Critic |  |  |
| Chapter 15 | Imitation Learning |  | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter15Imitation%20Learning) |
| Chapter 16 | Model Predictive Control |  | [Notes](https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main/Chapter16MPC) |