From 11f5329fc24898a1370841a5540b80539d5892e5 Mon Sep 17 00:00:00 2001
From: gitee-bot
Date: Mon, 2 Mar 2026 14:36:48 +0000
Subject: [PATCH] Add README.md

---
 README.en.md | 122 +++++++++++++++++++++++++++++++++++++++++++++++++
 README.md    | 125 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 247 insertions(+)
 create mode 100644 README.en.md
 create mode 100644 README.md

diff --git a/README.en.md b/README.en.md
new file mode 100644
index 0000000..88519e9
--- /dev/null
+++ b/README.en.md
@@ -0,0 +1,122 @@
+# MyKB
+
+Personal Knowledge Base and Research Tools Collection
+
+## Project Overview
+
+MyKB is a comprehensive personal knowledge management system encompassing large language model (LLM) learning materials, research tools, document conversion utilities, and quantitative research content. This project aims to provide researchers and developers with efficient knowledge management and auxiliary tools.
+
+## Project Structure
+
+```
+mykb/
+├── convert_md_to_docx.py       # Markdown to DOCX converter
+├── generate_diagrams.py        # Diagram generation tool
+├── main.py                     # Main program entry point
+├── pyproject.toml              # Project configuration
+├── llm/                        # LLM learning materials and timeline
+│   ├── llm-timeline.html       # LLM development timeline visualization
+│   ├── llm-timeline.png
+│   ├── timeline.html           # Interactive timeline
+│   └── ppt_content_v2.md       # PPT content for LLM research practice course
+├── quant/                      # Quantitative research materials
+│   └── research_report.md      # Agent engineering research report
+├── tools/                      # Utility scripts
+│   └── fix_encoding.py         # File encoding repair tool
+└── configs/                    # Configuration files
+    ├── gb_t_9704.json
+    ├── nsfc_format.json
+    └── research_institute.json
+```
+
+## Features
+
+### Document Conversion (convert_md_to_docx.py)
+
+Converts Markdown documents into formatted Word documents, supporting:
+
+- Style configuration (font, size, color, alignment, etc.)
+- Page settings (headers, footers, page numbers)
+- Cover page generation
+- Table parsing and conversion
+- Code block insertion
+- Image insertion with captions
+
+### Diagram Generation (generate_diagrams.py)
+
+Generates research-oriented diagrams using Matplotlib:
+
+- Rounded box diagrams
+- Pyramid diagrams
+- Data model diagrams
+- Label pipeline diagrams
+- Shadow pattern diagrams
+
+### LLM Timeline
+
+Includes an interactive HTML timeline visualization showcasing the evolution and key milestones of large language models.
+
+### Encoding Repair Tool (fix_encoding.py)
+
+Repairs corrupted or incorrectly encoded text files, supporting:
+
+- Byte frequency analysis
+- Common Chinese, Japanese, and Korean encoding fixes
+- Uniform line ending normalization
+- UTF-8 validation
+
+## Configuration Files
+
+| File | Purpose |
+|------|---------|
+| `gb_t_9704.json` | National Standard format configuration |
+| `nsfc_format.json` | National Natural Science Foundation of China format |
+| `research_institute.json` | Research institute format configuration |
+
+## Dependencies
+
+The project is developed in Python and primarily depends on:
+
+- `python-docx` - Word document processing
+- `matplotlib` - Diagram generation
+- Other standard library modules
+
+## Usage
+
+### Markdown to DOCX Conversion
+
+```bash
+python convert_md_to_docx.py input.md output.docx
+```
+
+### Generate Diagrams
+
+```bash
+python generate_diagrams.py
+```
+
+### Fix Encoding
+
+```bash
+python tools/fix_encoding.py
+```
+
+## Research Content
+
+### Large Model Research Practice Course
+
+`llm/ppt_content_v2.md` contains complete course materials covering:
+
+1. **The Prompt Era** - ICL and Prompt Engineering
+2. **Black-box Mechanisms** - Attention mechanisms, Token principles, origins of hallucinations
+3. **RAG Technology** - Practical implementation of Retrieval-Augmented Generation
+4. **Reasoning and Tools** - Function Calling, Multimodal systems
+5. **Agents and Workflows** - MCP, automated pipelines
+
+### Agent Engineering Research Report
+
+`quant/research_report.md` analyzes minimalist Agent engineering practices from pi-mono to OpenClaw.
+
+## License
+
+This project is intended for personal learning and research purposes only.
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..27b09cd
--- /dev/null
+++ b/README.md
@@ -0,0 +1,125 @@
+
+
+
+# MyKB
+
+个人知识库与科研工具集
+
+## 项目简介
+
+MyKB 是一个综合性的个人知识管理系统,涵盖大语言模型(LLM)学习资料、科研工具、文档转换功能以及量化研究相关内容。该项目旨在为科研工作者和开发者提供高效的知识管理与辅助工具。
+
+## 项目结构
+
+```
+mykb/
+├── convert_md_to_docx.py       # Markdown 转 DOCX 转换器
+├── generate_diagrams.py        # 图表生成工具
+├── main.py                     # 主程序入口
+├── pyproject.toml              # 项目配置
+├── llm/                        # LLM 学习资料与时间线
+│   ├── llm-timeline.html       # LLM 发展时间线可视化
+│   ├── llm-timeline.png
+│   ├── timeline.html           # 交互式时间线
+│   └── ppt_content_v2.md       # 大模型科研实践课程PPT
+├── quant/                      # 量化研究资料
+│   └── research_report.md      # Agent工程研究报告
+├── tools/                      # 工具脚本
+│   └── fix_encoding.py         # 文件编码修复工具
+└── configs/                    # 配置文件
+    ├── gb_t_9704.json
+    ├── nsfc_format.json
+    └── research_institute.json
+```
+
+## 功能说明
+
+### 文档转换 (convert_md_to_docx.py)
+
+将 Markdown 文档转换为格式化的 Word 文档,支持:
+
+- 样式配置(字体、字号、颜色、对齐等)
+- 页面设置(页眉页脚、页码)
+- 封面页生成
+- 表格解析与转换
+- 代码块插入
+- 图片插入与题注
+
+### 图表生成 (generate_diagrams.py)
+
+使用 Matplotlib 生成科研图表:
+
+- 圆角框图
+- 金字塔图
+- 数据模型图
+- 标签管道图
+- 阴影模式图
+
+### LLM 时间线
+
+包含交互式 HTML 时间线可视化,展示大语言模型的发展历程和关键里程碑。
+
+### 编码修复工具 (fix_encoding.py)
+
+用于修复损坏或编码错误的文本文件,支持:
+
+- 字节频率分析
+- 常见中日韩字符编码修复
+- 行尾换行符统一
+- UTF-8 验证
+
+## 配置文件说明
+
+| 文件 | 用途 |
+|------|------|
+| `gb_t_9704.json` | 国标格式配置 |
+| `nsfc_format.json` | 国家自然科学基金格式 |
+| `research_institute.json` | 研究院格式配置 |
+
+## 依赖
+
+项目使用 Python 开发,主要依赖包括:
+
+- `python-docx` - Word 文档处理
+- `matplotlib` - 图表绘制
+- 其他标准库模块
+
+## 使用方法
+
+### Markdown 转 DOCX
+
+```bash
+python convert_md_to_docx.py input.md output.docx
+```
+
+### 生成图表
+
+```bash
+python generate_diagrams.py
+```
+
+### 修复编码
+
+```bash
+python tools/fix_encoding.py
+```
+
+## 研究内容
+
+### 大模型科研实践课程
+
+`llm/ppt_content_v2.md` 包含完整的培训课程内容,涵盖:
+
+1. **Prompt 时代** - ICL 与提示词工程
+2. **黑盒机制** - 注意力机制、Token 原理、幻觉根源
+3. **RAG 技术** - 检索增强生成实战
+4. **推理与工具** - Function Calling、多模态
+5. **Agent 与工作流** - MCP、自动化流水线
+
+### Agent 工程研究报告
+
+`quant/research_report.md` 分析了从 pi-mono 到 OpenClaw 的 Agent 工程极简主义实践。
+
+## 许可证
+
+本项目为个人学习与研究使用。
\ No newline at end of file
-- Gitee