# SE16_WordCount

**Repository Path**: panbocodebase/SE16_WordCount

## Basic Information

- **Project Name**: SE16_WordCount
- **Description**: Personal programming exercise
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 34
- **Created**: 2018-09-27
- **Last Updated**: 2021-11-02

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## I. Objectives
+ Get familiar with coding style
+ Teach yourself simple Python code
+ Run a simple performance test

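## II. Programming

### 1. Task 1: Fork the SE16_WordCount project and create your own branch.

 + Fork the project on Gitee: https://gitee.com/ntucs/SE16_WordCount.git
 + Create a branch named SE plus the last three digits of your student ID (e.g. SE001).
 + Switch to your branch and write the project code there (here, the file word_freq.py).
   + Note: after finishing the code, commit it and push it to the remote repository.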

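### 2. Task 2: Complete the Python code below so that it counts word frequencies in an input file

  +  Description: word_freq.py counts word frequencies in a text file.
  +  Usage: $ python  word_freq.py filename.txt 
  +  Output: the ten most frequent words in filename.txt.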

``` 
# filename: word_freq.py
# Note: pay attention to code style

from string import punctuation

def process_file(dst):     # read the file into a buffer
    try:     # open the file
        _________（1）_________
    except IOError, s:
        print s
        return None
    try:     # read the file into the buffer
        _________（2）_________
    except:
        print "Read File Error!"
        return None
    ________（3）__________
    return bvffer

def process_buffer(bvffer):
    if bvffer:
        word_freq = {}
        # Add code below to process the buffer bvffer and count each word's frequency, storing the result in the dict word_freq
        __________________
        __________________
        _______（4）______
        __________________
        __________________
        return word_freq

def output_result(word_freq):
    if word_freq:
        sorted_word_freq = sorted(word_freq.items(), key=lambda v: v[1], reverse=True)
        for item in sorted_word_freq[:10]:  # print the top 10 words
            print item

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('dst')
    args = parser.parse_args()
    dst = args.dst
    bvffer = process_file(dst)
    word_freq = process_buffer(bvffer)
    output_result(word_freq)

```
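For orientation only, the counting step described above can be sketched independently of the template. This is a minimal Python 3 sketch, not the assignment's reference solution and not a fill-in for the numbered blanks: the helper names `count_words` and `top_words` are hypothetical, and treating every punctuation character as a word separator is an assumption (it splits contractions such as "don't" into two tokens).

```python
# Illustrative Python 3 sketch; helper names are hypothetical, not from word_freq.py.
from collections import Counter
from string import punctuation


def count_words(text):
    """Return a Counter mapping each lowercase word to its frequency."""
    # Assumption: every punctuation character acts as a word separator.
    table = str.maketrans(punctuation, " " * len(punctuation))
    words = text.lower().translate(table).split()
    return Counter(words)


def top_words(text, n=10):
    """Return the n most frequent words as (word, count) pairs."""
    return count_words(text).most_common(n)


sample = "It was the best of times, it was the worst of times."
for word, count in top_words(sample, 3):
    print(word, count)
```

Lowercasing before counting merges "It" and "it" into one entry; whether that matches the expected output of the assignment is a choice left to the implementer.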

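### 3. Task 3: Run the code and post a screenshot of the result on your blog
  -  Test data: the large file [Gone_with_the_wind](https://files.cnblogs.com/files/juking/Gone_with_the_wind.rar) or the small file [A_Tale_of_Two_Cities](https://files.cnblogs.com/files/juking/A_Tale_of_Two_Cities.rar). Both are compressed archives; extract them after downloading.
  - Example: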
``` Python
python word_freq.py Gone_with_the_wind.txt
```

### 4. Task 4: Simple performance analysis

  - Experiment: performance evaluation of the Python word-frequency program word_freq.py.

  - Use cProfile for the performance analysis.

  - Usage:

``` Python
python -m cProfile word_freq.py filescounted.txt [| grep word_freq.py]
```

  - Example: count the word frequencies of *Gone with the wind*
 
``` Python
 python -m cProfile word_freq.py Gone_with_the_wind.txt | grep word_freq.py
```

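  - Identify the parts of the code with the longest execution time and the highest call counts, and try to improve them.

    PS: 4 points if you manage an improvement; 2 points for performance evaluation only.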
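As a complement to the command-line usage above, cProfile can also be driven from inside a script through the standard `cProfile.Profile` and `pstats.Stats` classes. The following is a minimal Python 3 sketch: `profile_call` and `slow_sum` are hypothetical names, and `slow_sum` is only a stand-in workload, not part of word_freq.py.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort_key="cumulative", limit=5):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by the chosen key and keep only the first `limit` rows.
    stats.sort_stats(sort_key).print_stats(limit)
    return result, stream.getvalue()


def slow_sum(n):
    """Hypothetical stand-in workload: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


result, report = profile_call(slow_sum, 100000)
print(report)
```

Passing `sort_key="ncalls"` instead orders the report by call count, which corresponds to the "highest call counts" half of the task.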


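## III. Blog write-up (10 points)

### (1) Program analysis: briefly explain the four functions in the program (3 points)
 +  Include the code for each part together with its explanation.

### (2) Code style notes (2 points)
+  Search online for Python code style guidelines, pick one rule, and explain it against the corresponding part of word_freq.py.
+  Python code stresses that variable names should be *****; for example, lines **~ ** of the program: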

       ```
        print "Read File Error!"  # this is only an example
       ``` 

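### (3) Run command and screenshot of the results (2 points)

### (4) Performance analysis results and improvements (3 points)
+  Identify the parts of the code with the longest execution time and the highest call counts (1 point)
+  Try to improve the program code (2 points)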
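
When attempting an improvement, it helps to confirm that the new version produces the same counts before comparing timings. The sketch below (Python 3) shows one possible way to do that with the standard `timeit` module. Both counting functions are hypothetical illustrations rather than code from word_freq.py, and the sketch makes no claim about which one is faster on a given machine.

```python
import timeit
from collections import Counter


def count_with_dict(words):
    """Baseline: explicit membership test before every update."""
    freq = {}
    for word in words:
        if word in freq:
            freq[word] += 1
        else:
            freq[word] = 1
    return freq


def count_with_counter(words):
    """Candidate: delegate the counting to collections.Counter."""
    return Counter(words)


# Synthetic input; a real run would use the words from the test file.
words = ("the quick brown fox jumps over the lazy dog " * 2000).split()

# Check that both versions agree before measuring anything.
assert count_with_dict(words) == dict(count_with_counter(words))

for func in (count_with_dict, count_with_counter):
    seconds = timeit.timeit(lambda: func(words), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```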