# TextEditApp **Repository Path**: zhxteam/text-edit-app ## Basic Information - **Project Name**: TextEditApp - **Description**: winform文本编辑器(算法实训大作业) - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-07-27 - **Last Updated**: 2022-07-04 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # TextEditApp #### 介绍 2021暑期算法实训.NET winform文本编辑器 #### 算法 ##### 1.字符串匹配 1. KMP 2. Boyer–Moore(好坏字符串) 3. Rabin-Karp(是暴力算法的改进,先对比哈希值再对比字符串,Go 1.1的indexof 源码就是使用RK算法,Go现在是1.16版,下面是我查的最新源码,仍然用了RK算法) ```go // Index returns the index of the first instance of substr in s, or -1 if substr is not present in s. func Index(s, substr string) int { n := len(substr) switch { case n == 0: return 0 case n == 1: return IndexByte(s, substr[0]) case n == len(s): if substr == s { return 0 } return -1 case n > len(s): return -1 case n <= bytealg.MaxLen: // Use brute force when s and substr both are small if len(s) <= bytealg.MaxBruteForce { return bytealg.IndexString(s, substr) } c0 := substr[0] c1 := substr[1] i := 0 t := len(s) - n + 1 fails := 0 for i < t { if s[i] != c0 { // IndexByte is faster than bytealg.IndexString, so use it as long as // we're not getting lots of false positives. o := IndexByte(s[i+1:t], c0) if o < 0 { return -1 } i += o + 1 } if s[i+1] == c1 && s[i:i+n] == substr { return i } fails++ i++ // Switch to bytealg.IndexString when IndexByte produces too many false positives. if fails > bytealg.Cutover(i) { r := bytealg.IndexString(s[i:], substr) if r >= 0 { return r + i } return -1 } } return -1 } c0 := substr[0] c1 := substr[1] i := 0 t := len(s) - n + 1 fails := 0 for i < t { if s[i] != c0 { o := IndexByte(s[i+1:t], c0) if o < 0 { return -1 } i += o + 1 } if s[i+1] == c1 && s[i:i+n] == substr { return i } i++ fails++ if fails >= 4+i>>4 && i < t { // See comment in ../bytes/bytes.go. j := bytealg.IndexRabinKarp(s[i:], substr) if j < 0 { return -1 } return i + j } } return -1 } // IndexRabinKarp uses the Rabin-Karp search algorithm to return the index of the // first occurrence of substr in s, or -1 if not present. func IndexRabinKarp(s, substr string) int { // Rabin-Karp search hashss, pow := HashStr(substr) n := len(substr) var h uint32 for i := 0; i < n; i++ { h = h*PrimeRK + uint32(s[i]) } if h == hashss && s[:n] == substr { return 0 } for i := n; i < len(s); { h *= PrimeRK h += uint32(s[i]) h -= pow * uint32(s[i-n]) i++ if h == hashss && s[i-n:i] == substr { return i - n } } return -1 } ``` 4. Sunday(作者叫sunday哈哈,类似BM算法) 5. AC自动机(1975年来自贝尔实验室) 6. two-way算法(glibc库strstr的算法,空间复杂度O(1)) 7. horspool(Horspool 字符串匹配算法对Boyer-Moore算法的简化算法。 Horspool 算法是一种基于后缀匹配的方法,是一种“跳跃式”匹配算法,具有**sub-linear亚线性时间复杂度**。) #### 开发说明 拖下仓库以后,双击.sln文件 #### 接口 > 有更改可能 ##### 1. 字典查询 ```c# using TextEditorI.Algorithm; //导入类 ... SimpleDictionary d = SimpleDictionary.GetDictionary; //使用范例,注意这个不是函数,是反射功能 Console.WriteLine(d.isEnglish("Apple"));//True Console.WriteLine(d.isEnglish("Banana"));//True Console.WriteLine(d.isEnglish("??"));//False ``` ##### 2.统计词频 ```c# using TextEditorI.Algorithm; //导入类 ... var map = WordFrequency.GetWordFrequency(text,d);//其中d是字典,也就是说你需要给他一本词典才能进行统计,具体获取方法看API1,text是string类型 ``` ##### 3.文档分割 ```c# using TextEditorI.Algorithm; //导入类 ... string text; string[] words = Util.GetWordArray(text); ``` #### 性能 字典约45000词 文本约23000词 ![img](images/clip_image002.jpg) 本机i7 8550u 8g 2400hz 仅字典占用 ![image-20210727165813265](images/image-20210727165813265.png) 字典加文本哈希的内存 ![image-20210727165648227](images/image-20210727165648227.png) 原始内存占用 ![image-20210727165616733](images/image-20210727165616733.png) #### 实现计划 1. 统计词频等题目本身要求的功能——维护一个哈希表 2. 利用简易字典(具体数量看网上查找),检查错词,代码提示,错词用波浪线标出——实现算法和数据结构:字典树 3. 查询——实现编辑器中ctrl+F的功能。 4. 通过计算最小编辑距离实现模糊查询(可能上千词就不太顶得住) 5. 关于中文的两个计划: 1. 用库函数的正则表达式匹配萌混过关 2. 自己写一点简单正则匹配器 **测试部分** 1.测试用的文本Text.txt 2.测试选取的词汇: ​ a.richtextbox ​ b.vicious ​ c.resilient ​ d.emotionally ​ e.errors ​ f.destructive ​ g.oneself ​ h.allowing ​ i.mercy ​ o.Earth 3.测试的结果如下 ​ 1.sunday ![image-20210728163058031](images\image-20210728163058031.png) ![image-20210728163147199](images\image-20210728163147199.png) 2.KMP ![image-20210728163556224](images\image-20210728163556224.png) ![image-20210728163603259](images\image-20210728163603259.png) 3.BM ![image-20210728164041130](images\image-20210728164041130.png) ![image-20210728163957720](images\image-20210728163957720.png) 4.horspool ![image-20210729021241231](images\image-20210729021241231.png) ![image-20210729020732118](images\image-20210729020732118.png) 5.BF(暴力匹配) ![image-20210729021907759](images\image-20210729021907759.png) ![image-20210729021859867](images\image-20210729021859867.png) 6.indexof ![image-20210729022337887](images\image-20210729022337887.png) ![image-20210729022346516](images\image-20210729022346516.png)