# TextEditApp

**Repository Path**: zhxteam/text-edit-app

## Basic Information

- **Project Name**: TextEditApp
- **Description**: winform文本编辑器（算法实训大作业）
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-07-27
- **Last Updated**: 2022-07-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# TextEditApp

#### 介绍

2021暑期算法实训.NET winform文本编辑器

#### 算法

##### 1.字符串匹配

1. KMP

2. Boyer–Moore(好坏字符串)

3. Rabin-Karp（是暴力算法的改进，先对比哈希值再对比字符串，Go 1.1的indexof 源码就是使用RK算法，Go现在是1.16版,下面是我查的最新源码，仍然用了RK算法）

    ```go
    // Index returns the index of the first instance of substr in s, or -1 if substr is not present in s.
    func Index(s, substr string) int {
    	n := len(substr)
    	switch {
    	case n == 0:
    		return 0
    	case n == 1:
    		return IndexByte(s, substr[0])
    	case n == len(s):
    		if substr == s {
    			return 0
    		}
    		return -1
    	case n > len(s):
    		return -1
    	case n <= bytealg.MaxLen:
    		// Use brute force when s and substr both are small
    		if len(s) <= bytealg.MaxBruteForce {
    			return bytealg.IndexString(s, substr)
    		}
    		c0 := substr[0]
    		c1 := substr[1]
    		i := 0
    		t := len(s) - n + 1
    		fails := 0
    		for i < t {
    			if s[i] != c0 {
    				// IndexByte is faster than bytealg.IndexString, so use it as long as
    				// we're not getting lots of false positives.
    				o := IndexByte(s[i+1:t], c0)
    				if o < 0 {
    					return -1
    				}
    				i += o + 1
    			}
    			if s[i+1] == c1 && s[i:i+n] == substr {
    				return i
    			}
    			fails++
    			i++
    			// Switch to bytealg.IndexString when IndexByte produces too many false positives.
    			if fails > bytealg.Cutover(i) {
    				r := bytealg.IndexString(s[i:], substr)
    				if r >= 0 {
    					return r + i
    				}
    				return -1
    			}
    		}
    		return -1
    	}
    	c0 := substr[0]
    	c1 := substr[1]
    	i := 0
    	t := len(s) - n + 1
    	fails := 0
    	for i < t {
    		if s[i] != c0 {
    			o := IndexByte(s[i+1:t], c0)
    			if o < 0 {
    				return -1
    			}
    			i += o + 1
    		}
    		if s[i+1] == c1 && s[i:i+n] == substr {
    			return i
    		}
    		i++
    		fails++
    		if fails >= 4+i>>4 && i < t {
    			// See comment in ../bytes/bytes.go.
    			j := bytealg.IndexRabinKarp(s[i:], substr)
    			if j < 0 {
    				return -1
    			}
    			return i + j
    		}
    	}
    	return -1
    }
    
    // IndexRabinKarp uses the Rabin-Karp search algorithm to return the index of the
    // first occurrence of substr in s, or -1 if not present.
    func IndexRabinKarp(s, substr string) int {
    	// Rabin-Karp search
    	hashss, pow := HashStr(substr)
    	n := len(substr)
    	var h uint32
    	for i := 0; i < n; i++ {
    		h = h*PrimeRK + uint32(s[i])
    	}
    	if h == hashss && s[:n] == substr {
    		return 0
    	}
    	for i := n; i < len(s); {
    		h *= PrimeRK
    		h += uint32(s[i])
    		h -= pow * uint32(s[i-n])
    		i++
    		if h == hashss && s[i-n:i] == substr {
    			return i - n
    		}
    	}
    	return -1
    }
    ```

    
4. Sunday（作者叫sunday哈哈，类似BM算法）

5. AC自动机（1975年来自贝尔实验室）

6. two-way算法（glibc库strstr的算法，空间复杂度O（1））

7. horspool（Horspool 字符串匹配算法对Boyer-Moore算法的简化算法。

    Horspool 算法是一种基于后缀匹配的方法，是一种“跳跃式”匹配算法，具有**sub-linear亚线性时间复杂度**。）

#### 开发说明

拖下仓库以后，双击.sln文件

#### 接口

> 有更改可能

##### 1. 字典查询

```c#
using TextEditorI.Algorithm; //导入类
...
SimpleDictionary d = SimpleDictionary.GetDictionary; //使用范例，注意这个不是函数，是反射功能
Console.WriteLine(d.isEnglish("Apple"));//True
Console.WriteLine(d.isEnglish("Banana"));//True
Console.WriteLine(d.isEnglish("??"));//False
```

##### 2.统计词频

```c#
using TextEditorI.Algorithm; //导入类
...
    
var map = WordFrequency.GetWordFrequency(text,d);//其中d是字典，也就是说你需要给他一本词典才能进行统计，具体获取方法看API1，text是string类型
```

##### 3.文档分割

```c#
using TextEditorI.Algorithm; //导入类
...

string text;
string[] words = Util.GetWordArray(text);
```

#### 性能

字典约45000词

文本约23000词

![img](images/clip_image002.jpg)

本机i7 8550u 8g 2400hz

仅字典占用

![image-20210727165813265](images/image-20210727165813265.png)

字典加文本哈希的内存

![image-20210727165648227](images/image-20210727165648227.png)

原始内存占用

![image-20210727165616733](images/image-20210727165616733.png)

#### 实现计划

1. 统计词频等题目本身要求的功能——维护一个哈希表
2. 利用简易字典（具体数量看网上查找），检查错词，代码提示，错词用波浪线标出——实现算法和数据结构：字典树
3. 查询——实现编辑器中<kbd>ctrl</kbd>+<kbd>F</kbd>的功能。
4. 通过计算最小编辑距离实现模糊查询（可能上千词就不太顶得住）
5. 关于中文的两个计划：
    1. 用库函数的正则表达式匹配萌混过关
    2. 自己写一点简单正则匹配器

**测试部分**

1.测试用的文本Text.txt

2.测试选取的词汇：

​	a.richtextbox

​	b.vicious

​	c.resilient

​	d.emotionally

​	e.errors

​	f.destructive

​	g.oneself

​	h.allowing

​	i.mercy

​	o.Earth

3.测试的结果如下

​	1.sunday

![image-20210728163058031](images\image-20210728163058031.png)

![image-20210728163147199](images\image-20210728163147199.png)

2.KMP

![image-20210728163556224](images\image-20210728163556224.png)

![image-20210728163603259](images\image-20210728163603259.png)

3.BM

![image-20210728164041130](images\image-20210728164041130.png)

![image-20210728163957720](images\image-20210728163957720.png)

4.horspool

![image-20210729021241231](images\image-20210729021241231.png)

![image-20210729020732118](images\image-20210729020732118.png)

5.BF（暴力匹配）

![image-20210729021907759](images\image-20210729021907759.png)

![image-20210729021859867](images\image-20210729021859867.png)


6.indexof

![image-20210729022337887](images\image-20210729022337887.png)

![image-20210729022346516](images\image-20210729022346516.png)