# math-formula-recognition

**Repository Path**: deplay/math-formula-recognition

## Basic Information

- **Project Name**: math-formula-recognition
- **Description**: Math formula recognition (Images to LaTeX strings)
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 2
- **Created**: 2020-01-16
- **Last Updated**: 2023-06-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Off-Line Math Formula Recognition Using Deep Neural Networks

Based on [Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition][arxiv-zhang18].

## Requirements

- Python 3
- [PyTorch][pytorch]

All dependencies can be installed with pip:

```sh
pip install -r requirements.txt
```

If you'd like to use a different installation method or another CUDA version
with PyTorch (e.g. CUDA 10), follow the instructions on
[PyTorch - Getting Started][pytorch-started].

## Data

The dataset is [CROHME: Competition on Recognition of Online Handwritten
Mathematical Expressions][crohme]. As an on-line handwriting dataset, it
consists of InkML files, whereas this architecture performs off-line
recognition and therefore takes images as input.

The dataset has been converted to images of size `256x256` and the ground truth
has been extracted as well. The converted dataset can be found at
[Floydhub - crohme-png][crohme-png].

The data needs to be in the `data/` directory, and a `tokens.tsv` file defines
the available tokens, separated by tabs. Training and validation sets are
defined in `gt_split/train.tsv` and `gt_split/validation.tsv`, where each line
contains the path to an image and its ground truth.
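To make the file layout concrete, here is a minimal sketch of reading both kinds of file. It assumes that each ground truth line holds the image path and the LaTeX string separated by a single tab, which the `.tsv` extension suggests but the text above does not state. The function names `load_tokens` and `load_ground_truth` and the sample contents are hypothetical, not taken from the repository.

```python
import io


def load_tokens(tsv_file):
    """Collect every tab-separated token from a file-like object."""
    tokens = []
    for line in tsv_file:
        tokens.extend(t for t in line.rstrip("\n").split("\t") if t)
    return tokens


def load_ground_truth(tsv_file):
    """Return (image_path, ground_truth) pairs, one per non-empty line.

    Assumption: path and ground truth are separated by the first tab.
    """
    pairs = []
    for line in tsv_file:
        line = line.rstrip("\n")
        if not line:
            continue
        path, truth = line.split("\t", 1)
        pairs.append((path, truth))
    return pairs


# In-memory stand-ins for tokens.tsv and gt_split/train.tsv (made-up contents).
tokens = load_tokens(io.StringIO("x\t+\t\\frac\t{\t}\t1\t2\n"))
samples = load_ground_truth(io.StringIO("train/0001.png\t\\frac { 1 } { 2 }\n"))

# Map each token to an integer id, as a sequence model would need.
token_to_id = {token: index for index, token in enumerate(tokens)}
```

With real files, the `io.StringIO(...)` objects would be replaced by `open(path, encoding="utf-8")`.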

The training/validation split can be generated by running:

```sh
python data_tools/train_validation_split.py -i data/groundtruth_train.tsv -o data/gt_split
```
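
For readers who want to see what such a split involves, the following is a rough sketch and not the code of `data_tools/train_validation_split.py`. It assumes a seeded random shuffle and a fixed validation fraction; the function name `split_lines`, the 20% fraction, and the seed are all illustrative choices.

```python
import random


def split_lines(lines, validation_fraction=0.2, seed=1234):
    """Shuffle the lines reproducibly and return (train, validation)."""
    shuffled = list(lines)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    num_validation = int(len(shuffled) * validation_fraction)
    return shuffled[num_validation:], shuffled[:num_validation]


# Made-up ground truth lines in the "path<TAB>ground truth" layout.
lines = [f"train/{i:04d}.png\tx + {i}" for i in range(10)]
train, validation = split_lines(lines)
```

Fixing the seed means the same ground truth file always produces the same two sets, which keeps results comparable between training runs.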

*Note: The content of the generated images varies greatly in size. Because
longer expressions are limited to the same width, they are effectively rendered
in a smaller font. This makes it much more difficult to predict the sequences
correctly, especially since the dataset is quite small. The primary focus was
the attention mechanism, to see whether it can handle different sizes. If you
want better results, the images need to be normalised.*
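
The note does not say which normalisation is meant. One possible first step, shown here purely as an assumption, is to crop each image to the bounding box of its ink so that the expression fills the frame before any rescaling. The sketch uses a nested list of grayscale values to stay dependency-free; `crop_to_content` is a hypothetical helper, and in practice an image library would do this work.

```python
def crop_to_content(image, background=255):
    """Return the smallest sub-image containing every non-background pixel."""
    rows = [y for y, row in enumerate(image)
            if any(pixel != background for pixel in row)]
    cols = [x for x in range(len(image[0]))
            if any(row[x] != background for row in image)]
    if not rows:
        return image  # blank image: nothing to crop
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# A 4x5 white image with two dark "ink" pixels.
image = [[255] * 5 for _ in range(4)]
image[1][2] = 0
image[2][3] = 0
cropped = crop_to_content(image)
```

Cropping alone does not equalise the apparent font size; a later step would still have to scale the cropped expressions to a comparable symbol height.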

## Usage

### Training

Training is done with the `train.py` script:

```sh
python train.py --prefix "some-name-" -n 200 -c checkpoints/example-0022.pth
```

The `--prefix` option names the checkpoints; without it, they are only
numbered. The `-c` option resumes training from the given checkpoint; if it is
not specified, training starts from scratch.

For all options, see `python train.py --help`.

### Evaluation

To evaluate a model, use the `evaluate.py` script with the desired checkpoint
and the dataset it should be tested against (multiple sets can be given at
once).

For example, to evaluate the sets 2014 and 2016 with beam width 5:

```sh
python evaluate.py -d 2014 2016 --beam-width 5 -c checkpoints/example-0022.pth
```
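
The beam width controls how many partial output sequences the decoder keeps at each step. The sketch below illustrates generic beam search over a toy next-token table; it is not the decoding code in `evaluate.py`, and `beam_search`, `toy_step`, and the probabilities are invented for the example.

```python
import math


def beam_search(step_fn, start, end, beam_width, max_len):
    """Keep the `beam_width` highest-scoring partial sequences at each step."""
    beams = [([start], 0.0)]  # (tokens, summed log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == end:
                candidates.append((tokens, score))  # finished: carry over
                continue
            for token, prob in step_fn(tokens).items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the best `beam_width` candidates for the next step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(tokens[-1] == end for tokens, _ in beams):
            break
    return beams[0][0]


def toy_step(tokens):
    """Made-up next-token probabilities, standing in for a trained decoder."""
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"x": 0.55, "y": 0.45},
        "b": {"z": 0.9, "w": 0.1},
    }
    return table.get(tokens[-1], {"</s>": 1.0})


greedy = beam_search(toy_step, "<s>", "</s>", beam_width=1, max_len=10)
wider = beam_search(toy_step, "<s>", "</s>", beam_width=2, max_len=10)
```

In this toy table, width 1 commits to `a` first and ends with a sequence of probability 0.33, while width 2 keeps `b` alive and finds the sequence through `z` with probability 0.36. A larger beam explores more alternatives at the cost of more decoder calls per step.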

[arxiv-zhang18]: https://arxiv.org/pdf/1801.03530.pdf
[crohme]: https://www.isical.ac.in/~crohme/
[crohme-png]: https://www.floydhub.com/jungomi/datasets/crohme-png
[pytorch]: https://pytorch.org/
[pytorch-started]: https://pytorch.org/get-started/locally/