# Fast-ImageNet-Dataloader

**Repository Path**: gaomengfan/Fast-ImageNet-Dataloader

## Basic Information

- **Project Name**: Fast-ImageNet-Dataloader
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-01-16
- **Last Updated**: 2024-01-16

## README

Install
-------
Requirements:

* [Tensorpack][]: clone and `pip install -e .`
* [LMDB][]: `pip install lmdb`
* [OpenCV][]: `pip install opencv-python`
* [Protobuf][]: `conda install protobuf`
* [Prctl][]: clone, install the build dependencies with `sudo apt-get install build-essential libcap-dev`, then run `python setup.py build`


[tensorpack]: https://github.com/ppwwyyxx/tensorpack
[lmdb]: https://lmdb.readthedocs.io/en/release/
[opencv]: https://pypi.python.org/pypi/opencv-python
[Protobuf]: https://github.com/google/protobuf
[Prctl]: https://github.com/seveas/python-prctl


`Tensorpack` versions above 0.9 are currently NOT supported.

Note that some prebuilt `opencv` packages are much slower than others.
Check yours with [this script](https://github.com/tensorpack/benchmarks/blob/master/ImageNet/benchmark-opencv-resize.py) and make sure the time it prints is under 1 s.
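
The Tensorpack version constraint above can be checked before running anything else. The following is a minimal sketch, not part of this repository: the helper names `parse_version` and `is_supported` are hypothetical, and it assumes a strict reading of the constraint, so that `0.9` (or `0.9.0`) and anything older passes while `0.9.1` and later fail. Adjust `MAX_SUPPORTED` if the intended limit differs.

```python
import re

# Assumption: "version > 0.9 is not supported" is read strictly,
# so 0.9 / 0.9.0 is the newest release that passes.
MAX_SUPPORTED = (0, 9, 0)


def parse_version(text):
    """Return the leading numeric release as a 3-tuple, e.g. '0.9' -> (0, 9, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"cannot parse version string: {text!r}")
    parts = [int(p) for p in match.group(0).split(".")][:3]
    # Pad short versions with zeros so tuples compare component by component.
    return tuple(parts + [0] * (3 - len(parts)))


def is_supported(text):
    """True if the version is at most MAX_SUPPORTED."""
    return parse_version(text) <= MAX_SUPPORTED


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("tensorpack")
    except PackageNotFoundError:
        print("tensorpack is not installed")
    else:
        status = "supported" if is_supported(installed) else "NOT supported"
        print(f"tensorpack {installed}: {status}")
```

Only the numeric release prefix is compared, so suffixes such as `rc1` or local build tags are ignored.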

### Preprocessing

To start, set the environment variable `IMAGENET` to the path of the ILSVRC2012
dataset. `TENSORPACK_DATASET` should also be set (it is used by Tensorpack).

```bash
# Point IMAGENET at the ILSVRC2012 dataset, then run the preprocessing script.
export IMAGENET='/mnt/work/data/raw-data/'
python preprocess_sequential.py
```
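
Since preprocessing depends on both environment variables, it can help to validate them first. The sketch below is an illustration only: `check_dataset_env` is a hypothetical helper, not something this repository provides, and it assumes that both variables should point at existing directories.

```python
import os

# The two variables named in the Preprocessing section.
REQUIRED_VARS = ("IMAGENET", "TENSORPACK_DATASET")


def check_dataset_env(environ=None):
    """Return {variable: problem} for each variable that is unset or not a directory.

    Assumption: both variables are expected to hold paths to existing directories.
    """
    if environ is None:
        environ = os.environ
    problems = {}
    for name in REQUIRED_VARS:
        value = environ.get(name, "")
        if not value:
            problems[name] = "not set"
        elif not os.path.isdir(value):
            problems[name] = f"not a directory: {value}"
    return problems


if __name__ == "__main__":
    issues = check_dataset_env()
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("environment looks ready for preprocess_sequential.py")
```

An empty result means both variables are set and resolve to directories; it says nothing about whether the dataset layout inside them is correct.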

### Usage

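```python
# LMDBLoader is defined in this repository; import it from its module first.
# `args.batch_size` is assumed to come from the caller's argparse namespace.
train_loader = LMDBLoader('train', batch_size=args.batch_size, num_workers=32, shuffle=True, cuda=True)
valid_loader = LMDBLoader('val', batch_size=args.batch_size, num_workers=32, shuffle=False, cuda=True)
```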
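
To see whether a loader keeps up with training, one option is to time how quickly batches arrive. The sketch below is a hypothetical helper (`time_batches` is not part of this repository) that works on any iterable of batches; it assumes the loaders constructed above can be iterated like a PyTorch `DataLoader`. The first batch is skipped by default, because the first batch is often much slower than the rest.

```python
import time


def time_batches(loader, max_batches=100, warmup=1):
    """Return (batches_timed, mean_seconds_per_batch) for an iterable of batches.

    The first `warmup` batches are consumed but excluded from the timing.
    """
    iterator = iter(loader)
    for _ in range(warmup):
        next(iterator, None)  # discard warm-up batches; stop quietly if exhausted

    count = 0
    start = time.perf_counter()
    for count, _batch in enumerate(iterator, start=1):
        if count >= max_batches:
            break
    elapsed = time.perf_counter() - start

    mean = elapsed / count if count else float("nan")
    return count, mean


if __name__ == "__main__":
    # Stand-in data; replace the list with a real loader such as `train_loader`.
    fake_loader = [[0] * 8 for _ in range(10)]
    n, seconds = time_batches(fake_loader, max_batches=5)
    print(f"timed {n} batches, {seconds:.6f} s per batch")
```

This measures data loading alone; comparing it with the time of a full training step indicates whether loading is the bottleneck.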

## TODO
- [ ] Image Normalization
- [ ] Support HDF5 format
- [ ] Tensorpack version > 0.9

### Disclaimer

The code is mainly adapted from [sequential-imagenet-dataloader](https://github.com/BayesWatch/sequential-imagenet-dataloader) and the [Tensorpack](https://github.com/tensorpack/tensorpack) examples.

### Reference

* [Data loader takes a lot of time for every nth iteration](https://discuss.pytorch.org/t/data-loader-takes-a-lot-of-time-for-every-nth-iteration/10831)
* [First batch of Imagenet training is slow with sequential loading](https://discuss.pytorch.org/t/first-batch-of-imagenet-training-is-slow-with-sequential-loading/11464)
* [How to prefetch data when processing with GPU?](https://discuss.pytorch.org/t/how-to-prefetch-data-when-processing-with-gpu/548)
* [How to speed up the data loader](https://discuss.pytorch.org/t/how-to-speed-up-the-data-loader/13740)
* [Fast data loader for Imagenet](https://discuss.pytorch.org/t/fast-data-loader-for-imagenet/988/14)