# Fast-ImageNet-Dataloader

**Repository Path**: gaomengfan/Fast-ImageNet-Dataloader

## Basic Information

- **Project Name**: Fast-ImageNet-Dataloader
- **License**: MIT
- **Default Branch**: master
- **Created**: 2024-01-16
- **Last Updated**: 2024-01-16

## README

Install
-------

Requirements:

* [Tensorpack][]: clone and `pip install -e .`
* [LMDB][]: `pip install lmdb`
* [OpenCV][]: `pip install opencv-python`
* [Protobuf][]: `conda install protobuf`
* [Prctl][]: clone, then run `sudo apt-get install build-essential libcap-dev` and `python setup.py build`

[tensorpack]: https://github.com/ppwwyyxx/tensorpack
[lmdb]: https://lmdb.readthedocs.io/en/release/
[opencv]: https://pypi.python.org/pypi/opencv-python
[Protobuf]: https://github.com/google/protobuf
[Prctl]: https://github.com/seveas/python-prctl

`Tensorpack` versions above 0.9 are currently NOT supported.

Note that some prebuilt `opencv` packages are much slower than others. Check yours with [this script](https://github.com/tensorpack/benchmarks/blob/master/ImageNet/benchmark-opencv-resize.py) and make sure it prints a time under 1s.

### Preprocessing

To start, set the environment variable `IMAGENET` to the directory containing the ILSVRC2012 dataset. `TENSORPACK_DATASET` should also be set (for tensorpack).
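Before running the preprocessing step, it can help to confirm that both variables are actually visible to Python. The following is a minimal sketch and is not part of this repository: the helper name `check_dataset_env` is hypothetical, and it assumes only that `IMAGENET` must name an existing directory while `TENSORPACK_DATASET` merely needs to be set.

```python
import os


def check_dataset_env(env=None):
    """Return a list of problems with the dataset environment variables.

    Hypothetical helper, not part of this repository. An empty list means
    both variables look usable under the assumptions stated above.
    """
    env = os.environ if env is None else env
    problems = []

    # IMAGENET is expected to point at an existing ILSVRC2012 directory.
    imagenet = env.get("IMAGENET")
    if not imagenet:
        problems.append("IMAGENET is not set")
    elif not os.path.isdir(imagenet):
        problems.append(f"IMAGENET is not a directory: {imagenet}")

    # TENSORPACK_DATASET only has to be set; this sketch does not
    # require the directory it names to exist yet.
    if not env.get("TENSORPACK_DATASET"):
        problems.append("TENSORPACK_DATASET is not set")

    return problems
```

Calling `check_dataset_env()` with no arguments inspects the current process environment; passing a plain dictionary makes the check easy to try without touching real variables. With both variables set, preprocessing is run as follows.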
```bash
export IMAGENET='/mnt/work/data/raw-data/'
python preprocess_sequential.py
```

### Usage

```python
# LMDBLoader is provided by this repository; `args` is assumed to be an
# argparse namespace that carries a `batch_size` field.
train_loader = LMDBLoader('train', batch_size=args.batch_size,
                          num_workers=32, shuffle=True, cuda=True)
valid_loader = LMDBLoader('val', batch_size=args.batch_size,
                          num_workers=32, shuffle=False, cuda=True)
```

## TODO

- [ ] Image Normalization
- [ ] Support HDF5 format
- [ ] Tensorpack version > 0.9

### Disclaimer

The code is mainly adapted from [sequential-imagenet-dataloader](https://github.com/BayesWatch/sequential-imagenet-dataloader) and the [Tensorpack](https://github.com/tensorpack/tensorpack) examples.

### Reference

* [Data loader takes a lot of time for every nth iteration](https://discuss.pytorch.org/t/data-loader-takes-a-lot-of-time-for-every-nth-iteration/10831)
* [First batch of Imagenet training is slow with sequential loading](https://discuss.pytorch.org/t/first-batch-of-imagenet-training-is-slow-with-sequential-loading/11464)
* [How to prefetch data when processing with GPU?](https://discuss.pytorch.org/t/how-to-prefetch-data-when-processing-with-gpu/548)
* [How to speed up the data loader](https://discuss.pytorch.org/t/how-to-speed-up-the-data-loader/13740)
* [Fast data loader for Imagenet](https://discuss.pytorch.org/t/fast-data-loader-for-imagenet/988/14)
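
### Checking loader throughput

The threads listed under Reference all concern slow or stalling data loading, so it may be worth measuring how fast a loader delivers batches on a given machine. The following is a rough sketch and is not part of this repository: `measure_throughput` is a hypothetical helper that assumes only that the loader is iterable and that every batch holds `batch_size` images.

```python
import time


def measure_throughput(loader, batch_size, max_batches=100):
    """Return (batches_seen, images_per_second) for an iterable of batches.

    Hypothetical helper, not part of this repository. The batches are
    only counted, never inspected.
    """
    start = time.perf_counter()
    seen = 0
    for _ in loader:
        seen += 1
        if seen >= max_batches:
            break
    elapsed = time.perf_counter() - start

    # Avoid dividing by zero when nothing was read or the timer did not advance.
    if seen == 0 or elapsed <= 0:
        return seen, 0.0
    return seen, seen * batch_size / elapsed
```

Passing `train_loader` from the Usage section as `loader` would give a wall-clock estimate. The number is approximate: the first batch can be much slower than the rest, and with `cuda=True` any asynchronous GPU transfers are not waited on here.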