# Open-GroundingDino
This is a third-party implementation of the paper **[Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection](https://arxiv.org/abs/2303.05499)** by [Zuwei Long]() and [Wei Li](https://github.com/bigballon). **You can use this code to fine-tune a model on your own dataset, or start pretraining a model from scratch.**

- [Supported Features](#supported-features)
- [Setup](#setup)
- [Dataset](#dataset)
- [Config](#config)
- [Training](#training)
- [Results and Models](#results-and-models)
- [Inference](#inference)
- [Acknowledgments](#acknowledgments)
- [Citation](#citation)
- [Contact](#contact)

# Supported Features

|                                | Official release version | The version we replicated |
| ------------------------------ | :----------------------: | :-----------------------: |
| Inference                      |            ✔             |             ✔             |
| Train (Object Detection data)  |            ✖             |             ✔             |
| Train (Grounding data)         |            ✖             |             ✔             |
| Slurm multi-machine support    |            ✖             |             ✔             |
| Training acceleration strategy |            ✖             |             ✔             |

# Setup

We tested the code with Python 3.7.11, PyTorch 1.11.0, and CUDA 11.3. Other versions may also work.

1. Clone this repository.

   ```bash
   git clone https://github.com/longzw1997/Open-GroundingDino.git && cd Open-GroundingDino/
   ```

2. Install the required dependencies.

   ```bash
   pip install -r requirements.txt
   cd models/GroundingDINO/ops
   python setup.py build install
   python test.py
   cd ../../..
   ```

3. Download the [pre-trained model](https://github.com/IDEA-Research/GroundingDINO/releases) and [BERT](https://huggingface.co/bert-base-uncased) weights, then modify the corresponding paths in the train/test scripts.
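Once the BERT weights are saved locally, a quick load test can catch a bad download before a training launch fails. The following is a minimal sketch using HuggingFace `transformers` (not part of this repo; the local path is a placeholder):

```python
# Minimal sketch (not part of this repo): verify locally saved
# bert-base-uncased weights load correctly before pointing the
# train/test scripts at them.
from transformers import BertModel, BertTokenizer

bert_path = "path/to/bert-base-uncased"  # hypothetical local directory
tokenizer = BertTokenizer.from_pretrained(bert_path)
model = BertModel.from_pretrained(bert_path)

# A tiny round trip through the tokenizer confirms the vocab files are intact.
print(tokenizer.tokenize("a dog and a cat"))
print(model.config.hidden_size)  # 768 for bert-base-uncased
```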
# Dataset

For **training**, we use the [odvg data format](data_format.md) to support **both OD data and VG data**. Before training begins, you need to convert your dataset into odvg format; see [data_format.md](data_format.md) | [datasets_mixed_odvg.json](config/datasets_mixed_odvg.json) | [coco2odvg.py](./tools/coco2odvg.py) | [grit2odvg](./tools/grit2odvg.py) for more details.

For **testing**, we use the [coco format](https://cocodataset.org/#format-data), which currently only supports OD datasets.

**Mixed dataset example** ([datasets_mixed_odvg.json](config/datasets_mixed_odvg.json)):
```json
{
  "train": [
    {
      "root": "path/V3Det/",
      "anno": "path/V3Det/annotations/v3det_2023_v1_all_odvg.jsonl",
      "label_map": "path/V3Det/annotations/v3det_label_map.json",
      "dataset_mode": "odvg"
    },
    {
      "root": "path/LVIS/train2017/",
      "anno": "path/LVIS/annotations/lvis_v1_train_odvg.jsonl",
      "label_map": "path/LVIS/annotations/lvis_v1_train_label_map.json",
      "dataset_mode": "odvg"
    },
    {
      "root": "path/Objects365/train/",
      "anno": "path/Objects365/objects365_train_odvg.json",
      "label_map": "path/Objects365/objects365_label_map.json",
      "dataset_mode": "odvg"
    },
    {
      "root": "path/coco_2017/train2017/",
      "anno": "path/coco_2017/annotations/coco2017_train_odvg.jsonl",
      "label_map": "path/coco_2017/annotations/coco2017_label_map.json",
      "dataset_mode": "odvg"
    },
    {
      "root": "path/GRIT-20M/data/",
      "anno": "path/GRIT-20M/anno/grit_odvg_620k.jsonl",
      "dataset_mode": "odvg"
    },
    {
      "root": "path/flickr30k/images/flickr30k_images/",
      "anno": "path/flickr30k/annotations/flickr30k_entities_odvg_158k.jsonl",
      "dataset_mode": "odvg"
    }
  ],
  "val": [
    {
      "root": "path/coco_2017/val2017",
      "anno": "config/instances_val2017.json",
      "label_map": null,
      "dataset_mode": "coco"
    }
  ]
}
```
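A typo in any of these paths usually only surfaces after training has started, so it can be worth validating the config up front. A minimal sketch (not part of this repo; the config path is the one shown above):

```python
# Minimal sketch (not part of this repo): check that every root/anno/label_map
# path in a mixed-dataset config exists before launching a long training run.
import json
import os

with open("config/datasets_mixed_odvg.json") as f:
    cfg = json.load(f)

for split in ("train", "val"):
    for entry in cfg.get(split, []):
        assert os.path.isdir(entry["root"]), f"missing image root: {entry['root']}"
        assert os.path.isfile(entry["anno"]), f"missing annotation: {entry['anno']}"
        # label_map is set for OD data and null/absent for VG and coco entries
        if entry.get("label_map"):
            assert os.path.isfile(entry["label_map"]), f"missing label map: {entry['label_map']}"
        print(f"[{split}] {entry['dataset_mode']}: {entry['anno']} OK")
```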
**ODVG dataset examples** (one JSON record per line):
```bash
# For OD
{"filename": "000000391895.jpg", "height": 360, "width": 640, "detection": {"instances": [{"bbox": [359.17, 146.17, 471.62, 359.74], "label": 3, "category": "motorcycle"}, {"bbox": [339.88, 22.16, 493.76, 322.89], "label": 0, "category": "person"}, {"bbox": [471.64, 172.82, 507.56, 220.92], "label": 0, "category": "person"}, {"bbox": [486.01, 183.31, 516.64, 218.29], "label": 1, "category": "bicycle"}]}}
{"filename": "000000522418.jpg", "height": 480, "width": 640, "detection": {"instances": [{"bbox": [382.48, 0.0, 639.28, 474.31], "label": 0, "category": "person"}, {"bbox": [234.06, 406.61, 454.0, 449.28], "label": 43, "category": "knife"}, {"bbox": [0.0, 316.04, 406.65, 473.53], "label": 55, "category": "cake"}, {"bbox": [305.45, 172.05, 362.81, 249.35], "label": 71, "category": "sink"}]}}

# For VG
{"filename": "014127544.jpg", "height": 400, "width": 600, "grounding": {"caption": "Homemade Raw Organic Cream Cheese for less than half the price of store bought! It's super easy and only takes 2 ingredients!", "regions": [{"bbox": [5.98, 2.91, 599.5, 396.55], "phrase": "Homemade Raw Organic Cream Cheese"}]}}
{"filename": "012378809.jpg", "height": 252, "width": 450, "grounding": {"caption": "naive : Heart graphics in a notebook background", "regions": [{"bbox": [93.8, 47.59, 126.19, 77.01], "phrase": "Heart graphics"}, {"bbox": [2.49, 1.44, 448.74, 251.1], "phrase": "a notebook background"}]}}
```
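For OD data, the conversion is mechanical. Below is a simplified sketch of the kind of conversion [coco2odvg.py](./tools/coco2odvg.py) performs (the real tool is authoritative; file names here are placeholders). Two details matter: COCO stores boxes as `[x, y, w, h]` while the ODVG records above use `[x1, y1, x2, y2]`, and the non-contiguous COCO category ids are remapped to contiguous labels starting from 0:

```python
# Simplified COCO -> ODVG conversion sketch (see tools/coco2odvg.py for the
# real converter; paths below are placeholders).
import json
from collections import defaultdict

with open("instances_train2017.json") as f:
    coco = json.load(f)

# Map non-contiguous COCO category ids to contiguous labels 0..N-1.
cats = sorted(coco["categories"], key=lambda c: c["id"])
id2label = {c["id"]: i for i, c in enumerate(cats)}
id2name = {c["id"]: c["name"] for c in cats}

anns_per_image = defaultdict(list)
for ann in coco["annotations"]:
    anns_per_image[ann["image_id"]].append(ann)

with open("coco2017_train_odvg.jsonl", "w") as out:
    for img in coco["images"]:
        instances = []
        for ann in anns_per_image[img["id"]]:
            x, y, w, h = ann["bbox"]  # COCO boxes are xywh; ODVG uses xyxy
            instances.append({
                "bbox": [x, y, x + w, y + h],
                "label": id2label[ann["category_id"]],
                "category": id2name[ann["category_id"]],
            })
        record = {
            "filename": img["file_name"],
            "height": img["height"],
            "width": img["width"],
            "detection": {"instances": instances},
        }
        out.write(json.dumps(record) + "\n")
```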
# Config

```
config/cfg_odvg.py                   # for backbone, batch size, LR, freeze layers, etc.
config/datasets_mixed_odvg.json      # supports mixed datasets for both OD and VG
```

# Training

- **Datasets:** before starting training, modify ``config/datasets_mixed_example.json`` according to [data_format.md](data_format.md).
- **Configs:** evaluation defaults to coco_val2017.
  - If you are evaluating with your own test set, convert the test data to coco format (not the odvg format) and set **use_coco_eval = False** in the config (the COCO dataset has 80 classes used for training but 90 category ids in total, so there is a built-in mapping in the code).
  - Also, add (or update) the **label_list** in the config with your own class names, e.g. **label_list=['dog', 'cat', 'person']** (see the sketch after the commands below).

```diff
- use_coco_eval = True
+ use_coco_eval = False
+ label_list=['dog', 'cat', 'person']
```

- **Train/Eval:**

```bash
# train/eval with torch.distributed.launch:
bash train_dist.sh ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
bash test_dist.sh  ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}

# train/eval on a slurm cluster:
bash train_slurm.sh ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
bash test_slurm.sh  ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}

# e.g. (check train_slurm.sh for more details):
# bash train_slurm.sh v100_32g 32 config/cfg_odvg.py config/datasets_mixed_odvg.json ./logs
# bash train_slurm.sh v100_32g 8  config/cfg_coco.py config/datasets_od_example.json  ./logs
```
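As referenced in the **Configs** notes above, the `label_list` entries should line up with the category ids in your coco-format test file. A minimal sketch for deriving the list (not part of this repo; the annotation path is a placeholder):

```python
# Minimal sketch (not part of this repo): derive `label_list` for the config
# from your own coco-format test annotations, ordered by category id so that
# list indices and class ids stay consistent.
import json

with open("path/to/your_test_coco.json") as f:  # placeholder path
    coco = json.load(f)

categories = sorted(coco["categories"], key=lambda c: c["id"])
label_list = [c["name"] for c in categories]
print("label_list =", label_list)
```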
# Results and Models

| Name | Pretrain data | Task | mAP on COCO | Ckpt | Misc |
| :--: | :-----------: | :--: | :---------: | :--: | :--: |
| GroundingDINO-T<br>(official) | O365, GoldG, Cap4M | zero-shot | 48.4<br>(zero-shot) | model | - |
| GroundingDINO-T<br>(fine-tune) | O365, GoldG, Cap4M | finetune<br>w/ coco | 57.3<br>(fine-tune) | model | cfg \| log |
| GroundingDINO-T<br>(pretrain) | COCO, O365, LVIS, V3Det,<br>GRIT-200K, Flickr30k<br>(total 1.8M) | zero-shot | 55.1<br>(zero-shot) | model | cfg \| log |
- [GRIT](https://huggingface.co/datasets/zzliang/GRIT)-200K was generated with [GLIP](https://github.com/microsoft/GLIP) and [spaCy](https://spacy.io/).

# Inference

Because the model architecture has not changed, you only need to **install** the [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) library and then run [inference_on_a_image.py](./tools/inference_on_a_image.py) to run inference on your images.

```bash
python tools/inference_on_a_image.py \
  -c tools/GroundingDINO_SwinT_OGC.py \
  -p path/to/your/ckpt.pth \
  -i ./figs/dog.jpeg \
  -t "dog" \
  -o output
```

| Prompt | Official ckpt | COCO ckpt | 1.8M ckpt |
| :----: | :--------------------------: | :----------------------: | :----------------------: |
| dog | ![](./figs/dog-official.jpg) | ![](./figs/dog-coco.jpg) | ![](./figs/dog-1.8m.jpg) |
| cat | ![](./figs/cat-official.jpg) | ![](./figs/cat-coco.jpg) | ![](./figs/cat-1.8m.jpg) |

# Acknowledgments

The provided code was adapted from:

- [microsoft/GLIP](https://github.com/microsoft/GLIP)
- [IDEA-Research/DINO](https://github.com/IDEA-Research/DINO/)
- [IDEA-Research/GroundingDINO](https://github.com/IDEA-Research/GroundingDINO)

# Citation

```
@misc{OpenGroundingDino,
  author       = {Zuwei Long and Wei Li},
  title        = {Open Grounding Dino: The third-party implementation of the paper Grounding DINO},
  howpublished = {\url{https://github.com/longzw1997/Open-GroundingDino}},
  year         = {2023}
}
```

# Contact

- longzuwei at sensetime.com
- liwei1 at sensetime.com

Feel free to contact us if you have any suggestions or questions. Bug reports are also welcome. Please create a pull request if you find any bugs or want to contribute code.