# CT-CHAT
**Repository Path**: xana/CT-CHAT
## Basic Information
- **Project Name**: CT-CHAT
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-16
- **Last Updated**: 2025-12-16
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# CT-CHAT
Welcome to the official repository for **CT-CHAT**, a cutting-edge visual-language chat model designed specifically for 3D chest CT volumes. CT-CHAT provides an open-source codebase and pre-trained models, utilizing [CT-CLIP](https://github.com/ibrahimethemhamamci/CT-CLIP) and a VQA (Visual Question Answering) dataset adapted from [CT-RATE](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE), making it accessible to researchers worldwide. The VQA dataset and model weights are available via the [HuggingFace repository](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE).
## System Requirements
Before you get started, ensure that your environment meets the following requirements:
- **Python version**: 3.12.4 or newer
- **Necessary dependencies**: Install CT-CLIP’s dependencies by following the instructions in the [CT-CLIP repository](https://github.com/ibrahimethemhamamci/CT-CLIP).
- **Additional libraries**: Ensure that the following libraries are installed:
- PyTorch v2.4.0
- CUDA v12.4
- SciPy v1.14.0
- Torchvision v0.19.0
- Scikit-learn v1.2.2
- Pandas v2.2.2
- Transformers v4.44.0
- NumPy v1.26.4
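
The pinned versions above can be collected into a `requirements.txt` for reproducible installs. The sketch below is illustrative only: CUDA 12.4 is provided by the system driver/toolkit (or by selecting the matching PyTorch wheel index), not by pip.

```txt
# requirements.txt (sketch; CUDA 12.4 comes from the system toolkit/driver)
torch==2.4.0
torchvision==0.19.0
scipy==1.14.0
scikit-learn==1.2.2
pandas==2.2.2
transformers==4.44.0
numpy==1.26.4
```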
### Hardware Requirements
- **For training**:
- Small models: Minimum of 2 A100 GPUs with 80GB VRAM.
- Large models (Llama 3.1 70B): Minimum of 4 A100 GPUs.
- **For inference**:
- Large models: At least 2 A100 GPUs.
- Smaller models: 1 A100 GPU.
## Training
To train the model, follow the provided scripts. It's crucial to run the training data through the image encoder to generate embeddings prior to training. Use the provided [Encoder Script](https://github.com/ibrahimethemhamamci/CT-CHAT/blob/main/llava/serve/encode_script.py) as a reference for encoding a single image. Note that this differs from the latent-saving process in CT-CLIP; the outputs must be saved before latent projection. Update the training scripts with the correct path to the saved encodings and other necessary configurations.
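The pre-encoding step above can be sketched as a small offline loop. This is a hedged illustration only: `encode_volume`, `encode_dataset`, the dummy shapes, and the `encodings/` output directory are placeholders, not the repository's actual API — the real encoder is in `llava/serve/encode_script.py` and uses the CT-CLIP image encoder.

```python
# Sketch: pre-compute image-encoder embeddings for each CT volume before
# training. All names, paths, and shapes here are illustrative placeholders.
import numpy as np
from pathlib import Path

def encode_volume(volume: np.ndarray) -> np.ndarray:
    """Stand-in for the CT-ViT image encoder: returns per-token embeddings
    *before* the latent projection, as the training scripts expect."""
    # A real encoder would run the 3D volume through CT-ViT; here we just
    # pool patches into a dummy (num_tokens, dim) array for illustration.
    tokens = volume.reshape(8, -1).mean(axis=1, keepdims=True)  # (8, 1)
    return np.repeat(tokens, 16, axis=1)                        # (8, 16)

def encode_dataset(volumes: dict[str, np.ndarray], out_dir: Path) -> None:
    """Save one embedding file per volume; training scripts then point at out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, vol in volumes.items():
        np.save(out_dir / f"{name}.npy", encode_volume(vol))

volumes = {"case_001": np.random.rand(32, 64, 64).astype(np.float32)}
encode_dataset(volumes, Path("encodings"))
print(np.load("encodings/case_001.npy").shape)  # (8, 16)
```

After a pass like this, the training scripts only need the path to the saved encodings rather than re-running the encoder every epoch.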
## Inference
For inference, refer to the [serve scripts](llava/serve). To perform CLI-based inference, the validation data must first be encoded in the same way as the training data. After encoding, update the paths in the CT-CHAT validation scripts. Once the latent embeddings are computed, expect roughly 5-10 tokens/s for Llama 3.1 70B on 4 A100 GPUs and 10-20 tokens/s for Llama 3.1 8B on 2 A100 GPUs.
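The throughput figures above translate directly into wall-clock estimates for a given answer length; a trivial sketch (the 256-token answer length is an arbitrary example, not from the repository):

```python
# Back-of-the-envelope latency from the throughput ranges quoted above.
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a given decode throughput."""
    return num_tokens / tokens_per_second

# A 256-token answer from Llama 3.1 70B at ~5-10 tok/s on 4 A100s:
print(generation_seconds(256, 5), generation_seconds(256, 10))   # 51.2 25.6
# The same answer from Llama 3.1 8B at ~10-20 tok/s on 2 A100s:
print(generation_seconds(256, 10), generation_seconds(256, 20))  # 25.6 12.8
```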
For GUI-based inference, run the following commands:
```bash
python -m llava.serve.controller --host 0.0.0.0 --port 10000
python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path "path_to_model" --model-base "path_to_model"
```
## Pretrained Models
We offer pre-trained models for several LLMs, trained on the VQA dataset described in our paper. You can download them from the links below:
- **CT-CHAT (Llama 3.1 70B)**: [Download Here](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE)
- **CT-CHAT (Llama 3.1 8B)**: [Download Here](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE)
- **CT-CHAT (Vicuna)**: [Download Here](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE)
- **CT-CHAT (Mistral)**: [Download Here](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE)
## VQA Dataset
The VQA dataset has been derived from the CT-RATE data using the Llama 3.1 70B model with the scripts provided [here](./VQA_dataset). Short-answer questions have been sampled from the [RadGenome Chest CT dataset](https://huggingface.co/datasets/RadGenome/RadGenome-ChestCT). The dataset is available in the [CT-RATE HuggingFace repository](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE).
## Citing Us
If you use CT-CHAT, CT-CLIP, or our CT-RATE dataset in your research, please cite [our paper](https://arxiv.org/abs/2403.17834).
## License
We are committed to fostering innovation and collaboration in the research community. To this end, all elements of CT-RATE, CT-CLIP, and CT-CHAT are released under a [Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0) license](https://creativecommons.org/licenses/by-nc-sa/4.0/). This licensing framework ensures that our contributions can be freely used for non-commercial research purposes, while also encouraging contributions and modifications, provided that the original work is properly cited and any derivative works are shared under the same terms.
## Acknowledgements
We would like to express our sincere gratitude to the following works, whose contributions were invaluable to our research. Our VQA dataset includes a subset of data from [RadGenome Chest CT](https://arxiv.org/abs/2404.16754). Additionally, our CT-CHAT model is a 3D adaptation of the [LLaVA](https://arxiv.org/pdf/2304.08485) model for CT volumes, and it uses the CT-ViT architecture, introduced as part of [GenerateCT](https://arxiv.org/abs/2305.16037), as its vision encoder. We are deeply appreciative of these researchers for their outstanding open-source contributions. If you use our VQA data or the CT-CHAT model in your work, we kindly ask that you also cite these related works to acknowledge their impact.