# SadTalker

**Repository Path**: dingjianfei/SadTalker

## Basic Information

- **Project Name**: SadTalker
- **Description**: Mirror
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 1
- **Created**: 2023-04-03
- **Last Updated**: 2023-10-04

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README
- 🔥 Several new modes, e.g., `still mode`, `reference mode`, and `resize mode`, are now online for better and more customizable applications.
- 🔥 Happy to see our method being used in various talking and singing avatars; check out these wonderful demos on [bilibili](https://search.bilibili.com/all?keyword=sadtalker&from_source=webtop_search&spm_id_from=333.1007&search_source=3) and [twitter #sadtalker](https://twitter.com/search?q=%23sadtalker&src=typed_query).
## 📋 Changelog (the previous changelog can be found [here](docs/changlelog.md))
- __[2023.03.30]__: Launched the beta version of the full-body mode.
- __[2023.03.30]__: Launched a new feature: by using reference videos, our algorithm can generate videos with more natural eye blinking and some eyebrow movement.
- __[2023.03.29]__: `resize mode` is online via `python inference.py --preprocess resize`! It produces a larger crop of the image, as discussed in https://github.com/Winfredy/SadTalker/issues/35. A combined example of these modes follows this list.
- __[2023.03.29]__: The local Gradio demo is online! Run `python app.py` to start it. A new `requirements.txt` is used to avoid the bugs in `librosa`.
- __[2023.03.28]__: The online demo is launched on [Hugging Face Spaces](https://huggingface.co/spaces/vinthony/SadTalker), thanks to AK!
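As a sketch of how these modes might combine in a single call (only `--preprocess resize` is documented above; `--still`, `--ref_eyeblink`, and the `<...>` placeholder paths are our assumptions about the CLI and may differ from the actual script):

```bash
# Hypothetical combined invocation: still mode, reference mode, and resize mode.
# --still and --ref_eyeblink are assumed flag names; replace <...> with real paths.
python inference.py --driven_audio <audio.wav> \
                    --source_image <image.png> \
                    --ref_eyeblink <reference.mp4> \
                    --still \
                    --preprocess resize
```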
## 🎼 Pipeline

> Our method uses the coefficients of a 3DMM as the intermediate motion representation. To this end, we first generate
> realistic 3D motion coefficients (facial expression β, head pose ρ)
> from audio; these coefficients are then used to implicitly modulate
> a 3D-aware face renderer for the final video generation.
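A minimal sketch of this two-stage design (the function and class names below are hypothetical, for illustration only, and are not the repository's actual API):

```python
# Hypothetical sketch of the two-stage pipeline described above.
# audio_to_coeffs and render_video are illustrative names, not the real API.
import numpy as np

def audio_to_coeffs(audio: np.ndarray) -> dict:
    """Stage 1: map audio to per-frame 3DMM motion coefficients:
    facial expression (beta) and head pose (rho)."""
    num_frames = len(audio) // 640           # e.g., 25 fps at 16 kHz audio
    return {
        "beta": np.zeros((num_frames, 64)),  # expression coefficients
        "rho": np.zeros((num_frames, 6)),    # head rotation + translation
    }

def render_video(source_image: np.ndarray, coeffs: dict) -> list:
    """Stage 2: a 3D-aware face renderer modulated by the coefficients."""
    frames = []
    for beta, rho in zip(coeffs["beta"], coeffs["rho"]):
        # In the real model, beta and rho implicitly modulate the renderer's
        # features; this stub simply returns the source image per frame.
        frames.append(source_image)
    return frames
```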
## 🚧 TODO
> Kindly make sure to unmute the audio, as GitHub plays videos muted by default.
#### Generating 4D free-view talking examples from audio and a single image
We use `--input_yaw`, `--input_pitch`, and `--input_roll` to control the head pose. For example, `--input_yaw -20 30 10` means the head yaw changes from -20 to 30 degrees and then from 30 to 10 degrees.
```bash
# Completed with placeholders; only --driven_audio and --input_yaw appear in this README.
python inference.py --driven_audio <audio.wav> --source_image <image.png> --input_yaw -20 30 10
```
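The three angle sequences can also be combined in one call (a sketch; the `--input_*` flag names are taken from the paragraph above, while `--source_image` and the `<...>` paths are placeholders):

```bash
# Combines yaw, pitch, and roll sequences into one free-view rendering pass.
python inference.py --driven_audio <audio.wav> \
                    --source_image <image.png> \
                    --input_yaw -20 30 10 \
                    --input_pitch -10 0 10 \
                    --input_roll -5 0 5
```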