# SadTalker

**Repository Path**: hola/sad-talker

## Basic Information

- **Project Name**: SadTalker
- **Description**: SadTalker (overseas project)
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2024-08-23
- **Last Updated**: 2024-08-23

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker)

Wenxuan Zhang \*¹,², Xiaodong Cun \*², Xuan Wang ³, Yong Zhang ², Xi Shen ²,
Yu Guo ¹, Ying Shan ², Fei Wang ¹

¹ Xi'an Jiaotong University ² Tencent AI Lab ³ Ant Group

CVPR 2023

![sadtalker](https://user-images.githubusercontent.com/4397546/222490039-b1f6156b-bf00-405b-9fda-0c9a9156f991.gif)

TL;DR: single portrait image 🙎‍♂️ + audio 🎤 = talking head video 🎞.

## 🔥🔥🔥 Highlight

- Several new modes, e.g. `still mode`, `reference mode`, and `resize mode`, are online for better and more customizable applications.
- We are happy to see our method used in various talking and singing avatars. Check out these wonderful demos on [bilibili](https://search.bilibili.com/all?keyword=sadtalker&from_source=webtop_search&spm_id_from=333.1007&search_source=3) and [twitter #sadtalker](https://twitter.com/search?q=%23sadtalker&src=typed_query).

## 📋 Changelog

- __[2023.03.30]__: New feature: by using reference videos, our algorithm can generate videos with more natural eye blinking and some eyebrow movement.
- __[2023.03.29]__: `resize mode` is online via `python inference.py --preprocess resize`! It produces a larger crop of the image, as discussed in https://github.com/Winfredy/SadTalker/issues/35.
- __[2023.03.29]__: The local gradio demo is online! Run `python app.py` to start the demo. A new `requirements.txt` is used to avoid the bugs in `librosa`.
- __[2023.03.28]__: The online demo is launched on [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker), thanks AK!
- __[2023.03.22]__: New feature: generating the 3D face animation from a single image. New applications built on it will follow.
- __[2023.03.22]__: New feature: `still mode`, in which only a small head pose is produced, via `python inference.py --still`.

Previous Changelogs

- __[2023.03.18]__: Support for `expression intensity`: you can now change the intensity of the generated motion with `python inference.py --expression_scale 1.3` (some value > 1).
- __[2023.03.18]__: Reconfigured the data folders; you can now download the checkpoints automatically using `bash scripts/download_models.sh`.
- __[2023.03.18]__: We have officially integrated [GFPGAN](https://github.com/TencentARC/GFPGAN) for face enhancement; use `python inference.py --enhancer gfpgan` for better visual quality.
- __[2023.03.14]__: Pinned the version of the `joblib` package to remove the errors when using `librosa`. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) is online!
- __[2023.03.06]__: Fixed some bugs in the code and errors in installation.
- __[2023.03.03]__: Released the test code for audio-driven single image animation!
- __[2023.02.28]__: SadTalker has been accepted by CVPR 2023!

## 🎼 Pipeline

![main_of_sadtalker](https://user-images.githubusercontent.com/4397546/222490596-4c8a2115-49a7-42ad-a2c3-3bb3288a5f36.png)

## 🚧 TODO

- [x] Generating 2D face from a single image.
- [x] Generating 3D face from audio.
- [x] Generating 4D free-view talking examples from audio and a single image.
- [x] Gradio/Colab demo.
- [ ] Full body/image generation.
- [ ] Training code of each component.
- [ ] Audio-driven anime avatar.
- [ ] Integrate ChatGPT for a conversation demo 🤔
- [ ] Integrate with stable-diffusion-web-ui. (stay tuned!)

https://user-images.githubusercontent.com/4397546/222513483-89161f58-83d0-40e4-8e41-96c32b47bd4e.mp4

## 🔮 Installation

#### Dependency Installation

CLICK ME For Manual Installation

```bash
git clone https://github.com/Winfredy/SadTalker.git
cd SadTalker

# Create and activate a dedicated conda environment
conda create -n sadtalker python=3.8
source activate sadtalker

# PyTorch built against CUDA 11.3
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg
pip install dlib-bin  # dlib-bin installs much faster than dlib; alternative: conda install dlib
pip install -r requirements.txt

# Install GFPGAN for the enhancer
pip install git+https://github.com/TencentARC/GFPGAN
```
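After the installation steps above, a quick check that the main dependencies can be found may save a failed first run. The following is a minimal sketch, not part of the repository: the module names are assumptions based on the install commands (`dlib-bin` provides the `dlib` module, GFPGAN provides `gfpgan`), and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import shutil

# Import names assumed from the install commands above
# (dlib-bin installs the `dlib` module; GFPGAN installs `gfpgan`).
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "dlib", "gfpgan"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each dependency name to True if it was found."""
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # ffmpeg is a command-line tool, so look for it on PATH instead.
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it inside the activated `sadtalker` environment; it only tests whether each module can be located, not whether the installed versions match the ones pinned above.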

CLICK For Docker Installation

A Docker image is also provided by [@thegenerativegeneration](https://github.com/thegenerativegeneration) on [docker hub](https://hub.docker.com/repository/docker/wawa9000/sadtalker), which can be used directly as:

```bash
docker run --gpus "all" --rm -v $(pwd):/host_dir wawa9000/sadtalker \
    --driven_audio /host_dir/deyu.wav \
    --source_image /host_dir/image.jpg \
    --expression_scale 1.0 \
    --still \
    --result_dir /host_dir
```
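The Docker command above mounts the current directory at `/host_dir`, so every input path must be written as the container sees it. The sketch below shows one way to do that translation from Python. The helper names `to_container_path` and `build_docker_command` are hypothetical; only the image name, mount point, and flags are taken from the command above.

```python
from pathlib import Path, PurePosixPath

# Container-side mount point used in the docker command above.
MOUNT_POINT = PurePosixPath("/host_dir")


def to_container_path(host_file, host_dir):
    """Translate a file under `host_dir` to its path under /host_dir.

    Raises ValueError if `host_file` is not inside `host_dir`.
    """
    relative = Path(host_file).resolve().relative_to(Path(host_dir).resolve())
    return str(MOUNT_POINT.joinpath(*relative.parts))


def build_docker_command(host_dir, audio, image):
    """Assemble the argv list for `docker run` with translated input paths."""
    host_dir = Path(host_dir).resolve()
    return [
        "docker", "run", "--gpus", "all", "--rm",
        "-v", f"{host_dir}:{MOUNT_POINT}",
        "wawa9000/sadtalker",
        "--driven_audio", to_container_path(audio, host_dir),
        "--source_image", to_container_path(image, host_dir),
        "--result_dir", str(MOUNT_POINT),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Files outside the mounted directory are rejected early, since the container could not read them anyway.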

#### Trained Models


You can run the following script to put all the models in the right place.

```bash
bash scripts/download_models.sh
```

Alternatively, download our pre-trained models from [google drive](https://drive.google.com/drive/folders/1Wd88VDoLhVzYsQ30_qDVluQr_Xm46yHT?usp=sharing) or our [github release page](https://github.com/Winfredy/SadTalker/releases/tag/v0.0.1), and put them in `./checkpoints`.

| Model | Description |
| :--- | :---------- |
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from [this reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis). |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction). |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model in [Wav2lip](https://github.com/Rudrabha/Wav2Lip). |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in [dlib](http://dlib.net/). |
| checkpoints/BFM | 3DMM library file. |
| checkpoints/hub | Face detection models used in [face alignment](https://github.com/1adrianb/face-alignment). |
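Whether the models were fetched by the script or downloaded by hand, it can help to confirm that everything in the table above is in place before running inference. The following is a minimal sketch that only checks for existence. The file and folder names are copied from the table, including the `auido2...` spelling as it appears there, and `missing_checkpoints` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# File and folder names copied from the model table above.
EXPECTED = [
    "auido2exp_00300-model.pth",
    "auido2pose_00140-model.pth",
    "mapping_00229-model.pth.tar",
    "facevid2vid_00189-model.pth.tar",
    "epoch_20.pth",
    "wav2lip.pth",
    "shape_predictor_68_face_landmarks.dat",
    "BFM",
    "hub",
]


def missing_checkpoints(root="checkpoints", expected=EXPECTED):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing from ./checkpoints:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected checkpoint files are present.")
```

Run it from the repository root so that the relative `checkpoints` path resolves. It does not verify file sizes or hashes, so a truncated download would still pass.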

## 🔮 Inference Demo

#### Generating 2D face from a single Image

```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <image file> \
                    --batch_size <batch size> \
                    --expression_scale <scale, e.g. 1.3> \
                    --result_dir <directory to store results> \
                    --still \
                    --preprocess <mode, e.g. resize> \
                    --enhancer <enhancer, e.g. gfpgan> \
                    --ref_video <reference video>
```

| basic | w/ still mode | w/ exp_scale 1.3 | w/ gfpgan |
|:-------------: |:-------------: |:-------------: |:-------------: |
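When the same command is run for many inputs, it can be convenient to assemble it in Python. The following is a rough sketch, assuming it is executed from the repository root. It uses only flags that appear in this README; `build_inference_command` and the file names in the usage example are hypothetical.

```python
import subprocess
import sys


def build_inference_command(driven_audio, source_image, result_dir,
                            expression_scale=None, still=False,
                            preprocess=None, enhancer=None):
    """Assemble an argv list for inference.py from the flags shown above."""
    cmd = [sys.executable, "inference.py",
           "--driven_audio", str(driven_audio),
           "--source_image", str(source_image),
           "--result_dir", str(result_dir)]
    # Optional flags are added only when the caller sets them.
    if expression_scale is not None:
        cmd += ["--expression_scale", str(expression_scale)]
    if still:
        cmd.append("--still")
    if preprocess is not None:
        cmd += ["--preprocess", preprocess]
    if enhancer is not None:
        cmd += ["--enhancer", enhancer]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace them with real inputs.
    cmd = build_inference_command("speech.wav", "portrait.png", "results",
                                  expression_scale=1.3, still=True,
                                  enhancer="gfpgan")
    print(" ".join(cmd))
    # Uncomment to launch inference from the repository root:
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.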