# HeyGenClone
Welcome to HeyGenClone, an open-source analogue of the HeyGen system.
I am a developer from Moscow 🇷🇺 who devotes his free time to studying new technologies. The project is under active development, but I hope it helps you achieve your goals!
Currently, translation is supported only from English 🇬🇧!
## Installation 🥸
- Clone this repo
- Install [conda](https://conda.io/projects/conda/en/latest/user-guide/install/)
- Create environment with Python 3.10 (for macOS refer to [link](https://www.mrdbourke.com/setup-apple-m1-pro-and-m1-max-for-machine-learning-and-data-science/))
- Activate environment
- Install requirements:
```
cd path_to_project
sh install.sh
```
- In the config.json file, set the HF_TOKEN argument to your HuggingFace token. Visit [speaker-diarization](https://hf.co/pyannote/speaker-diarization) and [segmentation](https://hf.co/pyannote/segmentation) and accept the user conditions
- Download the weights from [drive](https://drive.google.com/file/d/1dYy24q_67TmVuv_PbChe2t1zpNYJci1J/view?usp=sharing) and unzip the downloaded file into the weights folder
- Install [ffmpeg](https://ffmpeg.org/)
## Configurations (config.json) 🧙‍♂️
| Key | Description |
| :---: | :---: |
| DET_TRESH | Face detection threshold [0.0:1.0] |
| DIST_TRESH | Face embedding distance threshold [0.0:1.0] |
| HF_TOKEN | Your HuggingFace token (see [Installation](https://github.com/BrasD99/HeyGenClone/tree/main#installation)) |
| USE_ENHANCER | Whether to enhance faces with GFPGAN |
| ADD_SUBTITLES | Whether to add subtitles to the output video |
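A config.json consistent with the table above might look like this. The key names follow the project's spelling; the values are illustrative, not the project's shipped defaults:

```json
{
  "DET_TRESH": 0.3,
  "DIST_TRESH": 0.85,
  "HF_TOKEN": "hf_your_token_here",
  "USE_ENHANCER": true,
  "ADD_SUBTITLES": false
}
```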
## Supported languages 🙂
English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu) and Korean (ko)
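The list above maps directly to the codes you pass as output_language. This is an illustrative copy of that mapping; the authoritative version lives in [core/mapper.py](https://github.com/BrasD99/HeyGenClone/blob/main/core/mapper.py):

```python
# Output-language codes accepted by translate.py, per the list above.
# Illustrative copy; the authoritative mapping lives in core/mapper.py.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian", "ko": "Korean",
}

def is_supported(code: str) -> bool:
    """Check a user-supplied output_language code before running the pipeline."""
    return code.lower() in SUPPORTED_LANGUAGES
```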
## Usage 🤩
- Activate your environment:
```
conda activate your_env_name
```
- cd to the project path:
```
cd path_to_project
```
At the root of the project there is a translate script that translates the video you specify. Its arguments:
- video_filename - the filename of your input video (.mp4)
- output_language - the language code to translate into, listed [here](https://github.com/BrasD99/HeyGenClone#supported-languages-) (you can also find it in my [code](https://github.com/BrasD99/HeyGenClone/blob/main/core/mapper.py))
- output_filename - the filename of the output video (.mp4)
```
# e.g. python translate.py input.mp4 ru -o output_ru.mp4
python translate.py video_filename output_language -o output_filename
```
I also added a script that overlays a voice on a video with lip sync, which allows you to create a video of a person pronouncing your speech. Currently it works only for videos with one person.
- voice_filename - the filename of your speech audio (.wav)
- video_filename - the filename of your input video (.mp4)
- output_filename - the filename of the output video (.mp4)
```
# e.g. python speech_changer.py speech.wav input.mp4 -o output.mp4
python speech_changer.py voice_filename video_filename -o output_filename
```
## How it works 😱
1. Detecting scenes ([PySceneDetect](https://github.com/Breakthrough/PySceneDetect))
2. Face detection ([yolov8-face](https://github.com/akanametov/yolov8-face))
3. Reidentification ([deepface](https://github.com/serengil/deepface))
4. Speech enhancement ([MDXNet](https://huggingface.co/freyza/kopirekcover/blob/main/MDXNet.py))
5. Speaker transcription and diarization ([whisperX](https://github.com/m-bain/whisperX))
6. Text translation ([googletrans](https://pypi.org/project/googletrans/))
7. Voice cloning ([TTS](https://github.com/coqui-ai/TTS))
8. Lip sync ([lipsync](https://github.com/mowshon/lipsync))
9. Face restoration ([GFPGAN](https://github.com/TencentARC/GFPGAN))
10. [Needs fixing] Detecting which face is talking and determining what that person is saying
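The steps above can be sketched end-to-end as a single pipeline. Every function below is an illustrative stub standing in for the library named in the list, not HeyGenClone's actual API:

```python
# Sketch of the dubbing pipeline from the list above.
# All functions are illustrative stubs, not the project's real API.

def detect_scenes(video):            # 1. PySceneDetect: split video into scenes
    return [video]

def detect_faces(scene):             # 2. yolov8-face: find faces per scene
    return ["face"]

def reidentify(faces):               # 3. deepface: group faces by identity
    return {0: faces}

def enhance_speech(video):           # 4. MDXNet: separate/clean the speech track
    return "clean_audio"

def transcribe_and_diarize(audio):   # 5. whisperX: who said what, and when
    return [("speaker_0", "hello")]

def translate_text(segments, lang):  # 6. googletrans: translate each segment
    return [(spk, text.upper()) for spk, text in segments]  # placeholder

def clone_voice(segments):           # 7. TTS: re-speak translations in the original voice
    return "dubbed_audio"

def lip_sync(video, audio):          # 8. lipsync: match mouth movements to new audio
    return "synced_video"

def restore_faces(video):            # 9. GFPGAN: optional face restoration
    return "final_video"

def translate_video(video, lang):
    scenes = detect_scenes(video)
    faces = [detect_faces(s) for s in scenes]
    identities = reidentify(faces[0])
    audio = enhance_speech(video)
    segments = transcribe_and_diarize(audio)
    translated = translate_text(segments, lang)
    dubbed = clone_voice(translated)
    synced = lip_sync(video, dubbed)
    return restore_faces(synced)
```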
## Translation results 🥺
Note that this example was created without GFPGAN usage!
| Destination language | Source video | Output video |
| :---: | :---: | :---: |
| 🇷🇺 (Russian) | [Source video](https://youtu.be/eGFLPAQAC2Y) | [Output video](https://youtu.be/L2YTmfIr7aI) |
## Contributors 🫵🏻