# HeyGenClone


Welcome to HeyGenClone, an open-source analogue of the HeyGen system.

I am a developer from Moscow 🇷🇺 who devotes his free time to exploring new technologies. The project is in active development, but I hope it helps you achieve your goals!

Currently, translation is supported only from English 🇬🇧!

## Installation 🥸

- Clone this repo
- Install [conda](https://conda.io/projects/conda/en/latest/user-guide/install/)
- Create an environment with Python 3.10 (for macOS, refer to [this guide](https://www.mrdbourke.com/setup-apple-m1-pro-and-m1-max-for-machine-learning-and-data-science/))
- Activate the environment
- Install the requirements:
  ```
  cd path_to_project
  sh install.sh
  ```
- In the config.json file, set the HF_TOKEN argument to your HuggingFace token. Visit [speaker-diarization](https://hf.co/pyannote/speaker-diarization) and [segmentation](https://hf.co/pyannote/segmentation) and accept the user conditions
- Download the weights from [drive](https://drive.google.com/file/d/1dYy24q_67TmVuv_PbChe2t1zpNYJci1J/view?usp=sharing) and unzip the downloaded file into the weights folder
- Install [ffmpeg](https://ffmpeg.org/)

## Configurations (config.json) 🧙‍♂️

| Key | Description |
| :---: | :---: |
| DET_TRESH | Face detection threshold [0.0:1.0] |
| DIST_TRESH | Face embeddings distance threshold [0.0:1.0] |
| HF_TOKEN | Your HuggingFace token (see [Installation](https://github.com/BrasD99/HeyGenClone/tree/main#installation)) |
| USE_ENHANCER | Whether to improve faces using GFPGAN |
| ADD_SUBTITLES | Whether to add subtitles to the output video |

## Supported languages 🙂

English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu) and Korean (ko)

## Usage 🤩

- Activate your environment:
  ```
  conda activate your_env_name
  ```
- cd to the project path:
  ```
  cd path_to_project
  ```

At the root of the project there is a translate script that translates the video you specify.

- video_filename - the filename of your input video (.mp4)
- output_language - the language to translate into.
  Provided [here](https://github.com/BrasD99/HeyGenClone#supported-languages-) (you can also find it in my [code](https://github.com/BrasD99/HeyGenClone/blob/main/core/mapper.py))
- output_filename - the filename of the output video (.mp4)

```
python translate.py video_filename output_language -o output_filename
```

I also added a script that overlays a voice on a video with lip sync, which allows you to create a video of a person pronouncing your speech. Currently it works for videos with one person.

- voice_filename - the filename of your speech (.wav)
- video_filename - the filename of your input video (.mp4)
- output_filename - the filename of the output video (.mp4)

```
python speech_changer.py voice_filename video_filename -o output_filename
```

## How it works 😱

1. Scene detection ([PySceneDetect](https://github.com/Breakthrough/PySceneDetect))
2. Face detection ([yolov8-face](https://github.com/akanametov/yolov8-face))
3. Reidentification ([deepface](https://github.com/serengil/deepface))
4. Speech enhancement ([MDXNet](https://huggingface.co/freyza/kopirekcover/blob/main/MDXNet.py))
5. Speaker transcription and diarization ([whisperX](https://github.com/m-bain/whisperX))
6. Text translation ([googletrans](https://pypi.org/project/googletrans/))
7. Voice cloning ([TTS](https://github.com/coqui-ai/TTS))
8. Lip sync ([lipsync](https://github.com/mowshon/lipsync))
9. Face restoration ([GFPGAN](https://github.com/TencentARC/GFPGAN))
10. [Need to fix] Searching for talking faces and determining what each person is saying

## Translation results 🥺

Note that this example was created without GFPGAN usage!
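The pipeline steps listed under "How it works" can be sketched as a simple orchestration loop. Everything below is a hypothetical skeleton with stub functions standing in for the real components (PySceneDetect, whisperX, googletrans, etc.); it is not the project's actual code.

```python
# Hypothetical skeleton of the translation pipeline described above.
# Each stub stands in for a real component of the project.

def detect_scenes(video):
    """Step 1 (PySceneDetect). Stub: treat the whole video as one scene."""
    return [video]

def transcribe_and_diarize(scene):
    """Step 5 (whisperX). Stub: one segment from one speaker."""
    return [{"speaker": "SPEAKER_00", "text": "Hello"}]

def translate_text(text, lang):
    """Step 6 (googletrans). Stub: tag the text with the target language."""
    return f"[{lang}] {text}"

def process(video, output_language):
    """Walk each scene: transcribe, translate, and (in the real project)
    clone the voice, lip-sync, and restore faces per speaker."""
    results = []
    for scene in detect_scenes(video):
        for segment in transcribe_and_diarize(scene):
            results.append(translate_text(segment["text"], output_language))
    return results
```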
| Destination language | Source video | Output video |
| :---: | :---: | :---: |
| 🇷🇺 (Russian) | [![Watch the video](https://i.ibb.co/KD2KKnj/en.jpg)](https://youtu.be/eGFLPAQAC2Y) | [![Watch the video](https://i.ibb.co/cbwCy8F/ru.jpg)](https://youtu.be/L2YTmfIr7aI) |

## Contributors 🫵🏻

## To-Do List 🤷🏼‍♂️

- [ ] Full GPU support
- [ ] Multithreading support (optimizations)
- [ ] Detecting talking faces (improvement)

## Other 🤘🏻

- Tested on macOS
- :warning: The project is under development!
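As a quick sanity check before running translate.py, the config.json keys documented in the Configurations table can be validated with the standard library. The validator below is a hypothetical convenience helper, not part of the repository; the key names and the [0.0:1.0] threshold ranges come from the table above.

```python
import json

# Keys documented in the Configurations table (spelling as in the repo).
REQUIRED_KEYS = {"DET_TRESH", "DIST_TRESH", "HF_TOKEN",
                 "USE_ENHANCER", "ADD_SUBTITLES"}
THRESHOLD_KEYS = ("DET_TRESH", "DIST_TRESH")

def validate_config(config):
    """Check that every documented key is present and that the face
    thresholds fall inside [0.0, 1.0]. Returns the config unchanged."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"config.json is missing keys: {sorted(missing)}")
    for key in THRESHOLD_KEYS:
        value = float(config[key])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} must be in [0.0, 1.0], got {value}")
    return config
```

Typical use would be `validate_config(json.load(open("config.json")))` right after editing the file.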