heygen video translation #8

Open

echokk11 opened this issue Mar 7, 2024 · 1 comment

echokk11 commented Mar 7, 2024

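Suppose we get a bit bolder:

Whisper handles speech-to-subtitle transcription
LLMs (ChatGPT, Google Translate) handle translation between languages
MockingBird or so-vits-svc-fork is trained on each original speaker's timbre (voiceprint)
Based on the transcript timeline, use ffmpeg to split the video into clips by speaker, and use the trained voice models to generate an audio track from the translated text
(Optional) Correct the lip movements with GeneFace++ or Wav2Lip
Finally, merge everything back together (ffmpeg)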
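
The split and merge steps above can be sketched roughly as follows. This is only a minimal illustration, not HeyGen's or this project's actual implementation: the `Segment` type and the helper names `cut_command`, `concat_list`, and `concat_command` are hypothetical, and the segments are assumed to come from an earlier transcription step. The helpers only build ffmpeg argument lists (using the standard `-ss`/`-to` options and the concat demuxer) so they can be inspected before being passed to `subprocess.run`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One span of the timeline (hypothetical structure, not from the issue)."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # speaker label assigned by an earlier step
    text: str      # translated text to be synthesized for this span


def cut_command(src: str, seg: Segment, out: str) -> List[str]:
    """Build an ffmpeg command that extracts one segment from src.

    Stream copy ("-c copy") is fast but cuts on keyframes; re-encoding
    would be needed for frame-accurate boundaries.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", f"{seg.start:.3f}",
        "-to", f"{seg.end:.3f}",
        "-c", "copy",
        out,
    ]


def concat_list(paths: List[str]) -> str:
    """Return the text of a list file for ffmpeg's concat demuxer."""
    return "".join(f"file '{p}'\n" for p in paths)


def concat_command(list_file: str, out: str) -> List[str]:
    """Build an ffmpeg command that joins the clips named in list_file."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", list_file,
        "-c", "copy",
        out,
    ]
```

Each returned list can be run with `subprocess.run(cmd, check=True)` once ffmpeg is installed. The voice synthesis and lip-sync steps are left out on purpose, since their interfaces depend on the chosen tool.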

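Is this roughly how HeyGen video translation is implemented? Of course, I'm a rookie, and the real process is surely far more complex than this. The biggest difficulty here is identifying where each distinct voice starts and ends on the timeline. Along the way there are also many other problems, such as removing background audio and correcting recognition errors.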
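
One small piece of the speaker-timeline problem can be illustrated in isolation. Assuming some diarization step has already labeled short spans as `(start, end, speaker)` tuples (that input format, the function name `merge_turns`, and the `max_gap` threshold are all assumptions made for this sketch), adjacent spans from the same speaker can be merged into continuous turns before cutting the video. This does not solve the hard part, which is producing reliable speaker labels in the first place.

```python
from typing import List, Tuple

# (start seconds, end seconds, speaker label); hypothetical diarization output
Turn = Tuple[float, float, str]


def merge_turns(segments: List[Turn], max_gap: float = 0.5) -> List[Turn]:
    """Merge consecutive spans of the same speaker separated by at most max_gap seconds."""
    merged: List[Turn] = []
    for start, end, speaker in sorted(segments):
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= max_gap:
            # Same speaker continues after a short pause: extend the previous turn.
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            # Speaker changed or the pause is long: start a new turn.
            merged.append((start, end, speaker))
    return merged
```

Keeping long pauses as separate turns leaves room to preserve the original background audio in the gaps rather than synthesizing over them.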

Owner

Chenyme commented Mar 7, 2024

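Hello! Thank you very much for your suggestion. I will try to reproduce this pipeline, but because of academic pressure I have to slow down progress on the project. Also, I have heard from insiders that Jianying (剪映) is competing hard on digital humans in the domestic market, so there should be a good open-source solution by the end of the year.
Thanks again for your suggestion!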
