~Realtime speech2speech?


My wife was doing some DMing, and one of her players gave her feedback that her presentation was less than “enthusiastic”. This got me thinking: could AI be used to make her always sound “on” when she’s DMing?

The idea would be to clone her voice from recordings made when she’s fresh and at her best, and then use AI to convert her live speech into that voice in real time while she’s talking.
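To make the idea a bit more concrete, here is a rough sketch of how I picture the real-time loop: audio arrives in small blocks, each block goes through the voice model, and the result is played back. Everything here is an assumption for illustration. `convert_block` is a hypothetical stand-in that just returns its input (it is not an RVC or ElevenLabs call), and the sample rate and block size are made-up numbers. The only thing it really shows is that the block size alone sets a floor on the delay.

```python
from typing import Iterator, List

SAMPLE_RATE = 16_000  # assumed sample rate in Hz
BLOCK_SIZE = 4_000    # assumed samples per block (250 ms at 16 kHz)


def blocks(samples: List[float], block_size: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size blocks; the last one may be shorter."""
    for start in range(0, len(samples), block_size):
        yield samples[start:start + block_size]


def convert_block(block: List[float]) -> List[float]:
    """Hypothetical placeholder for a voice-conversion model.

    A real system would run the model here; this returns the input unchanged.
    """
    return list(block)


def stream_convert(samples: List[float], block_size: int = BLOCK_SIZE) -> List[float]:
    """Run the placeholder conversion block by block and join the results."""
    out: List[float] = []
    for block in blocks(samples, block_size):
        out.extend(convert_block(block))
    return out


def buffering_latency_ms(block_size: int, sample_rate: int) -> float:
    """Minimum delay from buffering one block, before any model time is added."""
    return 1000.0 * block_size / sample_rate


if __name__ == "__main__":
    one_second = [0.0] * SAMPLE_RATE
    converted = stream_convert(one_second)
    print(len(converted))  # same number of samples out as in
    print(buffering_latency_ms(BLOCK_SIZE, SAMPLE_RATE))
```

Whatever time the model itself needs per block comes on top of that buffering delay, so smaller blocks lower the floor but leave the model less audio to work with per call.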

A little research suggests that RVC is a good choice [1] [2]. ElevenLabs also seems to have some good tools [3] [4].

Could anyone share their opinions or observations about doing real-time speech2speech?

[1] https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI

[2] https://github.com/gitmylo/audio-webui?tab=readme-ov-file

[3] https://www.youtube.com/watch?v=0UVppC0Ihjk

[4] https://www.youtube.com/watch?v=jXzB-xUzpM8

