Meta is working on a new tool that leverages generative AI, the technology behind the viral chatbot ChatGPT. Dubbed Voicebox, the tool can generate speech from a voice sample and simple text input. Meta also claims that Voicebox can filter unwanted background noise out of audio samples. However, unlike other generative AI tools such as ChatGPT and Bard, or AI image generators such as DALL-E and Midjourney, Voicebox is unavailable to testers and may remain restricted for the time being. Meta says this is because Voicebox could be misused and carries significant potential risks.
What is Meta Voicebox, and how does it work?
In simple terms, Voicebox is a text-to-speech generator that comes with some audio editing tools. Meta says its AI tool is far more effective than its competitors because Voicebox can replicate intonation and pronunciation. Voicebox's existing competitor, VALL-E, also lets users create text-to-speech samples from just 3 seconds of recording. However, Meta claims that Voicebox generates output up to 20 times faster and with fewer errors.
Since Voicebox is not available to the public, the company explains how it works in a research paper and a blog post. Meta says that Voicebox is built on a method called “flow matching” for converting text to speech. The model is said to handle complex and unpredictable relationships between text and speech. According to Meta, this approach also allows Voicebox to be trained on a larger and more diverse set of data, making it more powerful and flexible.
Currently, Voicebox can generate speech in English, French, German, Spanish, Polish and Portuguese. Meta states that the technology is “exciting” because it could help people communicate in a natural and authentic way “even if they don’t speak the same languages.”
As mentioned, Voicebox can also be used for audio editing. In a demo, Meta shows that the tool effectively filtered out the background noise of a dog barking from a sample. Similar audio filtering features are already present in Google Meet and Zoom.
Why is Meta Voicebox unavailable?
Meta says the company is “not making the Voicebox model or code publicly available at this time” due to “potential risks of misuse”. The company added, “While we believe it is important to be open with the AI community and share our research to advance the state of the art in AI, it is also necessary to strike the right balance between openness with responsibility. With these thoughts in mind, today we’re sharing audio samples and a research paper detailing our approach and the results we achieved.”