Meta announces Voicebox, a generative model for multiple voice synthesis tasks

    Meta, the company behind Facebook, has unveiled a generative AI model for speech called ‘Voicebox’. In a blog post, Meta said Voicebox is the first model that can generalize, with exceptional performance, to speech-generation tasks it was not specifically trained for.

    Unlike generative models that produce images or text, Voicebox produces high-quality audio clips. It can generate speech in a range of styles, either from scratch or by modifying a provided sample. The model supports speech synthesis in six languages: English, French, German, Spanish, Polish, and Portuguese. Voicebox can also remove noise, edit content, convert style, and generate diverse samples.

    What sets Voicebox apart is its learning approach. The model learns directly from raw audio and the accompanying transcriptions. Unlike autoregressive models, which can only extend the end of an audio clip, Voicebox can modify any part of a given sample, which makes it more flexible.

    Meta explains that Voicebox is trained to predict a speech segment when given the surrounding speech and the transcript of the segment. Once the model has learned to fill in speech from context, it can be applied to a wide range of speech-generation tasks, including generating a specific portion of an audio recording without recreating the entire recording.
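
    The infilling setup described above can be illustrated with a small sketch. This is not Meta's code: the function name, the dictionary keys, and the toy "frames" (short lists of numbers standing in for audio features) are all hypothetical, chosen only to show how one training example could be split into visible context, a transcript, and a hidden segment for the model to predict.

```python
import random

def make_infilling_example(frames, transcript, rng):
    """Hide one contiguous span of frames; keep the rest as context.

    Hypothetical helper for illustration only, not Voicebox's actual pipeline.
    """
    n = len(frames)
    length = rng.randint(1, n)          # span length, inclusive bounds
    start = rng.randint(0, n - length)  # span start, so the span fits
    mask = [start <= i < start + length for i in range(n)]

    # Masked positions are replaced with None; the model would see only
    # the remaining frames plus the transcript.
    context = [None if hidden else frame for frame, hidden in zip(frames, mask)]
    # The hidden frames are what the model is trained to predict.
    target = [frame for frame, hidden in zip(frames, mask) if hidden]

    return {"context": context, "transcript": transcript,
            "mask": mask, "target": target}

rng = random.Random(0)
frames = [[0.1 * i, 0.2 * i] for i in range(8)]  # toy stand-in for audio features
example = make_infilling_example(frames, "hello world", rng)
```

    Because the hidden span can fall anywhere in the clip, the same setup covers editing the middle of a recording as well as continuing its end.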

    Thanks to its versatility, Voicebox excels in various applications, including in-context text-to-speech synthesis, cross-lingual style transfer, speech denoising and editing, and diverse speech sampling. The model’s performance and adaptability offer new possibilities for creative audio generation and advanced speech manipulation.

    Meta’s Voicebox represents a significant advance in speech generation: a single model that produces high-quality audio clips and handles a range of speech-related tasks. As AI technology continues to evolve, Voicebox could open the door to new applications in voice-assisted technologies, entertainment, and more.


