Meta has created an open-source speech AI that can recognize over 4,000 spoken languages and produce speech (text-to-speech) in over 1,100. The company hopes that the project will help preserve language diversity and encourage researchers to build on its foundation.
Speech recognition and text-to-speech models typically require training on thousands of hours of audio with accompanying transcription labels. But for languages that aren't widely used in industrialized nations, this data simply does not exist. Meta took an unconventional approach to collecting audio data: tapping into audio recordings of translated religious texts.
By incorporating unlabeled recordings of the Bible and similar texts, Meta's researchers increased the model's available languages to over 4,000. The company cautions that its new models aren't perfect, and that the speech-to-text model may mistranscribe certain words or phrases. However, Meta believes that the benefits of the project outweigh the risks.
Meta hopes that the open-source release of MMS (Massively Multilingual Speech) will help reverse the trend of technology whittling the world's languages down to the 100 or fewer most often supported by Big Tech. The company envisions a world where assistive technology, TTS, and even VR / AR tech allow everyone to speak and learn in their native tongues.