
Nvidia joins Meta and Google in the speech AI race



At Nvidia’s Speech AI Summit, the company announced its new artificial intelligence (AI) speech ecosystem, developed in collaboration with Mozilla Common Voice. The ecosystem focuses on building a crowdsourced multilingual speech corpus and open-source pretrained models. Nvidia and Mozilla Common Voice aim to accelerate the development of automatic speech recognition models that work universally for every language speaker worldwide.

Nvidia found that standard voice assistants, such as Amazon Alexa and Google Home, support less than 1% of the world’s spoken languages. To address this issue, the company aims to improve linguistic inclusion in speech AI and expand the availability of speech data for global, low-resource languages.

Nvidia is joining a race that both Meta and Google are already running: both companies recently released speech AI models to facilitate communication between people who speak different languages. Google’s Translation Hub, an AI-powered translation service, can translate large volumes of documents into many different languages. Google also just announced that it is building a universal speech translator trained on more than 400 languages, claiming it is “the largest language model coverage seen in a speech model today.”

At the same time, Meta AI’s Universal Speech Translator (UST) project is helping to create AI systems that enable real-time speech-to-speech translation in all languages, even those that are spoken but infrequently written.


An ecosystem for global language users

According to Nvidia, linguistic inclusion in speech AI broadly improves dataset quality, for example by helping AI models capture speaker diversity and a wide spectrum of noise profiles. The new speech AI ecosystem helps developers build, maintain and improve speech AI models and datasets for linguistic inclusion, usability and experience. Users can train their models on Mozilla Common Voice datasets and then offer those pretrained models as high-performance automatic speech recognition architectures, which other organizations and individuals around the world can adapt to build their own speech AI applications.

“Demographic diversity is key to capturing language diversity,” said Caroline de Brito Gottlieb, product manager at Nvidia. “There are several vital factors that influence speech variation, such as underserved dialects, sociolects, pidgins, and accents. Through this partnership, we aim to create a dataset ecosystem that helps communities build speech datasets and models for any language or context.”

The Mozilla Common Voice platform currently supports 100 languages, with 24,000 hours of voice data from 500,000 contributors worldwide. The latest version of the Common Voice dataset also includes six new languages — Tigre, Taiwanese (Minnan), Meadow Mari, Bengali, Toki Pona and Cantonese — as well as more voice data from female speakers.

Through the Mozilla Common Voice platform, users can donate their audio datasets by recording sentences as short speech clips, which Mozilla validates to ensure the quality of the dataset upon submission.
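The record-then-validate workflow above can be sketched as a simple vote-gated pipeline. This is an illustrative toy, not Common Voice’s actual schema or code: the two-vote threshold, field names and `Clip` class are all assumptions made for the example.

```python
# Toy sketch of crowd-validation for donated speech clips: a recorded
# sentence only counts as "validated" after enough reviewers approve it.
# The 2-vote threshold and data model are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Clip:
    sentence: str
    up_votes: int = 0
    down_votes: int = 0

    def vote(self, is_valid: bool) -> None:
        # A reviewer listens to the clip and confirms (or rejects) that
        # the audio matches the prompted sentence.
        if is_valid:
            self.up_votes += 1
        else:
            self.down_votes += 1

    def status(self, threshold: int = 2) -> str:
        # A clip is accepted or rejected once one side reaches the
        # threshold and leads the other.
        if self.up_votes >= threshold and self.up_votes > self.down_votes:
            return "validated"
        if self.down_votes >= threshold and self.down_votes > self.up_votes:
            return "invalidated"
        return "pending"


clip = Clip("Wie geht es dir?")
clip.vote(True)
print(clip.status())  # pending: only one approval so far
clip.vote(True)
print(clip.status())  # validated: two approvals, no rejections
```

In the real platform, only validated clips graduate into the released dataset that downstream ASR training consumes.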

Image source: Mozilla Common Voice.

“The speech AI ecosystem focuses not only on the diversity of languages, but also on the accents and noise profiles that different language speakers have around the world,” Siddharth Sharma, head of product marketing, AI and deep learning at Nvidia, told VentureBeat. “This has been our unique focus at Nvidia, and we have created a solution that can be customized for every aspect of the speech AI model pipeline.”

Nvidia’s Current Speech AI Deployments

The company develops speech AI for various use cases, such as automatic speech recognition (ASR), automatic speech translation (AST) and text-to-speech. Nvidia Riva, part of the Nvidia AI platform, provides state-of-the-art GPU-optimized workflows for building and deploying fully customizable, real-time AI pipelines for applications such as contact center agent assist, virtual assistants, digital avatars, brand voices and video-conferencing transcription. Applications built with Riva can be deployed across all cloud types and data centers, at the edge, or on embedded devices.

NCS, a multinational company and a transportation technology partner of the government of Singapore, adapted Nvidia’s Riva FastPitch model and built its own text-to-speech engine for Singaporean English using the voice data of local speakers. NCS recently designed Breeze, an app for local drivers that translates languages including Mandarin, Hokkien, Malay and Tamil into Singaporean English with the same clarity and expressiveness with which a native Singaporean would speak them.

Mobile communications conglomerate T-Mobile also partnered with Nvidia to develop AI-based software for its customer experience centers that transcribes customer conversations in real time and recommends solutions to the thousands of employees working on the front lines. To create the software, T-Mobile used Nvidia NeMo, an open-source framework for state-of-the-art conversational AI models, alongside Riva. These Nvidia tools allowed T-Mobile engineers to fine-tune ASR models on T-Mobile’s custom datasets and accurately interpret customer jargon in noisy environments.
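One reason custom datasets matter for contact-center ASR is domain jargon: a generic model may transcribe product terms as ordinary words. A crude way to picture the problem is text post-processing that maps common misrecognitions back to canonical terms. This is a toy illustration only — the mapping and function below are invented for the example and are not T-Mobile’s system or NeMo’s API (which instead adapts the model itself on in-domain audio and text).

```python
# Toy illustration of domain-vocabulary cleanup on an ASR transcript:
# replace frequent misrecognitions of jargon with the canonical term.
# The mapping is invented for illustration purposes.
import re

JARGON = {
    "sim card": "SIM card",
    "e sim": "eSIM",
    "five g": "5G",
}


def normalize_jargon(transcript: str) -> str:
    out = transcript
    for heard, canonical in JARGON.items():
        # Word boundaries keep "e sim" from matching inside other words.
        pattern = r"\b" + re.escape(heard) + r"\b"
        out = re.sub(pattern, canonical, out, flags=re.IGNORECASE)
    return out


print(normalize_jargon("my five g e sim stopped working"))
# -> my 5G eSIM stopped working
```

Real systems push this correction into the model via fine-tuning or custom vocabularies rather than string rewrites, which is why in-domain training data is the valuable asset.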

Nvidia’s Future Focus on Speech AI

Sharma says Nvidia wants to bring its current work on AST and next-generation speech AI to real-time metaverse use cases.

“Today we are limited to offering slow translations from one language to another, and those translations have to go through text,” he said. “But in the future, you can have people in the metaverse in so many different languages that can all be translated instantly with each other.”

“The next step,” he added, “is to develop systems that enable fluid interactions with people around the world through speech recognition for all languages and real-time text-to-speech.”

