
Nvidia joins Meta and Google in the speech AI race



At Nvidia’s Speech AI Summit, the company announced its new artificial intelligence (AI) speech ecosystem, developed in collaboration with Mozilla Common Voice. The ecosystem focuses on building a crowdsourced multilingual speech corpus and open-source pretrained models. Nvidia and Mozilla Common Voice aim to accelerate the development of automatic speech recognition models that work universally for every language speaker worldwide.

Nvidia found that standard voice assistants, such as Amazon Alexa and Google Home, support less than 1% of the world’s spoken languages. To address this issue, the company aims to improve linguistic inclusion in speech AI and expand the availability of speech data for global, low-resource languages.

Nvidia is joining a race that both Meta and Google are already running: both companies recently released speech AI models to facilitate communication between people who speak different languages. Google’s Translation Hub, an AI-powered translation service, can translate large volumes of documents into many different languages. Google also just announced that it is building a universal speech translator trained on more than 400 languages, claiming it is “the largest language model coverage seen in a speech model today.”

At the same time, Meta AI’s Universal Speech Translator (UST) project is helping to create AI systems that enable real-time speech-to-speech translation in all languages, even those that are spoken but infrequently written.


An ecosystem for global language users

According to Nvidia, linguistic inclusion in speech AI broadly improves dataset quality, for example by helping AI models capture speaker diversity and a wide spectrum of noise profiles. The new speech AI ecosystem helps developers build, maintain and improve speech AI models and datasets for linguistic inclusion, usability and experience. Users can train their models on Mozilla Common Voice datasets and then offer those pretrained models as high-performance automatic speech recognition architectures, which other organizations and individuals around the world can adapt to build their own speech AI applications.

“Demographic diversity is key to capturing language diversity,” said Caroline de Brito Gottlieb, product manager at Nvidia. “There are several vital factors that influence speech variation, such as underserved dialects, sociolects, pidgins, and accents. Through this partnership, we aim to create a dataset ecosystem that helps communities build speech datasets and models for any language or context.”

The Mozilla Common Voice platform currently supports 100 languages, with 24,000 hours of voice data from 500,000 contributors worldwide. The latest version of the Common Voice dataset also includes six new languages — Tigre, Taiwanese (Minnan), Meadow Mari, Bengali, Toki Pona and Cantonese — as well as more voice data from female speakers.

Through the Mozilla Common Voice platform, users can donate their audio datasets by recording sentences as short speech clips, which Mozilla validates to ensure the quality of the dataset upon submission.
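The record-then-validate workflow above can be sketched as a simple vote-gated pipeline. This is an illustrative toy, not Common Voice’s actual schema or code: the two-vote threshold, field names and `Clip` class are all assumptions made for the example.

```python
# Toy sketch of crowd-validation for donated speech clips: a recorded
# sentence only counts as "validated" after enough reviewers approve it.
# The 2-vote threshold and data model are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Clip:
    sentence: str
    up_votes: int = 0
    down_votes: int = 0

    def vote(self, is_valid: bool) -> None:
        # A reviewer listens to the clip and confirms (or rejects) that
        # the audio matches the prompted sentence.
        if is_valid:
            self.up_votes += 1
        else:
            self.down_votes += 1

    def status(self, threshold: int = 2) -> str:
        # A clip is accepted or rejected once one side reaches the
        # threshold and leads the other.
        if self.up_votes >= threshold and self.up_votes > self.down_votes:
            return "validated"
        if self.down_votes >= threshold and self.down_votes > self.up_votes:
            return "invalidated"
        return "pending"


clip = Clip("Wie geht es dir?")
clip.vote(True)
print(clip.status())  # pending: only one approval so far
clip.vote(True)
print(clip.status())  # validated: two approvals, no rejections
```

In the real platform, only validated clips graduate into the released dataset that downstream ASR training consumes.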

Image source: Mozilla Common Voice.

“The speech AI ecosystem focuses not only on the diversity of languages, but also on the accents and noise profiles that different language speakers have around the world,” Siddharth Sharma, head of product marketing, AI and deep learning at Nvidia, told VentureBeat. “This has been our unique focus at Nvidia, and we have created a solution that can be customized for every aspect of the speech AI model pipeline.”

Nvidia’s Current Speech AI Deployments

The company develops speech AI for various use cases, such as automatic speech recognition (ASR), automatic speech translation (AST) and text-to-speech. Nvidia Riva, part of the Nvidia AI platform, provides state-of-the-art GPU-optimized workflows for building and deploying fully customizable, real-time AI pipelines for applications such as contact center agent assist, virtual assistants, digital avatars, brand voices and video-conferencing transcription. Applications built with Riva can be deployed across all cloud types and data centers, at the edge, or on embedded devices.

NCS, a multinational company and a transportation technology partner of the government of Singapore, adapted Nvidia’s Riva FastPitch model and built its own text-to-speech engine for Singaporean English using the voice data of local speakers. NCS recently designed Breeze, an app for local drivers that translates languages including Mandarin, Hokkien, Malay and Tamil into Singaporean English with the same clarity and expressiveness with which a native Singaporean would speak them.

Mobile communications conglomerate T-Mobile also partnered with Nvidia to develop AI-based software for its customer experience centers that transcribes customer conversations in real time and recommends solutions to the thousands of employees working on the front lines. To create the software, T-Mobile used Nvidia NeMo, an open-source framework for state-of-the-art conversational AI models, alongside Riva. These Nvidia tools allowed T-Mobile engineers to fine-tune ASR models on T-Mobile’s custom datasets and accurately interpret customer jargon in noisy environments.
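One reason custom datasets matter for contact-center ASR is domain jargon: a generic model may transcribe product terms as ordinary words. A crude way to picture the problem is text post-processing that maps common misrecognitions back to canonical terms. This is a toy illustration only — the mapping and function below are invented for the example and are not T-Mobile’s system or NeMo’s API (which instead adapts the model itself on in-domain audio and text).

```python
# Toy illustration of domain-vocabulary cleanup on an ASR transcript:
# replace frequent misrecognitions of jargon with the canonical term.
# The mapping is invented for illustration purposes.
import re

JARGON = {
    "sim card": "SIM card",
    "e sim": "eSIM",
    "five g": "5G",
}


def normalize_jargon(transcript: str) -> str:
    out = transcript
    for heard, canonical in JARGON.items():
        # Word boundaries keep "e sim" from matching inside other words.
        pattern = r"\b" + re.escape(heard) + r"\b"
        out = re.sub(pattern, canonical, out, flags=re.IGNORECASE)
    return out


print(normalize_jargon("my five g e sim stopped working"))
# -> my 5G eSIM stopped working
```

Real systems push this correction into the model via fine-tuning or custom vocabularies rather than string rewrites, which is why in-domain training data is the valuable asset.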

Nvidia’s Future Focus on Speech AI

Sharma says Nvidia wants to bring its current work on AST and next-generation speech AI to real-time metaverse use cases.

“Today we are limited to offering slow translations from one language to another, and those translations have to go through text,” he said. “But in the future, you can have people in the metaverse in so many different languages that can all be translated instantly with each other.”

“The next step,” he added, “is to develop systems that enable fluid interactions with people around the world through speech recognition for all languages and real-time text-to-speech.”

