Building inclusive NLP | VentureBeat

Every day, millions of native English speakers enjoy the benefits of natural language processing (NLP) models.

But for African American Vernacular English (AAVE) speakers, technologies such as voice-guided GPS systems, digital assistants, and speech-to-text software are frequently problematic, because large NLP models often cannot understand or generate AAVE. Worse, these models are typically trained on data scraped from the internet, so they are prone to absorbing the racial biases and stereotyped associations that are rampant online.

When companies use these biased models to make important decisions, AAVE speakers can be unfairly banned from social media, denied access to housing or loans, or treated unjustly by law enforcement and the courts.

For the past 18 months, machine learning (ML) specialist Jazmia Henry has focused on finding a way to responsibly incorporate AAVE into language models. As a fellow at the Stanford Institute for Human-Centered Artificial Intelligence (HAI) and the Center for Comparative Studies in Race and Ethnicity (CCSRE), she has built an open source corpus of more than 141,000 AAVE words to help researchers and builders design models that are both inclusive and less prone to bias.

“My hope with this project is that social and computational linguists, anthropologists, computer scientists, social scientists and other researchers will poke and prod these corpora, research with them, wrestle with them and test their limits so that we can grow this into a faithful representation of AAVE and provide algorithmic feedback and insight into our possible next steps,” said Henry.

In this interview, she describes the early obstacles in developing this database, the potential to help computational linguists understand the origins of AAVE, and her plans after Stanford.

How do you describe African American Vernacular English?

To me, AAVE is a language of perseverance and upliftment. It emerged when African languages, thought to have been lost during the forced migration of the slave trade, were absorbed into English, creating a new language used by the descendants of those African peoples.

How did you become interested in incorporating AAVE into NLP models?

When I was a child, both my parents occasionally spoke their mother tongues. For my Caribbean father, it was Jamaican patois, and for my mother, it was Gullah Geechee, which is spoken in the coastal regions of the Carolinas and Georgia. Each is a creole language, meaning a new language created by mixing different languages.

Everyone seemed to understand that my parents spoke a different language, and no one questioned their intelligence. But when I saw people in my community speak AAVE, which I believe is another creole language, I could see the shame and stigma associated with it: the feeling that if we used this language outside our community, we would be judged as less intelligent. When I got into data science, I wondered what would happen if I collected data on AAVE and fed it into NLP models, so that we could truly understand it and improve the performance of these models.

How has your project evolved and what obstacles have you encountered?

There were many obstacles, and in the end I had to change my goal. AAVE evolves much faster than many languages and often turns standard English on its head, giving words completely new meanings. For example, the word “crazy” is usually defined as “angry.” In AAVE, however, it is often used to mean “very,” as in “crazy funny.”

Meaning in AAVE can also depend heavily on the situation, the speaker and the tone, context that language processing models do not take into account.

I finally decided to create a corpus of AAVE divided into four collections. The lyrics collection contains the lyrics of 15,000 songs by 105 artists, ranging from Etta James and Muddy Waters to Lil Baby and DaBaby.

The leadership collection features speeches by influential figures ranging from Frederick Douglass and Sojourner Truth to Martin Luther King Jr. and Ketanji Brown Jackson. The hardest to assemble was the book collection, because African Americans are severely underrepresented in the literary canon, but I included works from the archive collections of historically Black colleges.

Finally, the social media collection is the most robust and diverse, and includes video transcripts, blog posts, and 15,000 tweets, all collected from Black opinion leaders.

How do you hope your project will be used?

I know the corpora are starting to be used, but I don’t know yet by whom or for what purpose. I hope this preliminary work inspires researchers to enter this space, question it and push it forward to ensure that AAVE is represented in the languages used in NLP. Social and computational linguists may be able to use it to help determine whether AAVE is in fact a language of its own or a dialect, and to look for connections between it and African languages, especially languages not recorded or preserved in Western history.

Growing up, we learned what was taken from our enslaved ancestors and from their descendants. AAVE could be proof that not everything was taken away and that we were able to keep a part of who we were in the way we communicate with each other. That knowledge has the potential to dispel shame and instill pride. If I say, “What is it, my brother?” I am not unintelligent; I am being strategic and summoning our ancestors in that exchange.

A model that leaves out AAVE not only fails to reflect the wider community; it actively discriminates against that community. Large language models that have difficulty understanding or generating AAVE are more likely to exacerbate stereotypes about Black people in general, and these biased associations become codified within the models. When commercialized, these models and their biases can lead companies to make unfair decisions that affect the lives of AAVE speakers. The results range from individuals having their social media posts disproportionately moderated or being removed from platforms to discrimination in housing, banking, law enforcement and the judicial system.

What should NLP developers think about when building tools?

There have been some popular NLP models that contain a lot of bias. Companies are working to scale back these problematic models, but that is often followed by a focus on risk mitigation rather than bias mitigation. Instead of trying to find solutions, sometimes companies will say, “Let’s not touch AAVE or anything related to Blackness again, because we didn’t get it right the first time.”

Instead, they should be asking how to do it correctly now. Now is the time to build better models, improve processes and invent new ways of working with languages like AAVE so that larger companies don’t continue to perpetuate harm.

What are your plans when you leave Stanford?

I am starting a new job at Microsoft as a senior applied engineer on the Project Bonsai autonomous systems team. We’re expanding what deep reinforcement learning can do with something we call “machine teaching,” which essentially teaches machines to perform tasks in ways that can make people more productive, improve safety, and enable autonomous decision-making using AI. This work gives me the chance to improve people’s lives, and I am so grateful for it.

Beth Jensen is a contributing writer for the Stanford Institute for Human-Centered AI.

This story originally appeared on hai.stanford.edu. Copyright 2023
