Busted

Many people think of AI as objective and neutral, something that not only makes our lives easier, but also helps us to eliminate the biases that plague human cognition and decision-making. While it’s true AI can do a lot for us, it’s a myth that AI is bias-free. In fact, AI can amplify the bias and perpetuate the inequality that is already rampant in our society. In this episode, we’ll explore how and why AI isn’t as fair as we’d like to believe, the consequences of bias in AI, and what responsible and ethical AI could look like.    

GATE’s Busted podcast is made possible by generous support from BMO.  

Featured Guests:  
Allison Cohen, Senior Manager, Applied Projects, Mila  
Dr. James Zou, Associate Professor of Biomedical Data Science and, by courtesy, of Computer Science and Electrical Engineering, Stanford University 
Produced by: Carmina Ravanera and Dr. Sonia Kang 
Edited by: Ian Gormely 

What is Busted?

Does achieving gender equality only benefit women? Are gender quotas thwarting meritocracy? Are women more risk averse than men? If you think you know the answers to these questions, then think again! Busted is an audio podcast series that busts prominent myths surrounding gender and the economy by teaming up with leading experts in the field. We uncover the origins of each myth and give you the tools to bust each myth yourself!  

Busted is a GATE audio series production from the Institute for Gender and the Economy.

Allison Cohen:

A lot of people default to believing that artificial intelligence is this incredible technology independent of human thought, independent in some ways of human bias, of human influence. And so I think that's why it's so important to debunk the myth that AI is objective because it will allow us to be more critical about where AI should be used and maybe where it should not be used.

Carmina Ravanera:

Artificial intelligence, especially generative AI, is everywhere these days. Many people think of AI as objective and neutral, something that not only makes our lives easier, but also helps us to eliminate the biases that plague human cognition and decision making. While it's true AI can do a lot for us, whether at work or school or for entertainment, it's a myth that AI is bias free. In fact, AI can amplify the bias and perpetuate the inequality that is already rampant in our society. In this episode, we'll be busting this myth.

Carmina Ravanera:

We'll explore how and why AI isn't as fair as we'd like to believe, the consequences of bias in AI, and what responsible and ethical AI could look like. I'm Carmina Ravanera, senior research associate at GATE.

Dr. Sonia Kang:

And I'm Dr. Sonia Kang, academic director at GATE. So it seems like AI is on everyone's minds lately, and it's changing so much about human life, how we interact with each other, how we work and learn, and how we make decisions. And that's only a few examples. According to a recent KPMG survey, 22% of Canadians report using generative AI for work. And of that group, 61% use it multiple times a week.

Carmina Ravanera:

Right. It has so many different uses. We even use AI to create the transcripts for this podcast after we've recorded them, which I can say has saved me many hours. It can definitely boost efficiency.

Dr. Sonia Kang:

But there are also a lot of cases of AI demonstrating bias, even when people aren't aware of it. One example I've read about is how Gen AI reinforces existing gendered stereotypes, like by strongly associating science with men and the domestic sphere with women.

Carmina Ravanera:

So this is what we have to be careful about, especially because AI is being used so much now. There's a misconception that AI is just a neutral machine, spitting out facts and truths. But actually, AI is not necessarily objective at all, and it can and does reproduce inequality.

Dr. Sonia Kang:

So who did you talk to to help bust this myth?

Carmina Ravanera:

I talked to Dr. James Zou, who is an associate professor of biomedical data science and, by courtesy, of computer science and electrical engineering at Stanford University. One thing he made clear is that we always have to remember that AI is based on data and that those data aren't perfect.

Dr. James Zou:

I think the most important thing to know about AI is that it's really an algorithm that's learning from data. Right? So, usually, when we develop an AI algorithm, we give it some data, and then we ask it to make predictions based on that data. Right? So for example, if I take something like ChatGPT or a large language model, right, we give it a large corpus of text that people have written before.

Dr. James Zou:

And how we train a model like ChatGPT is that we ask it to predict how people would write the next sentence or the next paragraph. Right? And once we understand that the algorithms are intrinsically, you know, learning from data, it's easy to see how potentially different kinds of biases can creep into this algorithm. Right? So for example, if I train these algorithms on text or other data that have some, let's say, historical biases or different issues or limitations in the data, then the algorithm can often capture and recapitulate similar kinds of biases.
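
To make that idea concrete, here is a minimal sketch in Python, a toy word-counting model rather than anything resembling how ChatGPT is actually built: it "learns" by counting which word follows which in a tiny made-up corpus, and its predictions simply echo whatever skew that corpus contains.

    # A toy next-word predictor: it only knows what its training text shows it.
    from collections import Counter, defaultdict

    corpus = [
        "the nurse said she was tired",
        "the nurse said she was busy",
        "the engineer said he was busy",
    ]

    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1

    def predict_next(word):
        # Return the word most often seen after `word` in the training corpus.
        counts = follows[word]
        return counts.most_common(1)[0][0] if counts else None

    print(predict_next("said"))  # prints "she": the model reproduces the skew in its corpus

Swap in a different corpus and the pattern it reproduces changes with it, which is exactly the point about training data.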

Carmina Ravanera:

Without good, representative data, AI is not going to be bias-free. So on that level, it's definitely not neutral. But there are lots of other reasons that AI is not objective. I talked to Allison Cohen, who is the senior applied AI projects manager at Mila, a Quebec-based AI research institute.

Allison Cohen:

For me, what objective means is that there is an independent truth to something, independent from time, independent from social convention, independent from financial interests. And I would argue that AI is actually a product of time, social conventions, and financial interests. I'll start maybe with social conventions. It is the norms that we have. It is the social structure that we have that influences who is most commonly at the table in developing AI products, and the role that they have to play in influencing how these products are actually designed.

Allison Cohen:

It's also our social norms about what data we should be considering private and what data we shouldn't be considering private that dictate what data these models end up having access to and then basing their decisions off of. It's also our social conventions currently around ethics, morality, philosophy that influence the degree to which our models, our tools should reflect engagement with those concepts as well. For example, we have chatbots now that can give you a moral philosophical response to a question about morality and ethics. And we haven't yet decided as a society whether this is beneficial or potentially quite harmful. So these are some of the norms.

Allison Cohen:

And then there's, of course, the financial interests. AI has the potential to be incredibly lucrative. It already has been. And so the financial upside of artificial intelligence dictates what types of problems AI is currently being used to solve, and specifically the problems that there is the most money in solving.

Dr. Sonia Kang:

So AI is not separate from our society. It's basically a mirror of it. The data it's trained on, the social norms it follows, how it's regulated, and what AI projects get funded are all determined based on existing social, economic, and political dynamics. When you think about it like that, it really calls into question the idea that AI is a machine that we can rely on for objectivity.

Carmina Ravanera:

Yeah. And I know you brought up that example earlier of how generative AI reproduces gendered stereotypes. James and Allison talked to me about that as well. They both had some interesting examples from their own work of how AI can regurgitate and proliferate bias and stereotypes, which in turn can reinforce inequality.

Dr. James Zou:

Yeah. I can give maybe two concrete examples of some of the research that we've done, and they both actually relate to generative AI, which is a topic that a lot of people are very interested in nowadays. So a couple of years ago, with my student Abubakar, we wanted to study how language models, right, can capture different kinds of stereotypes, especially some harmful stereotypes. And what Abubakar found is that, basically, if you take an earlier version of ChatGPT, which is called GPT-3, that's sort of the predecessor to the current ChatGPT, and if you ask that model to generate stories about specific groups like, say, Muslims, right, the model would often actually generate stories that have violent content, because it has sort of this negative stereotype associating Muslims with violence.

Dr. James Zou:

And again, some of that stereotype can also potentially be attributed to the training data that is used to train these language models. So in that case, right, even if you write sort of an innocuous prompt to the model, sometimes it would still generate violent associations. Right? And that you can imagine being potentially quite harmful, because it's perpetuating these negative stereotypes about the Muslim community. Now, since we released that research about two or three years ago, it's actually gotten a lot of attention.

Dr. James Zou:

And OpenAI and other companies have been trying to build additional safeguards, right, guardrails against these kinds of biases and stereotypes in language models. A second example is looking at these image generation models. Right? So another big popular kind of generative AI is where you type in some text and the model generates images based on your text prompt. Right?

Dr. James Zou:

So there are models like DALL-E or Stable Diffusion, which are quite popular. People use those to generate a lot of images. And in the research that we published last year, we found that if we ask these models, for example, to generate images of software programmers or developers, the model often would show a fair amount of gender stereotyping. So for example, it would actually generate images of software programmers and developers, and almost 100% of the generated images would be male. And this is another example of where these kinds of generative models can capture, and in some cases amplify, biases in their training data, right, and reproduce those stereotypes.

Dr. James Zou:

And also, since we published our research, the companies behind these generative models, like OpenAI and the developers of Stable Diffusion, have tried to build in additional guardrails to reduce and mitigate some of those stereotypes.
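
One way to picture the kind of study James describes is a simple audit loop: prompt the model many times, label what comes back, and tally the proportions. The sketch below assumes hypothetical generate_image and label_perceived_gender functions; they are placeholders for a real text-to-image model and for human annotation, not real APIs.

    from collections import Counter

    def audit_prompt(prompt, n_samples, generate_image, label_perceived_gender):
        # Generate n_samples images for one prompt and tally the assigned labels.
        labels = [label_perceived_gender(generate_image(prompt)) for _ in range(n_samples)]
        return {label: count / n_samples for label, count in Counter(labels).items()}

    # Stubs standing in for a real model and a real annotator; this fake model
    # depicts a man every time, mirroring the near-100% skew described above.
    fake_generate = lambda prompt: "<image>"
    fake_label = lambda image: "man"

    print(audit_prompt("a software developer", 20, fake_generate, fake_label))
    # -> {'man': 1.0}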

Allison Cohen:

Another very popular example is a tool built by Amazon to look through applicants for jobs that they were hiring for, where they realized that the tool was biased against women. Even when they removed, for example, the applicant's name, the models were able to infer from other descriptions in the resume, things like, you know, "involved with the women's volleyball team," that this individual was a woman. And having seen from previous hiring data that they hadn't hired as many women as men in the past, the model learned not to move forward with that kind of applicant, or not to suggest that they be interviewed in the next round. So, of course, the implications of that bias are pretty significant, and there are plenty of examples of algorithms inferring demographic information and providing a different response to two otherwise similar people. We've seen that in people's ability to get loans, predictions about recidivism rates, the insurance premiums that people get, and access to quality health care. So these are areas that are pretty important to people's quality of life.

Dr. Sonia Kang:

And these are just a couple of examples. One of the most famous research studies on biased AI is the Gender Shades Project led by Joy Buolamwini at the MIT Media Lab. These researchers evaluated three major facial recognition tools and found significant disparities in accuracy depending on gender and skin tone. Specifically, error rates were highest for darker-skinned women, around 35%, and lowest for lighter-skinned men, at only 0.8%. Of course, this algorithmic bias, as they call it, can have harmful consequences for racialized people when that software is used in policing and surveillance, employment, and health care, to name just a few.

Carmina Ravanera:

Right. You bring up the important point that AI has been and continues to be used to make decisions and predictions that have major impacts on people's lives, even though there's a lot of evidence showing that, at best, it's not always accurate and, at worst, it can reinforce systemic biases and inequalities. Unfortunately, because people think that AI is more objective than humans, they're less likely to check for bias or mistakes. They take whatever AI is telling them as the truth.

Dr. Sonia Kang:

So what can organizations do to make sure that they're using AI in a way that doesn't reinforce inequality?

Carmina Ravanera:

Well, it's not just one thing. There are a lot of different steps, and one of the main things is increasing diversity, both in terms of the teams that are creating AI and in terms of the data that AI is being trained on. I spoke to Allison about why it's so important that teams developing AI are representative. She gave an example of a project where they were able to create a better product when the data labelers, the people who label the datasets that AI is trained on, were women.

Allison Cohen:

There's a project that we were building recently where we were trying to design algorithms that could detect and remove both misogyny and racism in written text. So imagine in emails, in social media data, what have you. And I was working with a team entirely comprised of computer scientists, and none of the members of the team besides myself were women. And none of the members of the team had direct experience with racism in Canada or North America more broadly. And those facts became abundantly clear in, for example, the annotation guidelines. To be able to train models to detect racism and misogyny, unless you have a dataset that's already available on the Internet for you to use to train your models, you need to create a dataset from scratch.

Allison Cohen:

So in order to do that, you need examples of what racism and misogyny actually look like, and those examples need to be labeled as examples by people that are called annotators or data labelers. When I took a look at the instructions that were being given to our data labelers, it was very clear to me that we would likely have a pretty significant problem. When you think about gender and finding examples of misogyny in written text, it's something very nuanced. I mean, unless you are an expert in the field, it's hard. How would you even describe why something is misogynistic, and the impact that that has, and misogynistic for whom, and misogynistic when? And what about that punctuation? Does that change the way that you interpret the sentence?

Allison Cohen:

And so the essence of misogyny and the essence of racism, and all of the challenges that come along with that, were clearly not reflected in the annotation guidelines. And if we were to hire people the way that our annotators are traditionally hired, which is finding people online from all over the world who may have any gender or racial identity, it doesn't necessarily align with getting the best quality data. So we ended up actually having to restaff our entire team with women and scope down the project from racial and gender-related bias to just gender-related bias. That lack of diversity, and then that introduction of new diversity, where we had women from both Canada and America, specialists in gender studies and linguistics, and then expert annotators working on the team, completely changed the quality of, of course, the data labeling instructions, but also the quality of the annotations that we got out, and then the likelihood that the models would be performing well in deployment. So there's a massive, massive relationship between the people that are building these tools and the outcomes of these tools.

Dr. Sonia Kang:

That's such a good example of how the people who are building the AI can directly influence how accurate or not accurate it ends up being. Representation definitely matters. As a quick note, when we're talking about accuracy with AI, it generally refers to how good it is at making the right predictions or decisions.

Carmina Ravanera:

And after team diversity, we also have the datasets themselves. James talked to me about creating guardrails for AI.

Dr. James Zou:

And I think when we think about guardrails for AI, it really needs to be across the entire development cycle for these AI algorithms. Because these AI models, as we saw, are becoming increasingly complex. Right? And there are different stages in which these algorithms are developed, where potential biases can get into these algorithms. Right?

Dr. James Zou:

And I think the guardrails that people currently use in practice are often of two types. Right? One is at the level of the training data: basically, thinking about how do we curate higher quality or more diverse, more representative datasets that go into training these models. And the second type would be after you train the model, right, some additional, what are called post hoc guardrails that can be put on top of it to audit the models, monitor them for potential issues and biases, and, once those biases are detected, mitigate them.

Dr. James Zou:

Right? And I think, increasingly, there's actually probably more work now on these post hoc guardrails, because quite a lot of these models are not publicly available, so we don't actually know what data goes into training GPT-4 or these other large language models, which is why there's a lot more research on adding these safety guardrails afterward.
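
Post hoc guardrails take many forms in practice (learned safety classifiers, human review, output rewriting). As one deliberately simple illustration, and assuming a hypothetical generate function standing in for whatever model is being wrapped, the idea is to screen each output before it reaches the user.

    def with_guardrail(generate, prompt, flagged_terms, fallback):
        # Draft a response, then screen it before returning it to the user.
        draft = generate(prompt)
        if any(term in draft.lower() for term in flagged_terms):
            return fallback  # or rewrite it, or route it to human review
        return draft

    # A stub model and a tiny flag list, purely for illustration; real systems
    # rely on trained classifiers rather than keyword matching.
    fake_model = lambda prompt: "a story that leans on a harmful stereotype"
    print(with_guardrail(fake_model, "Tell me a story", ["harmful stereotype"],
                         "[response withheld for review]"))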

Dr. Sonia Kang:

Even though organizations may be more likely to put in these guardrails after they finish working on a project or creating a product, I think it would be more effective to do it in the initial planning stages. Wouldn't that be a better time to think about who would be affected by this AI and how it might help or hurt them?

Carmina Ravanera:

I agree. It'd be easier and more effective to mitigate harms that way rather than addressing them after the fact. But I guess the question is, how can teams effectively predict what these possible harms may be? Allison says that one way to do that is by making sure to include social science experts, and the communities who will be affected by the AI, as part of the design and decision-making process.

Allison Cohen:

I guess I wanna underscore the importance of working closely with social science experts, especially people who understand the domain that you're gonna be working in, who know best practices in working with other researchers and data labelers, people who see that nuance. Because I think too often, from a computer science perspective, the assumption is that the algorithms will figure themselves out, the dataset will figure itself out, but there is so much nuance that you need to consider to make sure that the algorithms and the dataset do figure themselves out. And you will only unveil that degree of nuance by working with people who are embedded in that domain and who know how to ethically source some of this work. I think the other thing that's important is that it's not enough to just have those social scientists on the team.

Allison Cohen:

I think the role of a computer scientist in that context is to make sure that they are communicating very openly with those social science experts, giving them decision-making power, having that be sort of a collaborative effort. Too often, it's the computer scientists that are being paid more, believed to be the real experts in the room, or who will move ahead regardless of whether the social scientists are on board or not. And so having an even power dynamic around the table, making sure that any assumptions are spelled out, that's a lot of work, and it's work that often gets swept under the rug even though it's important to the quality of the project outcomes, but it's work that really needs to be done. I also think that working directly with communities, particularly intersectional communities that will be using these tools, is really important for designing products that align with the values of those communities, but also for being accountable to those communities once the tool is deployed.

Allison Cohen:

And having members of those communities evaluate the model performance. These are all things that will help make sure that the product is performing as intended and does not create unintentional bias or inequality for any particular user group.

Carmina Ravanera:

And as part of that, datasets need to be more transparent. But many organizations aren't going to do that unless there's some incentive for them to do so. This is where recognition for representative datasets comes in.

Dr. James Zou:

And going forward, I think one part that's understudied and underexplored, but that I think is really critical, is trying to make these algorithms more data transparent. Right? And by that, I mean, you know, these algorithms are sort of sucking up data from all these different places. Right?

Dr. James Zou:

And as we saw, a lot of potential biases or limitations or vulnerabilities in these AI models can often be attributed to some issues in their training data. Right? But the flip side is also true. Right? Like, if the algorithms actually do something really well, then it's also useful to know what kinds of data actually contributed to the good performance of these models, so that maybe we can even compensate the people who are producing this kind of data.

Dr. James Zou:

You know? And a good example would be, like, there's actually a lot of useful information on Wikipedia, right, or generated by news organizations like the New York Times. And a lot of that data actually goes into training models like GPT-4 so that these models can provide more informed content, right, and generate responses to the users. And in those cases, it's also, I think, quite important to think about how we then attribute, right, the good outcomes of these models back to the people who are producing the data.

Dr. James Zou:

Right? So I think doing this kind of data attribution, whereby we take the outputs generated by these models and say, okay, here are the training data that are actually responsible either for the good or for the bad of those generations, in a transparent way, would go a long way towards making these algorithms more accountable and more trustworthy.
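
Research on data attribution uses more sophisticated machinery (influence estimates, retrieval over the training set), but the basic idea can be sketched with a crude word-overlap ranking: given an output, find the training documents most similar to it. Everything below is illustrative only, not any production technique.

    import string

    def attribute(output, training_docs, top_k=2):
        # Rank training documents by word overlap (Jaccard similarity) with the
        # output, as a rough stand-in for real training-data attribution methods.
        def words(text):
            cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
            return set(cleaned.split())
        out_words = words(output)
        def score(doc):
            doc_words = words(doc)
            return len(out_words & doc_words) / len(out_words | doc_words)
        return sorted(training_docs, key=score, reverse=True)[:top_k]

    docs = [
        "The economy grew last quarter, according to the report.",
        "A recipe for sourdough bread with a long fermentation.",
        "Quarterly growth figures show the economy expanding.",
    ]
    print(attribute("The report says the economy grew this quarter.", docs))
    # The two economy-related documents rank highest.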

Dr. Sonia Kang:

Right. So if, right from the start, AI development involved not just technical experts but also social science experts and people from marginalized communities, and if we put measures in place to ensure transparency and accountability in data, we could have a very different AI. We'd probably be seeing products that are more responsible and ethical.

Carmina Ravanera:

Exactly. The idea here is that AI is more than just an algorithm. It's also the team that developed it, the experts they consulted, the data they used, the regulations and norms that govern it or not, and the resources it uses. So all of these steps can be done more ethically and inclusively, which would create better outcomes for the people who are using it and for everyone else too. I asked Allison about her vision for better AI.

Allison Cohen:

I think that in an ideal world, I would love to see artificial intelligence tools that really genuinely enhance the quality of people's lives, enhance their access to important goods and services, enhance even the quality of their entertainment and the recommendations that they're given from, I mean, even apps like Spotify and Instagram. But beyond that, I wanna make sure that these tools are abiding by principles of law, regulation, and ethics, and that they are built in ways that reflect that degree of consideration. So not just caring about the end user, but also caring about the people that were involved in creating those tools, so data labelers, as I mentioned before, but also caring about the environment and all of the other sort of knock-on effects that AI development can have that we should be thinking about. And then from a deployment standpoint, I also think, you know, people should be able to hold these tools to account.

Allison Cohen:

They should be able to have a degree of insight into how they were built, into how they're making their decisions, maybe even the ability to opt out, and have that be very clearly explained to the user. So all these things to me are emblematic of good and ethical AI. Of course, part of this is making sure that the data that get scraped don't infringe on people's rights as well. So I'm thinking about all of the creatives who have put their incredible work up online, but do expect to get paid for it, or expect to get paid for future work, and end up having that expectation undercut by algorithms that can reproduce some of the creative work that they're doing. So in the future, I know that's a very big answer, but I'd like to see AI tools that are built ethically, that have a strong positive impact on people's lives, that don't disproportionately harm particular communities, but benefit communities equitably.

Allison Cohen:

And I think that's that's possible.

Dr. Sonia Kang:

It's really inspiring to hear the possibilities of what AI could be like. After all, it's such a powerful tool that has so much potential to bring about positive change. We just have to get it on the right track. So if someone was to say to me, we should just use AI to make this decision or build this product because it's more objective and neutral than humans are, what could I say to help bust this myth?

Carmina Ravanera:

I think the clearest way to do it is to chat about how the data used to create AI is not neutral. In fact, it's a reflection of our society. With AI, people are always saying garbage in, garbage out. In the same way, if AI is designed in a way that is purposely representative and inclusive, we'll get more representative and inclusive products and decisions out of it.

Dr. James Zou:

I think one thing to keep in mind is that, you know, AI is always learning from data. Right? And if we give the AI algorithm some biased data, and a lot of the data that are on the Internet could be biased, then the algorithms do often sort of capture or summarize those biases or viewpoints in those data. Right? So we should not think of AI algorithms as being sort of this monolithic, platonic thing.

Dr. James Zou:

Right? But it's really learning from the data that we give it. Right? So if you give it good, high quality data, it's going to produce more reliable results. But if it's trained on a lot of messy or biased data, it can also perpetuate the issues in those datasets.

Allison Cohen:

AI is a reflection of the people building it, of the data embedded within it, and of what we are able to tolerate in terms of its outcomes. And given that degree of reflection of society and beliefs, and values, and norms, how could AI possibly be objective and neutral?

Dr. Sonia Kang:

And with that, this myth is busted. This was our last episode of the season, but make sure you subscribe. We'll be back with more episodes soon.

Carmina Ravanera:

In the meantime, happy myth busting. GATE's Busted podcast is made possible by generous support from BMO. If you liked this episode, please rate and subscribe to Busted. You can also find more interesting podcast series from the Institute for Gender and the Economy by searching GATE Audio wherever you find your podcasts. Thanks for tuning in.