Guest Blogger: Latoya Peterson - "Why journalists need to understand artificial intelligence"
Our new open call for ideas, AI and the News: An Open Challenge, is looking for submissions that address the impact artificial intelligence has on the news and information ecosystem. One of the categories seeks projects that empower journalists to better tell the story of artificial intelligence. Here, storyteller and technologist Latoya Peterson explores why all journalists need to be proficient in the technology and how it is developed. - Tim Hwang, Director
Artificial intelligence is truly a black box.
Journalists are reporting on a phenomenon that is hard to explain, even for experts in the field. And compounding matters, most of the important conversations are taking place behind closed doors. Many of the major advances in the field are proprietary, and the public is overly reliant on one-sided corporate press releases that maximize shareholder benefit and minimize risks. Meanwhile, publicly available information is heavily academic, requiring advanced knowledge of the field to decipher anything beyond the executive summary.
Journalists need to develop a fluency in AI before it disrupts both our newsrooms and our society. We have to get better at explaining this technology that impacts nearly all aspects of our lives - from determining what movies appear in our Netflix queue to whether we qualify for loans. But to develop fluency, one needs to have a solid understanding of the infrastructure that makes artificial intelligence work - the datasets that feed the systems and where this information is coming from.
For one thing, data sets and how they are collected, used and compromised can influence the results of any system. This seems like an obvious point. But even a basic question - like “what information is in the training data for this AI model?” - can lead to a complex answer.
For instance, some of the most important datasets used for machine learning consist of millions of images. Usually, a programmer can answer the question of where the data came from or what library was used to generate the results. But what is the information that forms the library? Until recently, this was difficult to answer.
A system needs a large volume of training data to work, so most libraries collect and compile information from a few massive data repositories, like Google Images or Flickr. And while most maintainers try to ensure that the data being entered is properly categorized, errors can occur at scale.
In 2015, Google had a widely publicized misstep when software engineer Jacky Alciné realized the Google Photos image recognition algorithms were tagging black people as “gorillas.” It is a horrific and racist association, but why would this happen in the first place? Most experts in the AI field knew why. There wasn’t some racist engineer causing mayhem behind the scenes. It was a system that had been trained on a data set containing more images of gorillas than of African Americans.
Trickier still is how to solve this problem: the 2018 follow-up piece from Wired shows that Google employed a workaround that blocked the image recognition systems from identifying gorillas, but still hadn’t fixed the core problem.
And remember, Google owns this data set, which is powered by users uploading their own photos. And that was just one example that was caught and publicized.
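The imbalance behind an error like this can be illustrated with a minimal Python sketch. It assumes a hypothetical list of category labels, one per training image; the function names, the toy labels and the 10 percent threshold are all illustrative assumptions, not part of any tool or data set mentioned in this piece.

```python
from collections import Counter


def label_shares(labels):
    """Return (label, share of dataset) pairs, most common label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, count / total) for label, count in counts.most_common()]


def underrepresented(labels, threshold=0.10):
    """Return labels whose share of the dataset falls below the threshold."""
    return [label for label, share in label_shares(labels) if share < threshold]


# Hypothetical toy labels standing in for a real image dataset.
toy_labels = ["airplane"] * 70 + ["bird"] * 25 + ["kite"] * 5

for label, share in label_shares(toy_labels):
    print(f"{label}: {share:.0%}")

print("Underrepresented:", underrepresented(toy_labels))
```

A count like this says nothing about whether the labels themselves are correct, but it is often the first question worth asking of any training set: which categories does the system see a lot of, and which does it barely see at all?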
These kinds of issues are more common than we think. To help surface them, the Google People + AI Research team created a machine learning data visualization tool called Facets. Now open source, Facets lets users explore a dataset and creates a clearer visualization of the information being presented. Researchers Fernanda Viégas and Martin Wattenberg explained the genius of the system and what it can reveal during a MoMA R&D salon.
With Facets, the errors and biases in a dataset are made visible. The first few examples of bias are benign. For example, airplanes are overwhelmingly blue, which may confuse a system trying to identify red or silver flying objects as the same thing. Blank spaces, errors and places where humans and computers disagree on categorization are also easily seen. But some bias isn’t so easy to correct, and can be quite damaging. At the same salon, noted academic and researcher Kate Crawford linked the underlying bias in photography and in news - for example, why a dataset of the most labeled faces on the Web is 78 percent white men - to categorization errors in AI.
There are no simple answers in reporting on or understanding artificial intelligence, and these examples just scratch the surface of the larger implications of biased systems. Many technology and data journalists have invested in understanding programming principles. I’m going to suggest that all journalists begin studying how computing and programming work on a basic level.
One does not need to become a programmer or even gain proficiency in a language like Python to report on AI. Just looking at how developers approach solving problems will greatly aid the understanding of how these systems are built and designed. This will then improve our framing of these issues in reporting and our understanding of how these systems will eventually impact our newsrooms.
Because journalists do not understand the basics of how artificial intelligence works, we are prone to missing the larger picture or oversensationalizing our stories. Rachel Thomas, co-founder of Fast.ai, recently took the Harvard Business Review to task and shared lessons applicable to how journalists think about AI:
“The media often frames advances in AI through a lens of humans vs. machines: who is the champion at X task. This framework is both inaccurate as to how most algorithms are used, as well as a very limited way to think about AI. In all cases, algorithms have a human component, in terms of who gathers the data (and what biases they have), which design decisions are made, how they are implemented, how results are used to make decisions, the understanding various stakeholders have of correct uses and limitations of the algorithm, and so on.”
So much in understanding machine learning and artificial intelligence is about the framing. If you ask better questions and set better parameters, you receive a better result. Journalists are trained to examine the frameworks. We do this as a matter of course in our work. But for us to truly inform the public on the full potential of the AI revolution, we need to be working from a stronger knowledge base.
Latoya Peterson is a storyteller and technologist. One of Forbes Magazine's 30 Under 30 rising stars in media for 2013, she is best known for the award-winning blog Racialicious.com, focused on the intersection of race and pop culture. Previously, she was Deputy Editor over Digital Innovation at ESPN’s The Undefeated, Editor-at-large at Fusion, senior digital producer for The Stream, a social media driven news show on Al Jazeera America and a John S. Knight Journalism 2012-13 Fellow at Stanford University focusing on mobile technology and digital access. She produced a YouTube series on Girl Gamers and is currently working on projects incorporating virtual reality, augmented reality, machine learning, and artificial intelligence.