
The invisible workforce behind artificial intelligence

Interview with Mary Gray, Principal Researcher at Microsoft Research and co-author of Ghost Work

Tags: 'Artificial intelligence' 'Big data' 'Ghost work' 'Mary Gray'



Mary L. Gray is a Principal Researcher at Microsoft Research and Faculty Affiliate at Harvard University’s Berkman Klein Center for Internet and Society. She maintains a faculty position in the School of Informatics, Computing, and Engineering with affiliations in Anthropology, Gender Studies and the Media School at Indiana University.

You coined the expression ‘ghost work’ in your latest book, referring to the hidden human workforce that works alongside AI to power websites and apps. Can you give us your definition of ghost work?

My coauthor, Siddharth Suri, and I coined the term ghost work to get at two layers of this new world of work. One is a superficial definition: it’s work that can be hidden from a consumer who otherwise thinks automation is doing the job, and the reason for hiding it isn’t so suspicious. It’s literally so that the experience a consumer might have, searching for an item on the web, for example, will go quickly. But there’s another layer of meaning behind the term ghost work.

It’s meant to help us call out work conditions where the value of the person providing that moment of service is literally erased.

What motivated you to research this topic?

I got into this research by accident. When I moved to Microsoft Research, I wanted to understand how computer scientists and engineers built artificial intelligence. I didn’t really understand how it worked. When I started asking around, they would often tell me, “Oh, we also have to hire people to label certain images or to clean up training data.” When I asked them, “Who are the people that you hire to do this work?” I would get a range of responses, from “I don’t really know,” and the conversation would end there, to some engineers saying, “I’m afraid to find out.” Luckily, I then met Suri. When I asked him, “Who are the people who do this work?” he was the first person I met who said, “I don’t know, but I’d like to find out.”

From labelling data to content moderation, what are the most common ghost work tasks?

The most common ghost work tasks often involve handling data entry. People might not be aware that most artificial intelligence starts with gathering data produced by people, and it turns out it actually takes quite a bit of human effort to create what’s called training data for artificial intelligence.

Two years ago, if I had said content moderation as a job, most people wouldn’t have understood what that meant. But there are literally thousands upon thousands of people who review content that we create online and make sure that it is material that is appropriate for that platform.

What are the advantages of ghost work for the worker and for the contractor? What are the disadvantages?

Businesses have the advantage of being able to quickly respond to whatever the request might be. For the workers, the main advantage is that they can access work any time of day, pick projects, organize that work around their schedules, and choose the people they collaborate with. They don’t have to go into an office, so they can avoid commutes. The downside for the workers is that this isn’t work that is legally regulated or recognized as a socially valuable job, and so they don’t have the basic protections that come with doing a nine-to-five office job. We don’t, for the most part, connect benefits for work to contract work. That’s the biggest problem right now.

What motivates people to enter the ghost work market?

The bigger picture here is that you don’t have a steady supply of demand for work. There are specific cases, though. We met women, for example, who didn’t have opportunities to participate in formal employment. In India, it might be because it was culturally not acceptable for them to go to the office, and they still wanted to participate in economic activity that could contribute to their households. In other cases, it was young people who were in college, but it wasn’t that they found the work so flexible and that’s what drew them to it; it’s that the other opportunities that would sustain them economically just aren’t there.

What is it like doing ghost work?

One of the biggest discoveries that now seems so obvious is just how much workers find each other and collaborate. It just reminds you of just how social we are when we work and that we derive so much meaning out of our work.  

The biggest challenge for this workforce is that, unlike in the past, they don’t have a professional identity to unite them.

They don’t have associations and unions that are going to be readily available to advocate for them.

How does ghost work relate to the gig economy?

I’d argue that ghost work is a way of rethinking what we mean by gig economies and platform work. In most cases when people talk about the gig economy, it loses a particular political charge. It becomes this neutral thing that people choose to do or choose not to do. ‘Contract work’ is not a sexy term, but actually in most cases gig work is doing a project or a task on contract. 

So what does it look like in future?

There’s no reason this can’t become good work other than our need to collectively reconsider. It’s not that the market has somehow effectively priced them because they’re worthless; it’s that society hasn’t figured out how to value them, often because of who they are, not just what they do. The irony here is that we’re not seeing our own role as consumers, as the hiring agents of this workforce.

Do you think this service-based economy and contract system is here to stay?

It’s only been 200 to 300 years since we started pricing people’s labor as though it was a product. So I would say we’re at the very beginning of recognizing how inhumane and antisocial it is to suggest that our time is something that can be boiled down to a price. And that somehow you can actually imagine that someone’s time is worth more than someone else’s.

What do you mean by the paradox of automation’s last mile?

The paradox of automation’s last mile is that as we strive to automate more so that we have less to work out on our own, we’re also going to open up all of these other places where we’re going to demand more prediction, more automation. We literally just keep moving the goalposts.

Yes, in some cases we’ll be able to have software do what a person can do. I don’t think any of us want to go back to the day when we had to do calculations by hand.

In your book you give a number of recommendations for the future.

Everything is grounded in the empirical reality of what workers have already tried and figured out. What workers really need is municipal wifi and office space. We can turn every library that’s underused today into coworking space that workers could use, so that they’re not in their own environments or home offices that aren’t really meeting their needs and physically are a public health disaster waiting to happen.

So how can we ensure this happens?

The problem isn’t that workers haven’t figured out how to make this meaningful. The challenge is that we haven’t figured out how to recognize how meaningful their work is to us. And it turns out that everything from recognizing someone’s facial expression to understanding whether they need help takes a great amount of creativity and complex communication.

The people building these platforms literally conceived of what they were building as software. Nothing more complicated than software matching people with requests for help and people who could help them. Nowadays, we all realize how much technology companies are effectively building social systems. They’re building job boards, they’re building healthcare systems, they’re building all sorts of systems that are fundamentally as social as they are technical. 

We keep giving away our data without a second thought towards the consequences. Who is responsible for ensuring Big Data is collected, aggregated and used fairly?

If we’re not thinking about the inherent cultural bias as we’re looking at the data sets, we’re missing an opportunity to see how big data, the engine behind providing services and platform economies, is also always, by definition, skewed. When we’re looking at big data and the value of big data, the first question we should be asking ourselves is: valuable to whom? Things will radically change if consumers see they have every right to say, “You don’t get my data without my permission.”

What are your thoughts on finding a balance between data privacy and the open nature of the internet?

So the biggest challenge is that once any entity collects the data, it’s very difficult to detach it from who produced it. When we’re collecting data, we’re learning about people’s activities and behaviors. We’re using that to model artificial intelligence.

Literally every single day we are without question giving away our right to say “no, you don’t get to have that”. 

So besides asking for permission, what else can be done to take back control of our data?

All of this is getting us to a point of conversation literally with engineers and computer scientists to say, we want you to have a different set of responsibilities. They genuinely just see data, and our job is to help them see people and human interactions so that they don’t feel comfortable just taking what they can get.