Amid a huge amount of hype around generative AI, a new study from researchers at MIT sheds light on the technology’s impact on work, finding that it increased productivity for workers assigned tasks like writing cover letters, delicate emails, and cost-benefit analyses.
The tasks in the study weren’t quite replicas of real work: They didn’t require precise factual accuracy or context about things like a company’s goals or a customer’s preferences. Still, a number of the study’s participants said the assignments were similar to things they’d written in their real jobs — and the benefits were substantial. Access to the assistive chatbot ChatGPT decreased the time it took workers to complete the tasks by 40 percent, and output quality, as measured by independent evaluators, rose by 18 percent.
The researchers hope the study, which appears today in open-access form in the journal Science, helps people understand the impact that AI tools like ChatGPT can have on the workforce.
“What we can say for sure is generative AI is going to have a big effect on white collar work,” says Shakked Noy, a PhD student in MIT’s Department of Economics, who co-authored the paper with fellow PhD student Whitney Zhang ’21. “I think what our study shows is that this kind of technology has important applications in white collar work. It’s a useful technology. But it’s still too early to tell if it will be good or bad, or how exactly it’s going to cause society to adjust.”
Simulating work for chatbots
For centuries, people have worried that new technological advancements would lead to mass automation and job loss. But new technologies also create new jobs, and when they increase worker productivity, they can have a net positive effect on the economy.
“Productivity is front of mind for economists when thinking of new technological developments,” Noy says. “The classical view in economics is that the most important thing that technological advancement does is raise productivity, in the sense of letting us produce economic output more efficiently.”
To study generative AI’s effect on worker productivity, the researchers gave 453 college-educated marketers, grant writers, consultants, data analysts, human resource professionals, and managers two writing tasks specific to their occupation. The 20- to 30-minute tasks included writing cover letters for grant applications, emails about organizational restructuring, and plans for analyses helping a company decide which customers to send push notifications to based on given customer data. Experienced professionals in the same occupations as each participant evaluated each submission as if they were encountering it in a work setting. Evaluators did not know which submissions were created with the help of ChatGPT.
Half of the participants were given access to the chatbot ChatGPT-3.5, developed by the company OpenAI, for the second assignment. Those users finished tasks 11 minutes faster than the control group, while their average quality evaluations increased by 18 percent.
The data also showed that performance inequality between workers decreased, meaning workers who received a lower grade on the first task benefited more from using ChatGPT for the second task.
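The two speed figures reported for the study, a 40 percent reduction in completion time and an 11-minute saving, can be cross-checked with simple arithmetic. The sketch below assumes both figures describe the same comparison between the ChatGPT group and the control group; the function name and the derived baseline are illustrative and do not come from the paper.

```python
def implied_baseline_minutes(minutes_saved: float, fractional_reduction: float) -> float:
    """Return the control-group time implied by an absolute and a relative saving.

    Assumption (not stated in the article): both numbers refer to the same
    treatment-versus-control comparison, so saved = baseline * reduction.
    """
    if not 0 < fractional_reduction <= 1:
        raise ValueError("fractional_reduction must be in (0, 1]")
    return minutes_saved / fractional_reduction


# Figures quoted in the article: 11 minutes faster, 40 percent less time.
baseline = implied_baseline_minutes(11, 0.40)
with_chatbot = baseline - 11

print(f"Implied control-group time: about {baseline:.1f} minutes")
print(f"Implied ChatGPT-group time: about {with_chatbot:.1f} minutes")
```

Under that assumption, the implied control-group time falls inside the 20- to 30-minute task length described in the article, so the two figures are at least consistent with each other.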
The researchers say the tasks were broadly representative of assignments such professionals see in their real jobs, but they noted a number of limitations. Because they were using anonymous participants, the researchers couldn’t require contextual knowledge about a specific company or customer. They also had to give explicit instructions for each assignment, whereas real-world tasks may be more open-ended. Additionally, the researchers didn’t think it was feasible to hire fact-checkers to evaluate the accuracy of the outputs. Accuracy is a major problem for today’s generative AI technologies.
The researchers said those limitations could lessen ChatGPT’s productivity-boosting potential in the real world. Still, they believe the results show the technology’s promise — an idea supported by another of the study’s findings: Workers exposed to ChatGPT during the experiment were twice as likely to report using it in their real job two weeks after the experiment.
“The experiment demonstrates that it does bring significant speed benefits, even if those speed benefits are lesser in the real world because you need to spend time fact-checking and writing the prompts,” Noy says.
Taking the macro view
The study offered a close-up look at the impact that tools like ChatGPT can have on certain writing tasks. But extrapolating that impact out to understand generative AI’s effect on the economy is more difficult. That’s what the researchers hope to work on next.
“There are so many other factors that are going to affect wages, employment, and shifts across sectors that would require pieces of evidence that aren’t in our paper,” Zhang says. “But the magnitude of time saved and quality increases are very large in our paper, so it does seem like this is pretty revolutionary, at least for certain types of work.”
Both researchers agree that, even if it’s accepted that ChatGPT will increase many workers’ productivity, much work remains to be done to figure out how society should respond to generative AI’s proliferation.
“The policy needed to adjust to these technologies can be very different depending on what future research finds,” Zhang says. “If we think this will boost wages for lower-paid workers, that’s a very different implication than if it’s going to increase wage inequality by boosting the wages of already high earners. I think there’s a lot of downstream economic and political effects that are important to pin down.”
The study was supported by an Emergent Ventures grant, the Mercatus Center, George Mason University, a George and Obie Shultz Fund grant, the MIT Department of Economics, and a National Science Foundation Graduate Research Fellowship Grant.
This article is republished with permission of MIT News, where it originally appeared.