Researchers from Stanford and MIT have completed a real-world assessment of generative AI in the workplace, finding that a chat assistant helped workers at a Fortune 500 company solve 13.8% more issues per hour.
The work shows the potential impact of large language models, like GPT-4, outside of classroom assessments — like passing the SATs or the bar exam — and instead in a place much more likely to mirror its real-world potential.
“Having people use it for over a year in this company, you get a much better sense of how that translates into real-world productivity,” Stanford’s Erik Brynjolfsson, a study co-author, told Bloomberg. “As far as I know, this is the first time it’s been done in a real-world setting.”
The rise (and rise) of AI: Trained by the massive canon of text called the internet and capable of spitting out surprisingly human outputs, large language model AIs like GPT-4, Google’s Bard, and Anthropic’s Claude have come on like gangbusters since their debut.
Tech Twitter is littered with examples of GPT-4’s impressive capabilities, from cranking out video game code to analyzing and summarizing complex information. And while the AIs have their shortcomings — e.g., they don’t really know facts, struggle greatly with math and sometimes logic, and are vulnerable to jailbreaks — they’ve already been finding their way into the working world.
Financial services firm Stripe is an OpenAI partner and has incorporated GPT-4 into fraud fighting and agent-helping; General Motors is planning to put ChatGPT into its cars; and startups are looking to build bespoke LLMs for fields like law. Bing, perennial also-ran to Google, has recentered its search engine around an LLM (which counts as a win by the simple fact alone that you’re reading about Bing right now).
More practically, coding AIs, like GitHub’s Copilot and Replit’s Ghostwriter, are already supercharging programmers every day, helping write, annotate, and debug code. An experiment run by GitHub in 2022 found that coders who used Copilot to complete a task were faster, more productive, more successful, and reported higher satisfaction with their work.
But despite these encouraging results and the rapid push for AI in the workplace, there has been little academic research so far on its effect in real firms, in the real world, particularly for chatbot-based AIs.
“The emergence of generative artificial intelligence (AI) has attracted significant attention, but there have been few studies of its economic impact,” the authors wrote in their paper for the National Bureau of Economic Research.
Analyzing AI in the workplace: Researchers from Stanford’s Digital Economy Lab and MIT’s Sloan School of Management looked at the impact of a generative AI assistant on the productivity of employees at a Fortune 500 software firm’s call center over a year.
The team used the number of issues resolved per hour by call center employees, who Bloomberg reports were predominantly based in the Philippines, as their measure for productivity.
The chatbot assistant was trained on data from over 5,000 of the company’s agents; it monitors chats and supplies real-time suggestions on what to say. Agents were free to use those suggestions or shrug them off.
In their assessment, agents were able to handle 13.8% more issues per hour using the AI, handling calls quicker and more effectively, as well as tackling multiple customers at once. They were also able to work 35% faster. While these gains were less dramatic than some seen in labs, the real-world study “suggests that those laboratory studies were pointing in the right direction, and that they weren’t just mirages,” Brynjolfsson told Bloomberg.
A leg up: Interestingly, it was not the best and most experienced agents who reaped most of the rewards.
“We found that workers with access to AI see fairly significant productivity gains, but most of those gains accrue to novice or less able workers,” Lindsey Raymond, an MIT doctoral candidate and co-author of the study, said.
Brynjolfsson, Raymond, and their MIT colleague Danielle Li believe the difference may come down to how much of an assist the chatbot can provide and how sorely it is needed.
Chatbot AI trained on the company’s best workers may quickly provide important knowledge and best practices to employees who don’t yet know them, helping “new workers move up the experience curve,” Raymond said.
Less experienced workers stood to benefit more from incorporating the AI, an accelerator for getting up to speed, Raymond told CNBC. With AI, customer service agents working for only two months were hitting performance metrics similar to those with half a year under their belts.
Conversely, AI in the workplace seems to be less helpful for experienced employees; the team found no major benefits from the chatbot for highly skilled vets.
“High-skilled workers may have less to gain from AI assistance precisely because AI recommendations capture the knowledge embodied in their own behaviors,” Brynjolfsson said in a statement.
In a knock-on effect, the team also found that the AI improved customer satisfaction, which in turn led to better interactions and improved customer retention. Despite not seeing productivity boosts, the seasoned employees actually train the AIs and should be recognized and compensated for that, Raymond told CNBC.
An unknown shift: While among the first of its kind, the study does not definitively prove the potential benefits — and drawbacks — of all AI in the workplace.
“We need far more research here,” Brynjolfsson said.
“We don’t know if the impact of AI on productivity may vary over time, and adding these tools to the office could require complementary organizational investments, skills development, and business process redesign,” Brynjolfsson said. AI may also affect customer and employee satisfaction, morale, and behaviors in ways the study did not capture, he added.
“There’s so much we don’t know.”