Despite the many things that advanced chatbots like OpenAI’s GPT-4, Google’s Bard, and Anthropic’s Claude can do, these AIs do have a substantial Achilles heel: they’re pretty bad at math.
This makes sense when you consider that they are large language models (LLMs). Trained on the vast corpus of the internet, they essentially operate entirely off of text they have “read.” If you ask one to add two different numbers, it does not truly add them, like a calculator: it predicts the answer based on text it has been trained on, and replies with that.
“Claude” actually explained it best, when Semafor’s Gina Chua prompted it to do some math.
“I am not actually able to do mathematical calculations,” the chatbot responded to Chua. “While I can have conversations about math and numbers, I do not have a built in calculator … I simply treated the question as another language input, and responded with the sum I was trained to give for that specific set of numbers.”
But one new application may provide LLMs with true mathematical powers: a promised integration with Excel, part of Microsoft’s plan to launch an AI Copilot that works with its 365 apps, including Excel, PowerPoint, and Word.
Access to Excel’s tools could let an LLM handle numbers and logic by calculation rather than prediction.
“I have been working on this problem, and I’d say math/logic is one of the biggest weaknesses/limitations of LLMs,” Nazneen Rajani, robustness research lead at AI company Hugging Face, tells Freethink via email.
LLMs are currently not reliable when counting, Rajani says. Even the response to a simple prompt like “write a sentence about x that is y words long” is almost always incorrect; the LLM just doesn’t reply with the correct number of words.
Telling the LLM to “think” about it “step-by-step” can help it to avoid or correct mistakes, but “I’d not trust the calculations without validating them myself,” Rajani says.
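The kind of validation Rajani describes can be done with ordinary code, which counts and adds deterministically. What follows is a minimal Python sketch, not a method from Rajani or Hugging Face; the function names are hypothetical, and it assumes the model's reply has already been captured as plain text or as a number.

```python
import math
import re


def count_words(sentence: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit."""
    return len([tok for tok in sentence.split() if re.search(r"\w", tok)])


def check_word_count(sentence: str, requested: int) -> bool:
    """Return True if the sentence has exactly the requested number of words."""
    return count_words(sentence) == requested


def check_sum(a: float, b: float, claimed: float) -> bool:
    """Return True if the claimed answer matches the real sum of a and b."""
    return math.isclose(a + b, claimed)


if __name__ == "__main__":
    # Hypothetical model outputs, written by hand for illustration.
    print(check_word_count("Cats sleep most of the day.", 6))
    print(check_sum(1234, 5678, 6902))
```

A check like this does not make the model any better at math; it only flags answers that need a second look.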
But having access to Excel could help LLMs better understand data beyond words and images.
“Excel perhaps adds a lot more structure to the data, and having a model fine-tuned on this structural data would definitely boost the performance of an LLM on Excel-specific tasks,” Rajani says.
As Chua points out, the payoff would go beyond an LLM that can perform basic arithmetic correctly: Excel is essentially a database program that can handle not only numbers but also text, dates, and much else.
If LLMs can successfully incorporate Excel, or other means of accessing math and logic capabilities, they could be prompted to do things like create an accurate budget and modify it for various scenarios, all with conversational prompts, or search for patterns in data merely by asking natural human questions.
However, Microsoft has yet to make its AI Copilot available to the public, so just how close we are to an LLM that can crunch the numbers is still a bit of an unknown variable.