You can run this text-generating AI on your own devices, no giant servers needed

Stanford researchers have created a model comparable to GPT-3.5 — for a fraction of the computing power and price.

March 15, 2023

While OpenAI’s newest large language model, GPT-4, is proving capable at all manner of interesting and creative tasks, it does have a problem that could potentially hold back its widespread use: it needs serious horsepower.

Or, as OpenAI vice president of product Peter Welinder put it, GPT-4 is “very GPU hungry.”

That’s no surprise. Building a model as large and complex as GPT-4 pushed the limits of available computing power, leading OpenAI to work with Microsoft’s Azure high-performance computing division to build the infrastructure it needed.

“One of the things we had learned from research is that the larger the model, the more data you have and the longer you can train, the better the accuracy of the model is,” Nidhi Chappell, head of product for Azure and AI at Microsoft, said.

“So, there was definitely a strong push to get bigger models trained for a longer period of time, which means not only do you need to have the biggest infrastructure, you have to be able to run it reliably for a long period of time.”


A model for the masses: What this means, though, is that truly running a GPT model requires a serious amount of computing power. When you’re using one of the GPT engines, you’re not actually running the model on your own computer; you’re accessing it remotely via the internet. OpenAI has not released its full models to the public, which makes them what are called closed-source models.

“These models are also BIG,” entrepreneur Simon Willison wrote in a blog post. “Even if you could obtain the GPT-3 model you would not be able to run it on commodity hardware” because it requires specialty components that cost thousands of dollars.

But a team of Stanford researchers has managed to create a large language model that performs comparably to OpenAI’s text-davinci-003, one of the models in the GPT-3.5 series, and that can run on commodity hardware.

The AI is called “Alpaca 7B,” so named because it is a fine-tuned version of Meta’s LLaMA 7B model. Despite having capabilities close to text-davinci-003, Alpaca 7B is “surprisingly small and easy/cheap to reproduce,” the researchers wrote.

They estimate that reproducing Alpaca 7B, including generating its training data with OpenAI’s API, costs less than $600, a tiny fraction of the cost of the massive computing power OpenAI uses to train and run ChatGPT and GPT-4.

Alpaca 7B has also been released for research use, a departure from OpenAI’s walled-off models, whose API-only access “limits research,” researcher Tatsunori Hashimoto tweeted.

Instruction-following models are now ubiquitous, but API-only access limits research.
Today, we’re releasing info on Alpaca (solely for research use), a small but capable 7B model based on LLaMA that often behaves like OpenAI’s text-davinci-003.

Demo: https://t.co/H8bpqzMMz5
— Tatsunori Hashimoto (@tatsu_hashimoto) March 13, 2023

Making Alpaca: To create their model, the Stanford team used text-davinci-003 to generate examples of instructions paired with responses, then used those examples to fine-tune a LLaMA 7B model. The result responds to prompts more naturally than LLaMA does in its raw form.
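For readers curious what that training data looks like, here is a minimal Python sketch. Each example pairs an instruction (and an optional input) with a response written by text-davinci-003, and before fine-tuning each example is flattened into a single block of text. The instruction/input/output fields and the template wording are modeled on the format Stanford published alongside Alpaca, but this is an illustration, not the team’s exact code; the `format_example` helper, the sample record, and the newline separator are hypothetical.

```python
# Hypothetical sketch: turning one instruction-following example into the
# text a model is fine-tuned on, loosely modeled on the published Alpaca template.

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def format_example(example: dict) -> str:
    """Return the prompt followed by the target response for one example."""
    if example.get("input"):
        # str.format ignores the unused "output" key here.
        prompt = PROMPT_WITH_INPUT.format(**example)
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=example["instruction"])
    # The model learns to continue the prompt with the response.
    # The newline separator is an assumption of this sketch.
    return prompt + "\n" + example["output"]


# A made-up record in the instruction/input/output shape.
sample = {
    "instruction": "Write a one-sentence email declining a meeting.",
    "input": "",
    "output": "Thank you for the invitation, but I won't be able to attend the meeting.",
}

print(format_example(sample))
```

Examples without an input use the shorter template, so the model never sees an empty “Input” section.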

What they ended up with was able to generate outputs that were largely on par with text-davinci-003 and regularly better than GPT-3 — all for a fraction of the computing power and price. The fine-tuning process, which used 52,000 examples and took three hours, was accomplished using computing power that would cost less than $100 with most cloud computing providers.

Alpaca 7B does have some important limitations, the authors noted. It is intended for academic research only, and commercial use is prohibited. The team gave three reasons for the decision: LLaMA’s license does not permit commercial use, OpenAI’s terms of use prohibit using text-davinci-003’s outputs to develop competing models, and Alpaca 7B does not yet have adequate safety guardrails.

The results: The student authors evaluated Alpaca 7B using inputs from the self-instruct evaluation set, which covers a variety of tasks users are likely to want help with, including writing emails and working with productivity tools. The team then pitted it against text-davinci-003.

“We performed a blind pairwise comparison between text-davinci-003 and Alpaca 7B, and we found that these two models have very similar performance: Alpaca wins 90 versus 89 comparisons against text-davinci-003,” the authors wrote.

They were “quite surprised by this result given the small model size and the modest amount of instruction following data.”
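The 90-to-89 tally is closer than it may sound. A quick Python calculation, counting only the 179 comparisons the quote reports (it does not say whether any comparisons were tied), puts Alpaca’s share of wins at about 50.3 percent, essentially a coin flip:

```python
# Win share from the reported blind pairwise comparison.
# Only the 179 decided comparisons quoted above are counted; any ties are unknown.
alpaca_wins = 90
davinci_wins = 89

decided = alpaca_wins + davinci_wins
alpaca_share = alpaca_wins / decided

print(f"{decided} decided comparisons, Alpaca won {alpaca_share:.1%}")
# prints "179 decided comparisons, Alpaca won 50.3%"
```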


That said, the team acknowledged that Alpaca 7B shares problems other language models have been running into, including toxicity, stereotypical replies, and “hallucinations,” or producing untruthful answers with no caveats.

The team noted that hallucinations seem to be a particular concern. For example, when asked for the capital of Tanzania, Alpaca answered “Dar es Salaam.” That city is indeed in Tanzania, and it is the country’s most populous city, but the capital is Dodoma. Alpaca can also be used to generate well-written misinformation.

To help curtail these concerns, the researchers encourage people to flag problems when using the demo version. The team has also installed two guardrails on the demo: a filter developed by OpenAI and a watermark to identify Alpaca 7B outputs.
