Can Tech Companies Learn to Love Cheaper AI Models?

The AI boom has been built on the idea that bigger models are more powerful. But now, mounting costs have pressured users to give smaller and cheaper models a second look. Brian Armstrong, co-founder of Coinbase, has predicted that within 12 to 18 months, 80% of workloads will run on models that are 99% cheaper. Only 20% of workloads will use the latest-generation models, where maximum intelligence is important.

This prediction, if it comes true, will be a huge shift for the AI industry. Until now, most AI companies have competed on quality and have defaulted to the most advanced model for everything. But if cheaper models can handle the same jobs without a loss in quality, the economics of AI will change massively. Much of the savings would come out of the pockets of big labs like OpenAI and Anthropic.

A recent test by the legal AI tool Harvey showed promising results. Harvey worked with Fireworks AI and combined Claude Opus with a cheaper model. They used the cheaper model for most tasks and switched to Opus for the most intensive ones. The result was a threefold reduction in inference costs without any loss in quality. Gabe Pereyra from Harvey said that quality comes first, but the definition of quality is evolving. Now it means using the best model that gets the right answer most efficiently.

The real divide is not between proprietary and open models, but between large models and small ones. Companies can save money by switching from GPT-5.5 to DeepSeek V4 Flash, or by using GPT-5.4-mini. There is an active price war between in-house inference from big labs and independently served open-weight models. For the bigger question of small versus large, it does not matter which kind of small model wins out.

Choosing the cheapest model that can do the job might seem obvious, but it runs counter to the scaling-first approach that has dominated the industry. Investors have subsidized prices, so clients have had no reason to choose anything but the most advanced option.
Now, with token prices rising and subsidies slowing down, users are facing cost pressure for the first time. It is not clear whether this pressure will drive enterprise users to smaller models. They could also economize by making fewer calls or using less context. But if most deployments can run just as well on a smaller model, it could put a damper on the growing demand for inference.
Take a position. Out loud, if you can.
Four ways to start. Pick one and try saying it before you scroll on.
Tip · Record yourself, write in a notebook, or practice with a language partner.
What has pressured users to look at cheaper models?
Present perfect for past actions with present relevance
We use present perfect to talk about past actions that are still relevant now. It is formed with have/has + past participle.
“The AI boom has been built on the idea that bigger models are more powerful.”
What to know · B1
Try saying this aloud
Scenario: Discussing AI trends with a colleague
- “Costs have been mounting.”
- “We have seen a shift.”
- “Quality has evolved.”
Register tip · neutral
Key Phrases
“Give something a second look” means to reconsider something. It is common in business contexts.
We should give the old proposal a second look.
“Run counter to” means to go against or conflict with something. It is useful for academic and professional writing.
His opinion runs counter to the majority.
“Face cost pressure” describes a common business situation. In the example, it uses the present continuous to show an ongoing situation.
Many startups are facing cost pressure now.
Adapted from TechCrunch. LinguaPress rewrites the facts as original graded-reader text for language learners.