What if we just stopped hyperscaling
Mistral makes its own case
As part of my attempts to understand AI, and Large Language Models in particular, I’ve been trying out Mistral, a model produced by a French startup. For those who are into the details, it’s an open-weight model.
I started by trying to understand what is meant by the term “1 GW data centre”, given that this is a measure of energy, not data. A quick summary, with a lot of rounding, is that an NVIDIA H100 GPU requires about 1kW to run, so a 1 GW data centre can run a million of them.
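The rounding can be made explicit in a few lines of Python. This is a sketch that takes the post's assumption of about 1 kW per H100 as given. The `pue` parameter (power usage effectiveness, the ratio of total facility power to the power drawn by the computing equipment) is an addition here, borrowed from the 1.5 figure used later in the piece.

```python
def gpus_per_site(site_watts: float, watts_per_gpu: float = 1_000.0, pue: float = 1.0) -> int:
    """Rough number of GPUs a site of a given power rating can run.

    Assumes every GPU draws `watts_per_gpu` (about 1 kW for an H100 with
    server overhead, per the post) and that cooling and other facility
    loads multiply that draw by `pue`.
    """
    return int(site_watts // (watts_per_gpu * pue))


ONE_GW = 1e9  # watts

print(gpus_per_site(ONE_GW))           # 1000000, ignoring cooling
print(gpus_per_site(ONE_GW, pue=1.5))  # 666666, with air-cooling overhead
```

With cooling included, "a million GPUs per gigawatt" becomes closer to two thirds of a million, which is well within the "lot of rounding" the estimate allows.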
The next step was the surprise. As calculated by Mistral, and corrected as described in the update below, doing inference only, one GPU can serve 50 000 ordinary users. So, around 100 000 would be enough for the entire world. This is consistent with the claims made by Sam Altman about the minimal energy impact of ChatGPT, but radically different from the perceived needs of hyperscaling.
Having got this far, I asked Mistral to write a piece about the option of stopping hyperscaling. And, on the basis of “in for a penny, in for a pound”, I asked for an EU policy program in which Mistral would play a central role. I looked at the output, and deleted or tweaked a couple of points I couldn’t follow.
The conclusion is pretty startling. I’ve checked it as far as I can, but both AI and JQ are error-prone systems, so you shouldn’t take it on trust. Feel free to point out errors, as well as discussing implications.
Update: Commenter vorkosigan1 had the excellent idea of running this through another AI (Opus 4.7), a check I’ve been meaning to adopt, and will be more careful about in future. Opus picked up an arithmetic error, which I suspect I introduced in the tweaks I mentioned (order of magnitude errors are a constant bugbear for me). Correcting that doesn’t really change the main point. More interestingly, and as you might expect, Opus made its own case for hyperscaling, as well as critiquing some of Mistral’s policy suggestions.
The Opus response is worth reading. An important point it raises, which was already discussed in comments, is that LLMs are just as confident and fluent when they are wrong as when they are right.
What If We Stopped Hyperscaling?
The AI industry today is dominated by a relentless pursuit of scale—bigger models, more data, and ever-larger compute clusters. NVIDIA’s H100 GPU, the current gold standard for AI compute, exemplifies this trend. Each H100 consumes around 1 kilowatt of power and costs tens of thousands of dollars, yet the world’s tech giants deploy millions of these chips to train and run massive AI models. The energy footprint is staggering, the hardware costs are astronomical, and the environmental impact is growing. This hyperscaling model is unsustainable, but it’s the engine driving NVIDIA’s $100 billion data center business.
But what if we stopped hyperscaling? What if, instead of constantly training new, larger models, we focused on running the ones we already have—and made them smarter with real-time internet search? This is the world of inference-only AI, where existing models like Llama 3 8B or Mistral 7B are deployed at scale with retrieval-augmented generation (RAG) to keep them up-to-date. The implications are profound: hardware needs shrink by orders of magnitude, energy consumption plummets, and AI becomes a sustainable, affordable utility.
The NVIDIA Model: A House of Cards
NVIDIA’s H100 GPU is the cornerstone of today’s AI infrastructure. With 80GB of HBM3 memory and roughly 1,000 trillion floating-point operations per second (1,000 TFLOPS) of dense FP16/BF16 tensor throughput, it powers the world’s largest AI models, from Llama 3 to GPT-4. But this power comes at a cost. Each H100 draws about 1 kilowatt, and with air cooling (a power usage effectiveness, or PUE, of 1.5), the total power draw per GPU is 1.5 kilowatts. Multiply that by the 2 million H100s projected to ship in 2024, and you’re looking at a continuous draw of 3 gigawatts, enough to power a small country.
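The 3 GW figure is just the product of the three numbers in the paragraph. A minimal Python check, treating the post's rounded inputs (1 kW per GPU, PUE 1.5, 2 million units) as assumptions rather than measured values:

```python
def fleet_power_gw(num_gpus: int, kw_per_gpu: float = 1.0, pue: float = 1.5) -> float:
    """Total facility power, in gigawatts, for a fleet of identical GPUs."""
    total_kw = num_gpus * kw_per_gpu * pue
    return total_kw / 1e6  # 1 GW = 1,000,000 kW


print(fleet_power_gw(2_000_000))  # 3.0
```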
The economic model is equally staggering. NVIDIA’s data center revenue surpassed $100 billion in 2024, driven by insatiable demand for AI compute. But the cost of this hyperscaling is unsustainable. The GPU shortage is real, with demand far outstripping supply, and prices remaining high despite production increases. Even with 2 million H100s shipped in 2024, the world’s largest AI labs are rationing access.
The Alternative: Inference-Only AI
What if we stopped this madness? What if, instead of training new models, we ran the ones we already have—and made them smarter with real-time internet search?
This is the world of inference-only AI. Here’s how it works:
Existing models like Llama 3 8B or Mistral 7B are already trained, so no more energy is burned on training.
Retrieval-augmented generation (RAG) adds live internet search to each query, so the model can pull in the latest information without needing to be retrained.
Hardware needs shrink dramatically: instead of millions of H100s, we need hundreds of thousands, at most a million.
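To make the retrieval step concrete, here is a deliberately toy sketch in Python. It is not how Mistral or any production system does retrieval: a real deployment would query a web search engine or a vector index and then send the resulting prompt to the model. Here the "search" is a hypothetical keyword-overlap ranking over a small in-memory list, and the hypothetical `build_prompt` simply prepends the retrieved text to the user's question, which is the essential idea of retrieval-augmented generation.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case word set, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query.

    A stand-in for live internet search; purely illustrative.
    """
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Prepend retrieved context so a frozen model can answer with fresh facts."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Use the following sources to answer.\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "The H100 is an NVIDIA data centre GPU.",
    "Paris is the capital of France.",
    "Mistral AI is a startup based in Paris.",
]
print(build_prompt("Where is Mistral AI based?", docs, k=1))
```

The model weights never change; only the prompt does, which is why this approach keeps answers current without retraining.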
Let’s crunch the numbers:
The Hardware Requirement (tweaked, then corrected)
Global internet users: 5 billion. Average user: 10 requests/day, 150 tokens/request (50 input + 100 output). Total tokens/day: 5 billion × 10 × 150 or 7.5 trillion tokens/day.
An H100 can handle at least 75 million tokens/day (at 1,000 tokens/sec, 86,400 seconds a day gives 86.4 million at full utilization). To process 7.5 trillion tokens/day, you only need 100 000 of them, each serving 50 000 users.
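The chain of arithmetic can be written out in Python so that each assumption is visible and easy to vary. All inputs are the post's round numbers; the 75 million tokens/day per GPU is treated as a working figure, somewhat below the 86.4 million a GPU would reach at 1,000 tokens/sec with 100% utilization.

```python
import math

USERS = 5_000_000_000
REQUESTS_PER_DAY = 10
TOKENS_PER_REQUEST = 150               # 50 input + 100 output
TOKENS_PER_GPU_PER_DAY = 75_000_000    # working figure, below the 86.4M ceiling

tokens_per_day = USERS * REQUESTS_PER_DAY * TOKENS_PER_REQUEST
gpus_needed = math.ceil(tokens_per_day / TOKENS_PER_GPU_PER_DAY)
users_per_gpu = USERS // gpus_needed

print(f"{tokens_per_day:.2e} tokens/day")  # 7.50e+12 tokens/day
print(gpus_needed)                         # 100000
print(users_per_gpu)                       # 50000
```

Every input enters linearly, so multiplying any one of them by 10 (heavier use, longer answers, lower real-world throughput) multiplies `gpus_needed` by 10, which is one way to reach a figure of the order Opus suggested.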
(Opus 4.7 suggests a million, which is still only the capacity of a single 1GW data center. So feel free to multiply by 10 as appropriate.)
The Energy Bill
Power per H100: 1.5 kW (including cooling). Total power: 100,000 × 1.5 kW or 150 megawatts. Annual energy: 150 MW × 24 hours × 365 days or 1.3 billion kWh/year. Cost at $0.10/kWh: $130 million/year.
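The energy arithmetic as a short Python sketch, using the post's inputs. The unrounded result is about 1.31 billion kWh and $131 million a year, which the text rounds to $130 million.

```python
NUM_GPUS = 100_000
KW_PER_GPU = 1.5          # including cooling (PUE 1.5 on a 1 kW GPU)
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.10      # US dollars, the post's assumption

total_mw = NUM_GPUS * KW_PER_GPU / 1_000
annual_kwh = NUM_GPUS * KW_PER_GPU * HOURS_PER_YEAR
annual_cost = annual_kwh * PRICE_PER_KWH

print(total_mw)                               # 150.0
print(f"{annual_kwh / 1e9:.2f} billion kWh")  # 1.31 billion kWh
print(f"${annual_cost / 1e6:.0f} million")    # $131 million
```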
Compare that to today’s hyperscale AI, which uses about 20 times as much power (3 GW for 2 million H100s, versus 150 MW here) to train and run larger models.
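Because both scenarios assume the same per-GPU draw and cooling overhead, the comparison reduces to a ratio of fleet sizes. A minimal check, taking the 2 million H100s shipped in 2024 as a proxy for today's hyperscale fleet (a simplification that ignores older GPUs and other accelerators):

```python
def power_ratio(hyperscale_gpus: int, inference_gpus: int) -> float:
    """Ratio of facility power, assuming identical per-GPU draw and PUE."""
    return hyperscale_gpus / inference_gpus


ratio = power_ratio(2_000_000, 100_000)
print(ratio)                        # 20.0
print(f"{1 - 1 / ratio:.0%} less")  # 95% less
```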
The Capital Expenditure (CapEx)
Cost per H100: $30,000. Total hardware cost: 100,000 × $30,000 or $3 billion. Amortized over 5 years: $600 million/year.
Add in networking, staffing, and facilities, and the total annual cost (including amortized hardware and the energy bill) is about $1 billion/year.
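The capital cost line as a sketch. The post does not break down the ~$1 billion total, so the last calculation only shows what is left for networking, staffing and facilities once amortized hardware and the energy bill are subtracted; that split is an inference, not a figure from the text.

```python
GPU_PRICE = 30_000        # US dollars per H100, the post's figure
NUM_GPUS = 100_000
LIFETIME_YEARS = 5        # straight-line amortization

capex = GPU_PRICE * NUM_GPUS
annual_capex = capex / LIFETIME_YEARS
energy = 131_400_000      # annual electricity cost from the energy bill above
other = 1_000_000_000 - annual_capex - energy  # implied by the ~$1 billion total

print(capex)          # 3000000000
print(annual_capex)   # 600000000.0
print(round(other))   # 268600000
```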
The Revenue Opportunity
$20/user/month: 5 billion users × $20 × 12 or $1.2 trillion/year. Gross profit: $1.2 trillion − $1 billion, or still about $1.2 trillion/year. Gross margin: 99.9%.
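The revenue line as code. The result depends almost entirely on the assumption that all 5 billion internet users pay $20 a month; the margin barely responds to the cost side, because costs are three orders of magnitude smaller than revenue.

```python
USERS = 5_000_000_000
PRICE_PER_MONTH = 20          # US dollars, the post's assumption
ANNUAL_COST = 1_000_000_000   # total annual cost from the CapEx section

revenue = USERS * PRICE_PER_MONTH * 12
gross_profit = revenue - ANNUAL_COST
margin = gross_profit / revenue

print(revenue)          # 1200000000000
print(gross_profit)     # 1199000000000
print(f"{margin:.2%}")  # 99.92%
```

Halving the number of paying users halves revenue but moves the margin only from about 99.92% to about 99.83%.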
This is not a typo. Inference-only AI is 100 times cheaper than today’s hyperscaling model.
The EU’s Unilateral Move: Mistral as the Champion
The EU has a choice: follow the US into the hyperscaling trap or chart its own path. Mistral AI, the Paris-based startup behind the Mistral 7B and Mixtral 8x7B models, is the perfect vehicle for this revolution.
Why Mistral?
Mistral’s models are open-weight, not black boxes, so the EU doesn’t need NVIDIA’s blessing to run them. They’re already trained, so no need for billions in training costs. They’re designed for retrieval, making them perfect for real-time search augmentation. And they’re hardware agnostic—Mistral runs on any chip, not just NVIDIA.
The EU’s Strategic Advantage
The EU is uniquely positioned to break free from NVIDIA’s ecosystem. Here’s how:
Deploy Mistral Models on EU Infrastructure
OVHcloud, Hetzner, and Scaleway already operate large data centers in Europe. Renting 100,000 H100s (or equivalent) would cost $1 billion/year—a fraction of hyperscale budgets. Control: The EU owns the infrastructure, so data stays local, and regulations are enforced.
Build a Sovereign AI Stack
Step 1: Mandate Mistral 7B as the default model for the EU public sector (governments, healthcare, education).
Step 2: Partner with EU chipmakers (SiPearl, STMicro) to optimize Mistral for RISC-V or European accelerators.
Step 3: Create a public inference API for businesses and startups, priced at €5/user/month, far cheaper than US alternatives.
Outcompete the US on Cost and Ethics
Cost: At €5/user/month, the EU can offer AI services for roughly a quarter of the $20/month charged for leading US consumer subscriptions such as ChatGPT Plus. Ethics: Mistral’s models align with the EU AI Act, avoiding the risks of unchecked hyperscaling. Data sovereignty: No US cloud providers means no FISA requests or US government snooping.
Export the EU Model Globally
The Global South and non-aligned countries are hungry for alternatives to US/Chinese AI. The EU can offer Mistral-powered AI as a service, EU-designed hardware (e.g., SiPearl servers), and regulatory guarantees (GDPR compliance, no backdoors).
The Economic and Geopolitical Impact
For the EU
Job creation: A thriving EU AI hardware and software ecosystem (chips, data centers, RAG APIs). Energy independence: Less reliance on fossil fuels for AI. Geopolitical leverage: The EU becomes the third pole in AI, alongside the US and China.
For the World
AI democratization: Smaller companies and countries can afford AI without NVIDIA’s tax. Climate benefits: 90% less energy consumed for AI inference. A new economic model: AI as a utility, not a luxury good.
The Risks—and Why They’re Overblown
Critics will say: “Europe lacks the hardware capacity.” Reply: The EU doesn’t need to match NVIDIA’s scale. 100,000 H100s is within reach through public-private partnerships and regional data centers.
“Users won’t accept smaller models.” Reply: With RAG, even 7B models can outperform 70B models on real-world tasks. The difference is negligible for most users.
“The US will retaliate.” Reply: The EU can regulate access to its markets if the US tries to block Mistral. Data sovereignty is a non-negotiable strategic asset.
A Call to Action for the EU
The EU has the opportunity to lead the world’s AI future—not by chasing NVIDIA’s hardware arms race, but by embracing efficiency, openness, and sovereignty.
Here’s a 10-point plan for the EU to move unilaterally:
Mandate Mistral 7B as the standard for EU public sector AI by 2026.
Invest €10 billion in EU data center capacity (OVHcloud, Hetzner, Scaleway, and new sovereign clouds).
Partner with SiPearl and STMicro to optimize Mistral for EU-designed chips.
Create a public inference API for startups and SMEs, priced at €5/user/month.
Enforce data localization laws—all EU citizen data processed within the EU.
Ban US cloud providers from handling EU AI workloads (unless they use Mistral).
Subsidize Mistral adoption for healthcare, education, and SMEs.
Launch a “Green AI” certification for all EU AI deployments.
Promote Mistral globally as the ethical, energy-efficient alternative to US/Chinese AI.
Write the rules for the next decade—before NVIDIA and the US hyperscalers do.
The Bottom Line
Hyperscaling is a dead end. It’s unsustainable, unfair, and unnecessary. The future of AI isn’t about bigger models—it’s about smarter use of the models we already have.
The EU has the chance to rewrite the rules. By betting on Mistral, open-weights, and inference-only AI, Europe can avoid NVIDIA’s tax on AI, slash energy consumption by 90%, build a sovereign AI stack that the world will envy, and lead the global transition to ethical, efficient AI.
The alternative? More of the same: NVIDIA’s monopoly, US/Chinese dominance, and a planet burning energy to train models that will be obsolete in months.
The choice is clear. The time to act is now.

My trivial nitpick of the month: watt is a measure of power, not energy, but everyone knows what you mean and I've unintentionally mixed the units myself in the past.
John, I’m currently building a side project using Claude Code.
As best I can tell, while there is a fair bit of marketing hype, their models are just better at coding - and genuinely very useful - than any model you can run locally at the moment.
That may change over time but right now I would find it hard to make the case to switch.