Training vs Inference: Where AI Energy Goes
Understanding the difference between AI training and inference energy costs, why inference now dominates total consumption, and what the Jevons paradox means for AI energy use.
Two phases, one model
Every AI model has two distinct phases of energy consumption: training and inference. They differ fundamentally in nature, scale, and trajectory, and understanding that difference is essential for making sense of AI's environmental impact.
Training is the process of creating the model. It involves feeding enormous datasets through a neural network and iteratively adjusting billions of parameters until the model learns to produce useful outputs. Training happens once (or a few times, as models are updated). It is a massive, concentrated burst of computation.
Inference is the process of using the model. Every time you send a prompt to ChatGPT, every time Google generates an AI Overview, every time GitHub Copilot suggests a line of code — that is inference. It happens billions of times per day, across millions of users, and it never stops.
Training: the one-time cost
Training a frontier AI model is one of the most energy-intensive computational tasks ever undertaken. It requires thousands of GPUs running continuously for weeks or months, drawing as much power as a small town.
The scale is staggering, but it is also finite. Once training is complete, the cost is amortised across every query the model ever serves. If GPT-4 serves billions of queries over its lifetime, the per-query training cost becomes tiny.
This amortisation is why training cost, while headline-grabbing, is not the primary driver of AI's ongoing energy footprint. It is a large upfront investment that shrinks in relevance with every passing day of deployment.
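To make the amortisation concrete, here is a minimal sketch. Every input below (training energy, query volume, deployment lifetime) is an illustrative assumption chosen to be plausible, not a measured figure for any particular model:

```python
# Minimal sketch of training-cost amortisation. All inputs are
# illustrative assumptions, not measured values for any specific model.

TRAINING_ENERGY_KWH = 50_000_000  # assumed one-time training run (50 GWh)
QUERIES_PER_DAY = 500_000_000     # assumed global query volume
DEPLOYMENT_DAYS = 365 * 2         # assumed two-year deployment lifetime

lifetime_queries = QUERIES_PER_DAY * DEPLOYMENT_DAYS
amortised_wh_per_query = TRAINING_ENERGY_KWH * 1000 / lifetime_queries

print(f"Lifetime queries: {lifetime_queries:,}")
print(f"Amortised training energy: {amortised_wh_per_query:.4f} Wh/query")
# ~0.137 Wh per query under these assumptions, comparable to the
# inference cost of the query itself.
```

Under these assumptions, the training run, however enormous in absolute terms, adds only a fraction of a watt-hour to each query it ultimately serves.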
Inference: the ongoing cost
Inference is where the energy truly accumulates. Each individual query is cheap — a fraction of a watt-hour for a text model. But the volume is extraordinary.
OpenAI has disclosed that ChatGPT receives hundreds of millions of queries daily. Google processes over 8.5 billion searches per day, and is increasingly augmenting them with AI-generated overviews that require inference. Meta runs AI inference across Facebook, Instagram, WhatsApp, and its AI assistant, serving billions of users.
A widely cited analysis by Epoch AI estimates that inference accounts for approximately 85% of total AI energy consumption, with training making up the remaining 15%. This ratio is expected to shift even further toward inference as models become more widely deployed.
The per-query cost may be small, but summed across billions of daily queries and multiplied across years of deployment, inference dwarfs training as the dominant energy cost.
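A back-of-the-envelope sum makes the scale concrete. Both inputs below are illustrative assumptions, broadly in line with the figures quoted above rather than measured values:

```python
# Rough sketch of how small per-query costs accumulate at scale.
# Both inputs are illustrative assumptions.

WH_PER_QUERY = 0.3               # assumed inference energy per text query
QUERIES_PER_DAY = 1_000_000_000  # assumed global daily query volume

annual_kwh = WH_PER_QUERY * QUERIES_PER_DAY * 365 / 1000
print(f"Annual inference energy: {annual_kwh / 1_000_000:,.0f} GWh")
# ~110 GWh per year under these assumptions, and unlike training,
# this cost recurs for every year of deployment.
```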
The Jevons paradox: cheaper doesn't mean less
In 1865, the economist William Stanley Jevons observed that as steam engines became more fuel-efficient, total coal consumption increased rather than decreased. More efficient engines made steam power cheaper, which expanded its adoption into new applications. The savings per unit were overwhelmed by the increase in total units.
AI is following the same pattern. As inference becomes cheaper and more efficient, through better hardware, model distillation, and mixture-of-experts (MoE) architectures, it becomes economically viable to deploy AI in more contexts. Features that were too expensive a year ago become standard. New applications emerge. Total AI compute grows even as per-query efficiency improves.
Google's Gemini is a clear example: the company reported a 33x reduction in energy per query between 2024 and 2025. But over the same period, Google rolled out AI Overviews to billions of search queries that previously required no AI inference at all. The net effect on total energy consumption is likely an increase despite the dramatic per-query improvement.
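The arithmetic behind that net increase is worth spelling out. In the toy model below, only the 33x efficiency factor comes from Google's published figure; the baseline energy and the query volumes are illustrative assumptions:

```python
# Toy model of the Jevons effect: a 33x per-query efficiency gain
# combined with a large expansion in query volume. Only the 33x factor
# comes from Google's published figure; everything else is assumed.

wh_per_query_2024 = 1.0                      # assumed baseline per-query energy
wh_per_query_2025 = wh_per_query_2024 / 33   # Google's reported 33x improvement

queries_2024 = 100_000_000     # assumed daily AI queries before rollout
queries_2025 = 5_000_000_000   # assumed daily queries after AI Overviews

print(f"2024: {wh_per_query_2024 * queries_2024 / 1000:,.0f} kWh/day")
print(f"2025: {wh_per_query_2025 * queries_2025 / 1000:,.0f} kWh/day")
# Volume grows 50x while efficiency improves 33x, so total energy still
# rises by roughly 1.5x even though each query costs 33x less.
```

Whenever demand grows faster than efficiency improves, total consumption rises; that is the Jevons pattern in miniature.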
This is not an argument against efficiency — efficiency is unambiguously good. But it means that efficiency alone cannot cap total AI energy growth. Demand growth must also be considered.
Why Know Your Compute focuses on inference
Our calculator and comparison tools measure inference energy, the per-query cost, because that is the metric individuals can actually influence. You cannot change how much energy was spent training GPT-4, but you can choose whether to use GPT-4 or GPT-4o mini for a given task.
When you use our calculator, you are seeing the marginal energy, carbon, and water cost of your usage pattern. Choosing a more efficient model, writing more focused prompts, and avoiding unnecessary queries are all decisions within your control that directly reduce inference energy consumption.
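In code, that marginal calculation looks something like the sketch below. The per-query energy, grid carbon intensity, and water factor are illustrative assumptions, not the calculator's actual coefficients:

```python
# Minimal sketch of a marginal-footprint calculation. The constants are
# illustrative assumptions, not the calculator's actual coefficients.

WH_PER_QUERY = 0.3       # assumed inference energy per query (Wh)
G_CO2_PER_KWH = 400      # assumed grid carbon intensity (g CO2 per kWh)
ML_WATER_PER_KWH = 1800  # assumed cooling water use (mL per kWh)

def marginal_footprint(queries_per_day: int, days: int = 365) -> dict:
    """Energy, carbon, and water cost of a given usage pattern."""
    kwh = WH_PER_QUERY * queries_per_day * days / 1000
    return {
        "energy_kwh": kwh,
        "carbon_kg": kwh * G_CO2_PER_KWH / 1000,
        "water_l": kwh * ML_WATER_PER_KWH / 1000,
    }

# Example: 20 queries a day for a year.
print(marginal_footprint(20))
# {'energy_kwh': 2.19, 'carbon_kg': 0.876, 'water_l': 3.942}
```

Switching to a model with lower energy per query scales all three numbers down proportionally, which is exactly the lever an individual user controls.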
Training costs are important for understanding the full lifecycle impact of AI systems, and we discuss them here and in our methodology. But for practical, actionable decision-making — which is what this tool is for — inference is the right metric.