# How to Choose an Energy-Efficient AI Model
A practical guide to selecting AI models that minimise energy, carbon, and water consumption. Covers the efficiency range, category impacts, architecture differences, and the right-sizing principle.
## The 160,000x efficiency gap
Not all AI models are equally demanding. Across the 152 models in our database, the most energy-intensive model consumes roughly 160,000x more energy per query than the most efficient one. This is not a small difference — it is the difference between a night light and an industrial heater.
This gap means your choice of model has a far greater impact on your environmental footprint than almost any other factor: more than your region, more than how many queries you send, more than the time of day. Choosing wisely is the single most effective action you can take.
## Category matters most
The biggest determinant of energy consumption is the category of task. Text generation is fast and efficient. Image generation requires iterative diffusion processes. Video generation is orders of magnitude more demanding. Here are the average energy costs across categories in our database:
| Category | Models | Average energy | Typical range |
|---|---|---|---|
| Text / Chat | 94 | 3.2 Wh | 0.06 - 30 Wh |
| Code | 12 | 1.8 Wh | 0.3 - 5 Wh |
| Image | 16 | 4.1 Wh | 0.5 - 80 Wh |
| Video | 16 | 414.4 Wh | 150 - 2,400 Wh |
The takeaway is straightforward: if your task can be accomplished with a text model, use a text model. Generating an image when a text description would suffice costs 10x to 1,000x more energy.
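As a rough sketch, the averages in the table above can be turned into relative costs. The figures are the table's own; the code structure itself is purely illustrative:

```python
# Average per-query energy by category, taken from the table above (Wh).
AVG_ENERGY_WH = {"text": 3.2, "code": 1.8, "image": 4.1, "video": 414.4}

def relative_cost(category: str, baseline: str = "text") -> float:
    """How many baseline-category queries one `category` query costs."""
    return AVG_ENERGY_WH[category] / AVG_ENERGY_WH[baseline]

print(f"image vs text: {relative_cost('image'):.1f}x")  # roughly 1.3x on average
print(f"video vs text: {relative_cost('video'):.0f}x")  # roughly 130x on average
```

Note that these are category averages: the gap between an efficient text model and an expensive video model is far wider than the gap between the averages.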
## Within categories: what makes a model efficient
Even within the same category, models vary widely. Among text models alone, there is a 500x gap between the most and least efficient. Three factors drive most of this variation.
### Model size (parameter count)
Larger models require more GPUs and more computation per token. A 1B-parameter model fits on a single GPU. A 405B-parameter model needs 8 or more. Energy roughly correlates with parameter count, though the relationship is not perfectly linear due to differences in architecture and deployment efficiency.
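A hedged back-of-envelope model makes this scaling concrete. The anchor point below (a hypothetical 8B model at 0.3 Wh per query) is an assumption for illustration, not a measured value:

```python
def estimate_energy_wh(params_b: float,
                       ref_params_b: float = 8.0,
                       ref_energy_wh: float = 0.3) -> float:
    """First-order estimate: energy per query scales roughly linearly with
    parameter count. Anchored to a hypothetical 8B model at 0.3 Wh/query;
    real deployments deviate due to architecture and hardware."""
    return ref_energy_wh * (params_b / ref_params_b)

print(estimate_energy_wh(1))    # a small 1B model
print(estimate_energy_wh(405))  # a 405B dense model: ~50x the 8B anchor
```

The linear assumption is the weakest part of this sketch, which is exactly the point of the next two sections: architecture and infrastructure bend the curve.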
### Architecture (MoE vs dense)
Mixture-of-Experts models activate only a subset of their parameters per query, achieving the capability of a large model at the cost of a smaller one. Gemini, DeepSeek-V3, and Mixtral all use MoE. Dense models like the LLaMA series activate every parameter. For a given capability level, MoE models are typically more energy-efficient.
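The saving is easiest to see as the fraction of parameters touched per generated token. The Mixtral figures below are approximate, commonly cited values; treat them as illustrative:

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of a model's parameters touched for each generated token."""
    return active_b / total_b

# A dense model activates every parameter, every token.
dense = active_fraction(70.0, 70.0)
# Mixtral 8x7B, approximate published figures: ~47B total, ~13B active.
moe = active_fraction(46.7, 12.9)
print(f"dense: {dense:.0%} active per token, MoE: {moe:.0%} active per token")
```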
### Provider infrastructure
The same model can have different energy costs depending on who runs it. Google runs Gemini on custom TPUs optimised for transformer inference. A self-hosted open-source model running on older GPUs will use more energy for the same workload. Cloud providers with newer hardware and better utilisation rates deliver more efficient inference.
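One way to reason about this is to treat deployment as a multiplier on a model's baseline energy. The hardware factors and PUE values below are illustrative assumptions, not measurements of any real provider:

```python
def deployed_energy_wh(base_wh: float, hardware_factor: float, pue: float) -> float:
    """Scale a model's baseline per-query energy by a hardware-efficiency
    factor (modern accelerators < 1.0, older GPUs > 1.0) and the data
    centre's power usage effectiveness (PUE >= 1.0). Illustrative only."""
    return base_wh * hardware_factor * pue

# The same 1 Wh-baseline model under two hypothetical deployments:
optimised = deployed_energy_wh(1.0, hardware_factor=0.6, pue=1.1)
self_hosted = deployed_energy_wh(1.0, hardware_factor=1.8, pue=1.6)
print(f"self-hosted costs {self_hosted / optimised:.1f}x more per query")
```

Under these made-up multipliers the identical model costs over 4x more on older, less utilised hardware, which is why per-provider figures matter.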
## The right-sizing principle
The most impactful efficiency strategy is simple: use the smallest model that does the job well. This is right-sizing.
For everyday tasks — drafting emails, summarising articles, answering factual questions, translating text — a compact model like GPT-4o Mini, Gemini 2.0 Flash, or Claude 3.5 Haiku delivers results that are indistinguishable from frontier models at a fraction of the energy cost.
Reserve the largest, most capable models for tasks that genuinely require them: complex multi-step reasoning, nuanced creative writing, difficult code generation, or analysis of very long documents. These tasks represent a small minority of actual AI usage.
The concept is analogous to choosing the right vehicle: you do not need a truck to buy groceries. Using GPT-4 Turbo to rewrite a tweet is the computational equivalent of driving an 18-wheeler to the corner shop.
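In code, right-sizing amounts to a routing rule that defaults to the smallest tier and escalates only when needed. The task names and tier mapping below are hypothetical, not a real API:

```python
# Hypothetical routing rule: default to the compact tier, escalate only
# for task types that genuinely need a frontier model.
FRONTIER_TASKS = {"multi_step_reasoning", "long_document_analysis",
                  "difficult_code_generation", "nuanced_creative_writing"}

def choose_tier(task: str) -> str:
    return "frontier" if task in FRONTIER_TASKS else "compact"

print(choose_tier("draft_email"))           # compact
print(choose_tier("multi_step_reasoning"))  # frontier
```

The design choice worth copying is the default: unknown tasks fall through to the compact tier, so the expensive path is opt-in rather than opt-out.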
## The most efficient models right now
Based on our data, here are the five most energy-efficient text and code models currently available:
*Ranked by energy per query (lower is better). These models offer the best efficiency for text and code tasks. See the full comparison.*
These models share common traits: smaller parameter counts, efficient architectures, modern hardware, and deployment optimisations from their providers. They demonstrate that cutting-edge capability and environmental responsibility are not mutually exclusive.
## Putting it into practice
Here is a practical framework for choosing efficiently:
- Start with the smallest model in the right category. If your task is text-based, begin with a compact text model. Only move up if the results are genuinely inadequate.
- Evaluate whether you need AI at all. For simple factual lookups, a web search uses less energy than an AI query. Not every task needs a large language model.
- Write focused prompts. Shorter, clearer prompts produce faster inference and lower energy consumption. Inference cost scales with input length, so a 100-token prompt uses roughly 1,000x less energy than a 100,000-token prompt.
- Use our tools. The calculator lets you estimate the impact of your specific usage pattern. The model directory lets you evaluate models side by side. Individual model pages provide detailed efficiency data.
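The framework above can be condensed into the kind of back-of-envelope calculation our calculator performs. The 0.3 Wh compact-model figure is an assumption for illustration; the 3.2 Wh figure is the text-category average from earlier:

```python
def monthly_energy_wh(queries_per_day: float, wh_per_query: float,
                      days: int = 30) -> float:
    """Back-of-envelope monthly energy for a usage pattern, in Wh."""
    return queries_per_day * wh_per_query * days

# 20 text queries a day: a compact model at an assumed 0.3 Wh/query
# versus the 3.2 Wh/query text-category average.
compact = monthly_energy_wh(20, 0.3)
average = monthly_energy_wh(20, 3.2)
print(f"compact: {compact:.0f} Wh/month, average: {average:.0f} Wh/month")
```

Even at this small scale, defaulting to a compact model saves well over a kilowatt-hour per month for the same number of queries.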
Every query has a cost. Knowing that cost — and acting on it — is what Know Your Compute is for.