How AI Uses Energy, Water & Carbon

Every AI query has a physical cost. Here's what powers the models, what it takes from the environment, and how we know.

The Three Resources Behind Every Query

When you send a prompt to an AI model, your request travels to a data centre filled with specialised chips — GPUs or TPUs — that perform billions of calculations to generate a response. That computation requires three physical resources:

Electricity (Energy)

GPUs draw hundreds of watts each. A single AI query can use anywhere from 0.001 Wh (a tiny local model) to 4+ Wh (a video generation), depending on model size and task. For context, a Google search uses about 0.3 Wh.

The electricity doesn't just power the chips: it also runs memory, networking, storage, and the cooling systems that keep everything from overheating. Data centres measure this overhead with Power Usage Effectiveness (PUE), the ratio of total facility energy to IT equipment energy: a PUE of 1.2 means the facility draws 20% more energy than the computing hardware alone, with the extra going to cooling and infrastructure.
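
To make the PUE arithmetic concrete, here is a minimal sketch in Python. The 0.24 Wh input is just an example value, not a measured figure:

```python
# Minimal sketch: how PUE turns IT equipment energy into total facility energy.
it_energy_wh = 0.24    # energy drawn by chips, memory, networking (Wh); example value
pue = 1.2              # facility overhead factor

total_energy_wh = it_energy_wh * pue    # 0.288 Wh at the meter
overhead_share = (pue - 1) / pue        # fraction of the total that is overhead

print(f"Total facility energy: {total_energy_wh:.3f} Wh")
print(f"Overhead share of total: {overhead_share:.1%}")   # ~16.7%
```

Note that a PUE of 1.2 means overhead is 20% on top of IT energy, which works out to roughly 16.7% of the total.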

Carbon Emissions (CO₂)

The carbon footprint of a query depends on where the electricity comes from. A data centre powered by hydroelectric or nuclear energy produces far less CO₂ per kilowatt-hour than one running on coal or natural gas.

This is measured using a Carbon Intensity Factor (CIF): the grams of CO₂ emitted per kilowatt-hour of electricity. Sweden's grid (~25 gCO₂/kWh) is roughly 22x cleaner than China's (~550 gCO₂/kWh), so the same model query can have vastly different emissions depending on the data centre location.
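
A back-of-the-envelope sketch of how much location matters, using the grid intensities above; the 0.3 Wh query energy is an assumed value for illustration:

```python
# Sketch: identical query energy, very different grid carbon intensity.
query_energy_kwh = 0.3 / 1000          # assumed 0.3 Wh query, converted to kWh

cif_sweden_g_per_kwh = 25              # approximate grid intensity, Sweden
cif_china_g_per_kwh = 550              # approximate grid intensity, China

co2_sweden_mg = query_energy_kwh * cif_sweden_g_per_kwh * 1000
co2_china_mg = query_energy_kwh * cif_china_g_per_kwh * 1000

print(f"Sweden: {co2_sweden_mg:.1f} mg CO2 per query")   # 7.5 mg
print(f"China:  {co2_china_mg:.1f} mg CO2 per query")    # 165.0 mg
```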

Water

Data centres use water in two ways. Direct (on-site) cooling uses evaporative systems that consume water to remove heat from servers. Indirect (off-site) water is consumed at power plants that generate the electricity — coal, gas, and nuclear plants all require water for cooling.

These are measured as site WUE and source WUE (Water Usage Effectiveness), both in litres per kilowatt-hour. A single ChatGPT query consumes roughly 2–5 mL of water — a small sip, but multiplied across billions of queries per day, it adds up to millions of litres.
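
A hedged sketch of how per-query water use is derived from energy and the two WUE factors, then scaled up; all inputs here are assumed values, not measured figures:

```python
# Sketch: per-query water = energy x (site WUE + source WUE), scaled to a day.
query_energy_kwh = 0.3 / 1000    # assumed 0.3 Wh query
site_wue_l_per_kwh = 1.8         # assumed on-site cooling water (L/kWh)
source_wue_l_per_kwh = 3.1       # assumed off-site generation water (L/kWh)

water_per_query_ml = query_energy_kwh * (site_wue_l_per_kwh + source_wue_l_per_kwh) * 1000
print(f"Per query: {water_per_query_ml:.2f} mL")          # ~1.47 mL

daily_queries = 1_000_000_000    # one billion queries a day (illustrative)
daily_litres = water_per_query_ml / 1000 * daily_queries
print(f"Per day: {daily_litres:,.0f} litres")             # ~1,470,000 litres
```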

How We Know: Peer-Reviewed vs Estimated Data

Not all environmental data is created equal. The AI industry is still young when it comes to transparency, so we use a tiered confidence system to be honest about what we know and how well we know it.

Verified Data

Numbers published directly by the AI provider — for example, Google's disclosure that a Gemini query uses 0.24 Wh, or Sam Altman's statement on ChatGPT's energy consumption.

How it's produced: The provider measures actual power draw from their production infrastructure and reports it publicly. These figures account for their specific hardware, optimisations, and data centre efficiency.
Limitation: Providers may report best-case scenarios or exclude certain overhead costs. We note any known caveats.

Peer-Reviewed Data

Figures from published academic research that has been reviewed by other scientists before publication. These studies typically measure energy use through controlled experiments.

How it's produced: Researchers run models on instrumented hardware (with power meters attached to GPUs), measure energy consumption across thousands of queries, and publish the methodology and results. Other scientists review the approach for errors before publication.
Limitation: Lab conditions may differ from production deployments. Providers often use custom hardware and proprietary optimisations that researchers can't replicate exactly.

Estimated Data

When neither the provider nor researchers have published figures, we build estimates from known hardware specifications and publicly available benchmarks.

How it's calculated: We combine four known quantities into a formula:
Energy (Wh) = (GPU power in W × Number of GPUs × Inference time in s × PUE) / 3600
  • GPU power — the rated wattage of the chip (e.g., NVIDIA H100 = 700W TDP, ~1,200W with server overhead)
  • Number of GPUs — how many chips are needed, based on model size and GPU memory capacity
  • Inference time — how long the query takes, from API latency benchmarks
  • PUE — the data centre's power overhead factor (typically 1.1–1.3)

CO₂ and water are then calculated by multiplying energy by regional carbon intensity and water usage factors. See our full methodology for detailed formulas.
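
Putting the pieces together, here is a minimal sketch of the full estimation chain in Python. The function follows the formula above; the example inputs (an 8-GPU deployment, a 3-second query, mid-range grid and water factors) are illustrative assumptions, not measured values:

```python
def estimate_query_footprint(
    gpu_power_w: float,      # rated power per GPU, incl. server overhead (W)
    num_gpus: int,           # GPUs required for the model
    inference_time_s: float, # wall-clock time for the query (s)
    pue: float,              # data centre power overhead factor
    cif_g_per_kwh: float,    # regional carbon intensity (gCO2/kWh)
    wue_l_per_kwh: float,    # combined site + source WUE (L/kWh)
) -> dict:
    """Rough per-query estimate; real deployments batch many queries across
    the same GPUs, so actual per-query figures are usually lower."""
    energy_wh = (gpu_power_w * num_gpus * inference_time_s * pue) / 3600
    energy_kwh = energy_wh / 1000
    return {
        "energy_wh": energy_wh,
        "co2_g": energy_kwh * cif_g_per_kwh,
        "water_ml": energy_kwh * wue_l_per_kwh * 1000,
    }

# Example: an 8-GPU deployment answering a 3-second query.
print(estimate_query_footprint(
    gpu_power_w=1200, num_gpus=8, inference_time_s=3.0,
    pue=1.2, cif_g_per_kwh=400, wue_l_per_kwh=4.0,
))
# ~9.6 Wh, ~3.84 g CO2, ~38.4 mL water
```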

Limitation: These are approximations. Actual values can vary significantly based on batching, quantisation, and provider optimisations we can't observe.

How Model Size Affects Resource Usage

"Parameters" are the learned values inside a neural network — essentially, the model's knowledge. More parameters generally means the model can learn more complex patterns, but it also means more computation per query.

Bigger models use more resources

Every parameter must be loaded into GPU memory and used in the calculation for each query. This has direct consequences:

More GPUs needed

A 7B-parameter model fits on a single GPU. A 405B-parameter model may need 8 or more GPUs working in parallel, each drawing 700+ watts.
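
A hedged sketch of why model size drives GPU count, assuming 16-bit weights and a GPU with 80 GB of memory (e.g., an H100); real deployments also need headroom for activations and the KV cache:

```python
import math

def gpus_needed(params_billions: float, bytes_per_param: int = 2,
                gpu_memory_gb: int = 80) -> int:
    """Minimum GPUs just to hold the weights; serving in practice needs
    extra memory for activations and KV cache, so deployments often use more."""
    weights_gb = params_billions * bytes_per_param  # 1B params x 2 bytes ~ 2 GB
    return math.ceil(weights_gb / gpu_memory_gb)

print(gpus_needed(7))    # 1  -- a 7B model fits on one GPU
print(gpus_needed(405))  # 11 -- a 405B model needs many GPUs at 16-bit
```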

Longer inference time

More parameters mean more computation per token generated. A response that takes 0.5 seconds on a small model might take 3+ seconds on a large one.

More energy per query

With more GPUs each running for longer, energy scales roughly with parameter count. A 405B model can use 50–100x more energy per query than a 1B model.

More cooling required

More active chips generate more heat, increasing both electricity for cooling and water consumption at the data centre.

Bigger models perform better — up to a point

Research consistently shows that larger models are more capable. They produce more coherent text, handle more complex reasoning, and make fewer factual errors. This relationship is described by scaling laws: model quality improves predictably as parameter count increases.

However, the gains are logarithmic, not linear. Doubling a model's size from 70B to 140B parameters does not double its quality. You might see a 10–15% improvement on benchmarks while doubling the energy cost. This diminishing return is why the industry is increasingly focused on efficiency:

Distillation: Training smaller models to mimic large ones (e.g., GPT-4o Mini retains most of GPT-4o's capability at a fraction of the cost)
Quantisation: Reducing the precision of parameters (from 16-bit to 4-bit) to cut memory and computation with minimal quality loss (see the memory sketch after this list)
Mixture of Experts: Only activating a subset of parameters per query, so a 671B-parameter model might only use ~37B per token (e.g., DeepSeek-V3)
Custom hardware: Purpose-built chips like Google's TPUs can be 2–3x more energy-efficient than general-purpose GPUs for inference
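
As a rough sketch of the quantisation point above: memory for a model's weights scales linearly with bits per parameter, so dropping from 16-bit to 4-bit cuts weight memory by 4x (activation memory and quality effects are not modelled here):

```python
def weights_memory_gb(params_billions: float, bits: int) -> float:
    """Memory for the model weights alone: parameters x bits, converted to GB."""
    return params_billions * 1e9 * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B model at {bits:>2}-bit: {weights_memory_gb(70, bits):.0f} GB")
# 70B model at 16-bit: 140 GB
# 70B model at  8-bit: 70 GB
# 70B model at  4-bit: 35 GB
```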

Choosing the right size matters

The most important takeaway: you don't always need the biggest model. For everyday tasks — writing emails, summarising articles, answering questions — a smaller, efficient model like GPT-4o Mini or Gemini 2.0 Flash delivers nearly the same quality at a fraction of the environmental cost. Reserve the largest models for tasks that genuinely need them, like complex code generation or multi-step reasoning.

Deep Dive Guides

Explore specific topics in detail. Each guide is a self-contained reference document covering one aspect of AI's environmental impact.