Smaller Models, Smaller Footprint: Why On-Device AI Is Better for the Planet
Written by: Baran Melik
Published: November 2025

AI's progress has long been measured in accuracy and scale. But a truer measure might be environmental: megawatt-hours, tons of CO₂, and liters of clean drinking water.
Running and maintaining the world's largest cloud models now consumes resources at a staggering scale, much of it invisible to users, and a meaningful share of that footprint comes from clean, drinkable water used for data-center cooling.
There's a better path. Domain-specific Small Language Models (SLMs), which run locally on laptops and workstations, can achieve comparable reasoning performance within their trained domains while using a fraction of the energy, emitting almost no carbon, and requiring no cooling water at all.
The Hidden Cost of Cloud AI
Each time a large cloud model responds to a query, it draws on vast data-center infrastructure.
Those servers need electricity to power the GPUs and clean water to cool them.
At enterprise scale (millions of queries per day), the numbers add up to resource use comparable to a small industrial operation.
Research on GPT-4- and GPT-5-class systems suggests that supporting large-scale inference consumes hundreds of megawatt-hours of electricity annually per enterprise-level client, producing over 150 tons of CO₂ equivalent and evaporating hundreds of thousands of liters of clean drinking water.
That's the annual carbon footprint of roughly 30 passenger cars and the annual drinking-water needs of more than 300 people.
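Those equivalences are easy to sanity-check. The back-of-envelope sketch below (in Python) uses the footprint figures quoted above plus two outside conversion factors that are assumptions rather than findings of this article: roughly 4.6 t of CO₂e per passenger car per year (a commonly cited EPA estimate) and about 2 L of drinking water per person per day.

```python
# Sanity check of the car and drinking-water equivalences above.
# Footprint figures come from this article; the conversion factors are
# assumptions (EPA-style ~4.6 t CO2e/car/year, ~2 L/person/day).

CO2E_TONS_PER_YEAR = 150          # cloud inference footprint, per the article
WATER_LITERS_PER_YEAR = 250_000   # cooling water evaporated, per the article

CAR_TONS_CO2E_PER_YEAR = 4.6        # assumed per-car annual emissions
DRINKING_LITERS_PER_PERSON_DAY = 2  # assumed per-person daily intake

cars = CO2E_TONS_PER_YEAR / CAR_TONS_CO2E_PER_YEAR
people = WATER_LITERS_PER_YEAR / (DRINKING_LITERS_PER_PERSON_DAY * 365)

print(f"~{cars:.0f} cars' worth of CO2e per year")        # ~33 cars
print(f"~{people:.0f} people's drinking water per year")  # ~342 people
```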
From Cloud to In-House to On-Device
Many enterprises have turned to "private AI" by building or renting internal GPU clusters to run large language models within their own networks.
While this reduces dependency on external providers and can improve data control, it doesn't solve the environmental problem.
These in-house systems still rely on the same energy-hungry GPUs and still require constant cooling with clean, drinkable water.
On-device AI goes a step further. Instead of operating large models on centralized servers, domain-specific SLMs run locally on existing laptops, desktops, or edge devices.
They achieve large-model reasoning quality within their domain (contract generation, market analysis, sales, drafting, and so on) while eliminating the energy, carbon, and water overhead of data centers entirely.
The Numbers at a Glance
| Deployment Type | Example Workload | Annual Electricity | Annual Carbon (CO₂e) | Annual Cooling Water (clean, drinkable) |
|---|---|---|---|---|
| Cloud LLM (e.g., ChatGPT, GPT-5-scale) | 10 M queries/day via APIs | ~400 MWh | ~150 t CO₂e (~32 cars) | ~250,000 L (~340 people's annual drinking water) |
| In-House LLM (corporate GPU cluster) | Fine-tuned 70–100B-param model on internal servers | ~200 MWh | ~70 t CO₂e (~15 cars) | ~100,000 L (~135 people's annual drinking water) |
| On-Device Domain-Specific SLM | Local inference on laptops/workstations | < 5 MWh | < 1 t CO₂e | ≈ 0 L (no water cooling required) |
Even before training is considered, the day-to-day use of large cloud or in-house AI systems carries a massive environmental bill.
Running a comparable workload through small, local models cuts that bill by roughly 99 percent and eliminates the consumption of clean drinking water entirely.
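Here is a minimal sketch of where that figure comes from, using the round numbers from the table above. The on-device values are upper bounds (the table lists "< 5 MWh" and "< 1 t"), so the computed reductions are lower bounds; all inputs are this article's estimates, not measurements.

```python
# Savings implied by the table's round figures. The on-device numbers
# are upper bounds, so every reduction below is an "at least" figure.

table = {
    "cloud LLM":     {"mwh": 400, "co2e_t": 150, "water_l": 250_000},
    "in-house LLM":  {"mwh": 200, "co2e_t": 70,  "water_l": 100_000},
    "on-device SLM": {"mwh": 5,   "co2e_t": 1,   "water_l": 0},
}

slm = table["on-device SLM"]
for name, row in table.items():
    if name == "on-device SLM":
        continue
    energy_cut = 1 - slm["mwh"] / row["mwh"]
    carbon_cut = 1 - slm["co2e_t"] / row["co2e_t"]
    print(f"vs {name}: >= {energy_cut:.1%} less electricity, "
          f">= {carbon_cut:.1%} less CO2e, "
          f"{row['water_l']:,} L of drinking water avoided per year")

# Prints:
#   vs cloud LLM: >= 98.8% less electricity, >= 99.3% less CO2e, ...
#   vs in-house LLM: >= 97.5% less electricity, >= 98.6% less CO2e, ...
```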
Why This Matters
The environmental cost of AI isn't just a training problem. Inference, the billions of daily interactions users have with large models, now dominates total energy and water use.
That means every decision to move computation from remote data centers to local devices yields sustainability gains that compound across billions of queries.
Performance Without the Footprint
Smaller doesn't mean weaker. Within focused domains (M&A analysis, petition drafting, compliance review), SLMs match or outperform general-purpose LLMs because they specialize.
They reason deeply about a narrower corpus, drawing relevant conclusions with less computation and far less resource use.
In effect, they bring large-model intelligence to the problems that matter, without inheriting large-model waste.
The Future Is Local and Sustainable
The first era of AI centralized computation in the cloud. The next will distribute it to edge servers, laptops, and personal devices, where it's faster, safer, and vastly more sustainable.
This shift is already under way as enterprises seek solutions that align intelligence with environmental and data-sovereignty goals.
At Icosa, we aim to lead this transformation by optimizing domain-expert SLMs that organizations can deploy privately on their own devices, demonstrating that high-performance AI can coexist with both confidentiality and sustainability.
Owning intelligence no longer means owning infrastructure; it means using that intelligence responsibly, where the work and the data already live.