The thermal wall: how phones and handhelds spend the same watt-budget two different ways

Mobile silicon stopped being transistor-limited and became heat-limited. This is a deep dive on what actually moves the heat (graphite, vapor chambers, fans) and the software that decides who gets throttled first, read across two device classes that answer the same thermal equation in opposite directions.

Arthur Dutra · May 22, 2026 · 20 min read

On paper, a modern phone SoC and a handheld APU are both fast. The Snapdragon 8 Elite Gen 5 will post a 3DMark Wild Life Extreme score of nearly 6,900 on the first loop. The problem starts on the second loop: by loop 20, that same chip in a conventionally cooled phone lands at roughly 1,715, a stability of only about 25 percent (Android Headlines). The silicon did not get slower; the chassis ran out of places to put the heat.

For instance, here are 10 passes I ran on my own Samsung Z Flip 6, which pairs a Snapdragon 8 Gen 3 with a vapor chamber cooling system.

[Figure: Z Flip 6 thermal test]
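The stability figure quoted above is simple arithmetic: as 3DMark reports it, stability is the worst loop score divided by the best loop score. A minimal Python sketch, using only the two approximate scores quoted from Android Headlines (the function name is illustrative, and the intermediate loops are left out rather than invented):

```python
def stability_percent(loop_scores):
    """Worst loop score as a percentage of the best loop score."""
    if not loop_scores:
        raise ValueError("need at least one loop score")
    return 100.0 * min(loop_scores) / max(loop_scores)


# Only the first and twentieth loops are quoted in the text, and both are
# rounded ("nearly 6,900", "roughly 1,715"), so treat the result as approximate.
quoted_scores = [6900, 1715]
print(f"stability: {stability_percent(quoted_scores):.1f}%")  # prints about 24.9%
```

Feeding the function a full list of loop scores from any stress run gives the same single number the benchmark reports.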

This is the part of mobile performance nobody puts on the spec sheet. Peak clocks are a marketing number, while sustained clocks are an engineering result almost entirely driven by thermodynamics. I want to walk through what actually moves heat out of a handheld device, why phones and gaming handhelds arrive at completely different answers, and where the software layer either rescues the hardware or quietly sabotages it.

The constraint is not the junction, it is the user's hand

The intuitive failure mode is the chip cooking itself. Junction temperature does matter, and silicon will protect itself long before it melts. But in a device you hold, the binding constraint usually arrives earlier and from a different direction: skin temperature. The outside surface of the device should not get hot enough to be uncomfortable or unsafe to hold, and that ceiling sits far below anything the silicon would consider dangerous.

That single fact reframes the entire discipline. In a PC, thermal design is about getting heat off the die and into the room as fast as possible. In a phone, you cannot simply dump that heat into the room, because the only interface to the room is the surface pressed against a palm. So the operative verb changes from "remove heat" to "spread heat": buy time and surface area while keeping every spot on the chassis below the comfort threshold. A flagship phone running a 3D workload loses its sustained score not because the silicon is in danger but because the back glass would burn your hand if it kept going.

Heat flux is the reason this is hard. A high-performance SoC concentrates several watts into a die smaller than a fingernail. If that energy is not diffused immediately, the local temperature spikes, trips the control threshold, and the system throttles (Sheen Technology). Everything downstream is a fight against that concentration.
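To put rough numbers on that concentration, here is a small sketch of heat flux as power divided by area. Every figure in it is an illustrative assumption rather than a measurement: 5 W stands in for "several watts", 1 cm² for a fingernail-sized die, and 100 cm² for the back of a large phone.

```python
def heat_flux_w_per_cm2(power_w: float, area_cm2: float) -> float:
    """Average heat flux through a surface, in watts per square centimetre."""
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    return power_w / area_cm2


# Illustrative assumptions, not measurements of any specific device.
SOC_POWER_W = 5.0        # stand-in for "several watts"
DIE_AREA_CM2 = 1.0       # roughly fingernail-sized die
BACK_PANEL_CM2 = 100.0   # approximate back area of a large phone

at_die = heat_flux_w_per_cm2(SOC_POWER_W, DIE_AREA_CM2)
at_shell = heat_flux_w_per_cm2(SOC_POWER_W, BACK_PANEL_CM2)

print(f"at the die:   {at_die:.2f} W/cm^2")
print(f"at the shell: {at_shell:.2f} W/cm^2 if spread perfectly evenly")
print(f"concentration factor: {at_die / at_shell:.0f}x")
```

Under these assumptions the same power is a hundred times more concentrated at the die than it would be if spread perfectly across the shell, and closing that gap is the job of the spreading stack.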

Moving heat without a fan: the passive stack

A mainstream flagship phone has no fan, or any active cooling for that matter. What it has instead is a layered composite, and each layer does one job.

The first job is spreading, and the workhorse is synthetic graphite. Pyrolytic graphite sheet has an in-plane thermal conductivity in the range of 1,500 to 1,900 W/(m·K), which is roughly four to five times that of pure copper (Sheen Technology). The catch is the phrase "in-plane." Graphite is wildly conductive across the sheet and almost useless through it, which is exactly what you want in a phone: take the hotspot under the SoC and smear it sideways across the whole back of the device, turning a pinpoint of high flux into a broad, lukewarm plane that the chassis can shed gradually. The heatsink in a desktop air cooler does much the same thing: it spreads the heat over more surface area.
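The in-plane advantage can be made concrete with the one-dimensional conduction formula R = L / (k·A). The sketch below compares a copper strip with a graphite strip of identical geometry. The strip dimensions are hypothetical, and the copper conductivity is the approximate textbook value of 400 W/(m·K); only the graphite range comes from the text.

```python
def conduction_resistance_k_per_w(length_m, k_w_per_mk, width_m, thickness_m):
    """1-D conduction resistance R = L / (k * A) along a thin strip."""
    cross_section_m2 = width_m * thickness_m
    return length_m / (k_w_per_mk * cross_section_m2)


K_COPPER = 400.0                        # W/(m·K), approximate textbook value
K_GRAPHITE_IN_PLANE = (1500.0, 1900.0)  # W/(m·K), range quoted in the text

# Hypothetical strip: 50 mm long, 20 mm wide, 0.1 mm thick.
LENGTH_M, WIDTH_M, THICKNESS_M = 0.05, 0.02, 0.0001

r_copper = conduction_resistance_k_per_w(LENGTH_M, K_COPPER, WIDTH_M, THICKNESS_M)
print(f"copper:   {r_copper:.1f} K/W")

for k in K_GRAPHITE_IN_PLANE:
    r_graphite = conduction_resistance_k_per_w(LENGTH_M, k, WIDTH_M, THICKNESS_M)
    print(f"graphite (k={k:.0f}): {r_graphite:.1f} K/W, "
          f"{r_copper / r_graphite:.2f}x lower than copper")
```

Because the geometry is identical, the resistance ratio reduces to the conductivity ratio; the point of writing it out is to show how many kelvin per watt a thin strip of either material actually costs.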

Graphite alone hits a wall under sustained load. It conducts heat but does not transport it, so once the whole sheet is saturated there is nowhere left for the energy to go and throttling sets in (KingKa Tech). That is where the vapor chamber comes in.

A vapor chamber is a sealed, flat cavity with a wick structure and a working fluid (usually water) held at low pressure. Over the hotspot the fluid boils, absorbing a large amount of energy as latent heat of vaporization. The vapor then rushes to the cooler regions of the chamber, condenses, and releases that energy, and the wick pulls the liquid back to the hot zone by capillary action. It is a two-dimensional heat pipe, and because it relies on phase change rather than conduction, it actively carries heat across the device instead of slowly bleeding it through a solid (Digital Trends). The performance gap is measurable: a Galaxy S21 Ultra with a vapor chamber showed roughly 30 percent fewer performance drops under gaming load than devices relying on heat pipes (KingKa Tech).
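A quick sketch of why phase change is such an effective carrier. It uses the textbook latent heat of water near its atmospheric boiling point (about 2.26 MJ/kg; the value is somewhat higher at the lower temperatures inside a chamber) and the specific heat of liquid water. The 5 W load and the 10 K temperature swing are assumptions chosen for illustration.

```python
H_FG_WATER_J_PER_KG = 2.26e6    # latent heat of vaporization near 100 °C
CP_WATER_J_PER_KG_K = 4186.0    # specific heat of liquid water


def vapor_mass_flow_kg_per_s(power_w, h_fg=H_FG_WATER_J_PER_KG):
    """Mass of fluid that must evaporate each second to carry power_w."""
    return power_w / h_fg


def latent_to_sensible_ratio(delta_t_k, h_fg=H_FG_WATER_J_PER_KG,
                             cp=CP_WATER_J_PER_KG_K):
    """Energy absorbed by boiling 1 kg versus warming 1 kg by delta_t_k."""
    return h_fg / (cp * delta_t_k)


ASSUMED_LOAD_W = 5.0      # illustrative load, not a measured figure
ASSUMED_SWING_K = 10.0    # illustrative temperature rise for comparison

flow_mg_per_s = vapor_mass_flow_kg_per_s(ASSUMED_LOAD_W) * 1e6
print(f"evaporation needed for {ASSUMED_LOAD_W:.0f} W: {flow_mg_per_s:.2f} mg/s")
print(f"boiling vs warming by {ASSUMED_SWING_K:.0f} K: "
      f"{latent_to_sensible_ratio(ASSUMED_SWING_K):.0f}x more energy per gram")
```

A couple of milligrams of water cycling per second is enough to move several watts, which is why a chamber only a fraction of a millimetre thick can do useful work.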

In practice the high-end thermal stack is not a choice between these; it is all of them at once: thermal interface material at the die, a copper foil or plate, an ultra-thin vapor chamber, and graphite layers fanning the spread out toward the shell (Sheen Technology). The clearest signal of how far this has gone mainstream is that Apple, long a holdout, moved the iPhone 17 Pro to a vapor chamber in 2025 (Gadget Hacks). When the most conservative thermal team in the industry adopts phase-change cooling (although marketing the vapor chamber as an innovative design is questionable), the passive era of "just add more graphite" is over.

But notice what the passive stack never does: it never gets the heat off the device. It spreads, buffers, and delays. Against a sustained workload, delay runs out. Vapor chambers and graphite raise the ceiling; they do not remove it.
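The "delay runs out" behaviour falls out of even the simplest lumped thermal model: one heat capacity C charged by the SoC power and drained to ambient through one thermal resistance R. The sketch below is a toy under stated assumptions, not a model of any real phone. The values of R, C, and the 8 W load are made up, and the 45°C limit borrows the surface figure quoted later for the Odin 2 Portal.

```python
def seconds_until_limit(power_w, r_k_per_w, c_j_per_k,
                        t_amb_c=25.0, t_limit_c=45.0, dt_s=1.0, max_s=36000.0):
    """Euler-integrate C*dT/dt = P - (T - T_amb)/R until T reaches the limit.

    Returns the elapsed seconds, or None if the limit is not reached in max_s.
    """
    temp_c = t_amb_c
    elapsed_s = 0.0
    while elapsed_s < max_s:
        if temp_c >= t_limit_c:
            return elapsed_s
        heat_out_w = (temp_c - t_amb_c) / r_k_per_w
        temp_c += dt_s * (power_w - heat_out_w) / c_j_per_k
        elapsed_s += dt_s
    return None


def sustainable_power_w(r_k_per_w, t_amb_c=25.0, t_limit_c=45.0):
    """Steady-state power that holds the surface exactly at the limit."""
    return (t_limit_c - t_amb_c) / r_k_per_w


# All values below are illustrative assumptions.
LOAD_W = 8.0
R_PASSIVE = 5.0       # K/W from surface to ambient without a fan
C_BASELINE = 100.0    # J/K effective heat capacity
C_BIGGER = 200.0      # J/K with more spreading and buffering mass

print("baseline:", seconds_until_limit(LOAD_W, R_PASSIVE, C_BASELINE), "s")
print("more buffer:", seconds_until_limit(LOAD_W, R_PASSIVE, C_BIGGER), "s")
print("sustainable either way:", sustainable_power_w(R_PASSIVE), "W")
```

With these made-up numbers, doubling the heat capacity roughly doubles the time before the limit is reached, but the sustainable power stays the same. Only lowering R, which is what forced convection does, moves that number.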

The active divide: when a fan rewrites the equation

A gaming handheld makes a different bet, accepting thickness, weight, and noise so that in exchange it gets the one thing a phone cannot have, which is forced convection. A fan does not merely spread heat the way graphite does; it evicts it, and that single component is enough to change which term in the equation is binding.

The hardware reflects the budget. The Steam Deck uses a copper heat spreader and a heat pipe feeding a single blower, sized for an APU that runs between roughly 3 and 15 watts, with 15 watts being the practical sweet spot for performance per watt (Steam Deck HQ). The ROG Ally spends more: twin heat pipes off a copper spreader feeding two fans (Tom's Hardware), built around a Z1 Extreme with a configurable TDP from 9 to 30 watts, a short-term boost ceiling near 40 watts, and a 25 watt mode on battery against 30 watts plugged in (HotHardware).

Those wattage numbers are really the whole story. A passively cooled flagship phone fights to sustain a handful of watts at the skin without becoming uncomfortable; a handheld with active cooling sustains two to four times that continuously, because the fan exhausts the heat to the room rather than parking it against your hand. So the thickness you feel in a Steam Deck relative to a phone is not bad industrial design; it is the volume that the airflow path and heat pipes demand, and it is the very reason the device can hold a clock the phone has to abandon.

The comparison also exposes how non-linear the low end is. At a matched 15 watts the Steam Deck and ROG Ally perform about the same (Steam Community), which tells you that below a certain power the architecture barely matters and watts are watts. The Ally only pulls clearly ahead when it is allowed to open up past 15 watts, and it can only open up because it has the cooling headroom to spend those watts without overheating the skin or stalling on junction temperature. That is why a handheld's cooling capacity behaves less like a comfort feature and more like the performance ceiling itself, denominated directly in watts.

The full spectrum: fanless budget, the Switch, and the Asian high end

The phone-versus-handheld split is the clean version of the story, while the full market is a spectrum, and the budget Asian handheld segment, which moves enormous volume, sits at an end the flagship discussion usually ignores. It is worth walking the whole range, because each tier is a different answer to the same equation.

The budget tier designs the problem away. An Anbernic RG35XX H runs an Allwinner H700, four Cortex-A53 cores at 1.5 GHz with a small Mali G31 GPU, and it is fanless (Anbernic). There is no thermal stack worth describing because there is almost no heat to move. The SoC draws so little power that passive conduction through the board and shell handles it outright, which is also how the device gets to roughly eight hours of battery. This is the most underrated strategy in the category, since the trick is not to cool the heat but to refuse to generate it in the first place: match deliberately low-power ARM silicon to an emulation workload it can clear comfortably, and the thermal wall simply never appears. The cost is explicit and accepted, because these devices cannot touch anything modern, and so they are not really throttled so much as deliberately scoped.

That scoping is also why the segment is a commercial machine, and why it lives overwhelmingly in the Asian market. Removing the fan removes a moving part, a motor, an airflow path, and a chunk of the bill of materials, which lets a maker like Anbernic, Powkiddy, or Miyoo ship a complete device for tens of dollars and iterate on a new model every few months. Cheap, abundant SoCs like the Allwinner H700 and Rockchip RK3566 are produced at a scale that flagship handheld silicon never reaches, and a whole family of devices gets built on a single chip with minor shell and screen variations (linhpham.org). Fanless is therefore not a compromise this segment merely tolerates; it is the business model. It rests on a low BOM, no acoustic complaints, long battery life, and a workload ceiling the customer already understands and accepts. The thermal decision and the market decision turn out to be the same decision.

The Switch family sits a step up and makes a more surprising statement. The original Switch, marketed as a casual console, has always carried an active fan and a heat pipe over its Tegra X1. The Switch 2 doubles down: its teardown shows a fan paired with copper heat pipes cooling the Tegra T239, with an intake vent at the bottom and exhaust at the top (NotebookCheck, TechSpot). The handheld operating load is only around 10 watts (Deltia's Gaming), yet Nintendo still spends the volume on forced convection. That is the same lesson the phone benchmarks teach from the other direction: the moment you want a non-trivial SoC to hold a clock inside a sealed enclosure, a fan stops being optional, even at 10 watts; even for Mario Kart. The Switch 2's cooling is reportedly sized for close to 40 watts of capacity (Deltia's Gaming), which is the headroom that lets it hold through something as heavy as Cyberpunk 2077 rather than collapsing the way a fanless phone would.

The high end of the Asian Android handheld market closes the loop. Devices like the AYN Odin 2, built on a Snapdragon 8 Gen 2, are the same silicon class as a flagship phone, and the telling detail is that they all carry fans. The Odin 2 holds 95 to 98 percent stability under thermal stress (Pocket Retro Gaming), nearly the inverse of the 25 to 46 percent a fanless phone manages on comparable silicon. The Odin 2 Portal peaks at about 45°C surface temperature on full load with its smart fan curve (DroiX), which is roughly the same comfort ceiling a phone has to respect. The difference is that the Odin reaches that ceiling while sustaining a load the phone cannot, because the fan is doing the eviction. Same chip family, same skin limit, opposite sustained result: once again the fan is the entire difference.

Stack the category top to bottom and it reads as one continuous spectrum rather than separate products. At the bottom, fanless budget ARM devices that never generate the heat. In the middle, the Switch family proving a fan is mandatory the instant performance has to hold, even at modest wattage. At the top, identical flagship silicon split into a passive phone that throttles and an actively cooled handheld that does not. The variable across the entire range is the cooling architecture and the watt budget it buys.

The whole spectrum laid out as thermal architecture:

Tier	Example	SoC	Sustained power	Cooling stack	Fan
Budget fanless	Anbernic RG35XX H	Allwinner H700 (4×A53 @1.5 GHz)	~1 to 2 W	Conduction through board and shell	No
Casual console	Nintendo Switch 2	Nvidia Tegra T239	~10 W handheld (~40 W cooling capacity)	Copper heat pipes + blower	Yes
Passive flagship phone	Galaxy S25 Ultra	Snapdragon 8 Elite	Skin-limited, single-digit watts	Graphite + ultra-thin vapor chamber	No
Active gaming phone	REDMAGIC 11 Pro	Snapdragon 8 Elite Gen 5	Higher sustained	Vapor chamber + internal fan	Yes
Android handheld	AYN Odin 2	Snapdragon 8 Gen 2	Fan-limited	Heat spreader + fan	Yes
x86 handheld	Steam Deck	AMD Zen 2 APU	3 to 15 W	Copper spreader + heat pipe	Yes (single)
x86 handheld	ROG Ally	AMD Z1 Extreme	9 to 30 W (~40 W boost)	Twin heat pipes	Yes (dual)

And the same idea measured, where comparable data exists, as sustained performance under a stress loop. The phone-class rows share roughly the same silicon, so the spread is almost entirely cooling:

Device	Cooling	Sustained stability	Workload
Mainstream flagship phone	Passive (graphite + vapor chamber)	~25%	3DMark Wild Life Extreme, 20-loop (Android Headlines)
Galaxy S25 Ultra	Passive (vapor chamber)	~46%	3DMark Wild Life Extreme (PhoneArena)
REDMAGIC 11 Pro	Active (internal fan)	~80%	3DMark Wild Life Extreme (PhoneArena)
AYN Odin 2	Active (fan)	95 to 98%	Thermal stress test, ~45°C surface on Portal variant (Pocket Retro Gaming, DroiX)
Snapdragon 8 Elite Gen 5, no advanced cooling	Passive	<30% of peak	Sustained graphics (BigGo)

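That second table is the thesis in a single frame. Among the phone-class rows the silicon barely changes, yet sustained stability climbs from roughly 25 percent on a conventionally cooled flagship to about 80 percent on the fan-cooled REDMAGIC, more than a threefold gain, and the fan-cooled Odin 2 holds higher still. The only thing tracking that climb is whether there is a fan.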

The benchmark reality

Put the two device classes on the same stress test and the divide becomes brutal.

On a 20-loop 3DMark run, the Snapdragon 8 Elite Gen 5 in a mainstream flagship collapses to about 25 percent stability (Android Headlines), and on devices without serious cooling the graphics performance falls to under 30 percent of the opening peak (BigGo). The Galaxy S25 Ultra manages around 46 percent (PhoneArena). The interesting outlier is the REDMAGIC 11 Pro, which holds about 80 percent stability (PhoneArena). The one thing the REDMAGIC has that the others do not is an internal fan: it is a phone that decided to behave like a handheld where thermals are concerned, and the curve rewards it accordingly.

This is the clean experimental result that ties the whole argument together. Same silicon, same workload, and the sustained number tracks almost entirely with the cooling solution rather than the chip. The 8 Elite Gen 5 also throttled to roughly 58 percent of its peak in a 15-minute CPU test (BigGo), a reminder that this is a sustained-power problem across the whole SoC, not a GPU quirk.

The software half: who gets throttled, and when

If hardware sets the budget, software spends it. Software also decides who takes the cut once that budget is blown, which is the point where mobile optimization stops being about cooling parts and starts being about control loops.

The foundation on modern Android is energy-aware scheduling, which couples task placement to dynamic voltage and frequency scaling, deciding both which core runs a thread and what voltage and frequency that core sits at (Android Open Source Project). DVFS is the lever, and a thermal-aware scheduler pulls it in response to temperature sensors scattered across the package and skin. When the device heats up, frequencies and voltages come down, power drops, and so does sustained performance. That is the throttling you feel.
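The reason DVFS is such a strong lever is that dynamic power scales roughly with C·V²·f, so dropping frequency and voltage together cuts power faster than it cuts clocks. The sketch below pairs that formula with a deliberately simple step-down governor. The operating-point table, the effective capacitance, and the trip and hysteresis temperatures are all hypothetical; this is not the logic of any shipping scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_ghz: float
    volts: float


# Hypothetical operating points, fastest first.
OPPS = [
    OperatingPoint(3.0, 1.00),
    OperatingPoint(2.4, 0.85),
    OperatingPoint(1.8, 0.75),
    OperatingPoint(1.2, 0.65),
]

C_EFF_NF = 1.0  # assumed effective switched capacitance; nF * V^2 * GHz = W


def dynamic_power_w(opp: OperatingPoint) -> float:
    """Dynamic power from P = C * V^2 * f."""
    return C_EFF_NF * opp.volts ** 2 * opp.freq_ghz


def next_opp_index(index: int, temp_c: float,
                   trip_c: float = 43.0, hysteresis_c: float = 3.0) -> int:
    """Step one point slower at or above the trip temperature, one point
    faster once the sensor has cooled past the hysteresis band."""
    if temp_c >= trip_c and index < len(OPPS) - 1:
        return index + 1
    if temp_c <= trip_c - hysteresis_c and index > 0:
        return index - 1
    return index


peak_w = dynamic_power_w(OPPS[0])
for opp in OPPS:
    p = dynamic_power_w(opp)
    print(f"{opp.freq_ghz:.1f} GHz @ {opp.volts:.2f} V -> {p:.2f} W "
          f"({100 * p / peak_w:.0f}% of peak power, "
          f"{100 * opp.freq_ghz / OPPS[0].freq_ghz:.0f}% of peak clock)")
```

In this made-up table, the first step down gives up 20 percent of the clock but sheds about 42 percent of the dynamic power, which is the trade a thermal governor is exploiting every time it throttles.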

The naive implementation is blunt, and it is worth naming the failure modes because the good engineering is mostly about avoiding them.

The first is collateral throttling. Many governors throttle the whole system rather than the specific block generating the heat, so if a background task warms the package, the governor can end up pulling frequency on every core, including the ones doing the work you actually care about (Boston University PEAC Lab). The thermal budget is real enough; it just gets spent indiscriminately.

The second, and the more interesting one, is the downward spiral between independent CPU and GPU governors. In a game the two are coupled through frame timing, but if each governor only sees its own utilization and temperature, they can drive each other down: the CPU throttles, frames stall, the GPU sees idle time and throttles too, which slows the frame further, which feeds back. Two locally reasonable control loops produce a globally terrible result (ML-Gov, ResearchGate). The fix is cooperative CPU-GPU thermal management, where a single integrated governor allocates the shared thermal budget across both processors with knowledge of the frame pipeline rather than letting them fight over it (Cooperative CPU-GPU Thermal Management, ResearchGate).
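As a rough illustration of the cooperative idea, the sketch below has a single allocator pick the CPU and GPU frequency pair that minimizes frame time inside one shared power budget. It is a toy, not the algorithm from the cited papers: the frequency tables, the cubic power model, the per-frame work figures, and the serial frame-time model are all assumptions made for the example.

```python
from itertools import product

# Hypothetical frequency tables in GHz.
CPU_FREQS = [1.0, 1.5, 2.0, 2.5, 3.0]
GPU_FREQS = [0.3, 0.5, 0.7, 0.9, 1.1]


def total_power_w(cpu_f, gpu_f, k_cpu=0.1, k_gpu=3.0):
    """Assumed model: power grows with the cube of frequency on each block."""
    return k_cpu * cpu_f ** 3 + k_gpu * gpu_f ** 3


def frame_time_ms(cpu_f, gpu_f, cpu_work=12.0, gpu_work=8.0):
    """Assumed serial pipeline: CPU prepares the frame, then the GPU renders.

    Work is expressed in GHz-milliseconds, so work / frequency gives ms.
    """
    return cpu_work / cpu_f + gpu_work / gpu_f


def best_pair(budget_w):
    """Return the (cpu_f, gpu_f) pair with the lowest frame time that fits
    the shared budget, or None if no pair fits."""
    feasible = [
        (c, g) for c, g in product(CPU_FREQS, GPU_FREQS)
        if total_power_w(c, g) <= budget_w
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda pair: frame_time_ms(*pair))


for budget in (2.0, 4.0, 6.0):
    pair = best_pair(budget)
    print(f"budget {budget:.0f} W -> CPU {pair[0]} GHz, GPU {pair[1]} GHz, "
          f"{frame_time_ms(*pair):.1f} ms/frame, "
          f"{total_power_w(*pair):.2f} W")
```

The design point is that the allocator scores candidates by frame time, the quantity the player feels, rather than by each block's own utilization, so one block's throttle can never be misread by the other as idle time.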

This is also why frame pacing belongs in the thermal conversation, not just the smoothness conversation. A workload that renders frames as fast as possible burns power producing frames the display cannot show and the thermal budget cannot sustain. Capping and evenly pacing frames lowers peak power draw, which keeps the device under its thermal threshold longer, which delays the point at which the governor has to cut clocks. A well-paced 60 fps that holds is worth more than an unpaced 90 fps that throttles to 40 inside five minutes. The pacing is a thermal strategy disguised as a rendering one.
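The 60-versus-90 comparison is easy to check with a time-weighted average. The sketch below takes the article's own figures (90 fps falling to 40 within five minutes, against a steady 60) and adds two assumptions: a 30-minute session, and an unpaced run that holds the full 90 fps for all five minutes before dropping instantly, which is generous to it.

```python
def average_fps(segments):
    """Time-weighted average frame rate.

    segments is a list of (fps, minutes) tuples describing a play session.
    """
    total_minutes = sum(minutes for _, minutes in segments)
    if total_minutes <= 0:
        raise ValueError("session must have positive length")
    total_frames = sum(fps * minutes * 60 for fps, minutes in segments)
    return total_frames / (total_minutes * 60)


SESSION_MIN = 30  # assumed session length

unpaced = [(90, 5), (40, SESSION_MIN - 5)]   # peaks, then throttles
paced = [(60, SESSION_MIN)]                  # capped and held

print(f"unpaced average: {average_fps(unpaced):.1f} fps")
print(f"paced average:   {average_fps(paced):.1f} fps")
```

Under those assumptions the unpaced session averages about 48 fps against a steady 60, and that is before counting the stutter at the moment the clocks drop.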

At the bottom of the stack, device makers tune the power HAL, which can lock maximum CPU and GPU frequencies or apply device-specific optimizations to manage how aggressively the system rides its thermal limit (Android Open Source Project). This is the layer where a manufacturer decides the personality of the throttle curve: a phone that holds high clocks briefly and then falls off a cliff, or one that settles for a lower but flatter sustained ceiling. Neither choice is wrong; they are simply different products built on the same parts.

The handheld inverts the software problem in a revealing way. Because it has thermal headroom from the fan, its governors are tuned around user-selected TDP envelopes rather than a hard skin-temperature panic. A Steam Deck or Ally exposes the watt budget to the user as a slider, which is the honest version of what every device is doing internally. The phone hides the same decision behind an automatic curve because it has no headroom to give the user and no fan to make the choice cheap.

Two poles of one equation

Take the two ends of that spectrum, the passive flagship phone and the actively cooled handheld, and both are solving the same constrained optimization: maximize sustained performance subject to a thermal limit, a power source, and a volume budget. They just weight the constraints differently.
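One hedged way to write that optimization down is shown below. All of the symbols are introduced here for illustration: P(t) is the power the governor allows at time t, f is the performance delivered at a given power, t_end is the length of the session, T_skin is the surface temperature with comfort limit T_max, E_batt is the usable battery energy, and V_cooling and V_chassis are the volume spent on cooling and the volume the industrial design allows.

```latex
\begin{aligned}
\max_{P(t)} \quad & \frac{1}{t_{\text{end}}} \int_0^{t_{\text{end}}} f\bigl(P(t)\bigr)\,dt \\
\text{subject to} \quad & T_{\text{skin}}(t) \le T_{\text{max}} \quad \text{for all } t \in [0, t_{\text{end}}], \\
& \int_0^{t_{\text{end}}} P(t)\,dt \le E_{\text{batt}}, \\
& V_{\text{cooling}} \le V_{\text{chassis}}
\end{aligned}
```

Read this way, the two device classes differ only in which terms they treat as fixed: the phone pins the volume and the skin limit and lets P(t) fall, while the handheld relaxes the volume constraint so that the skin limit binds at a much higher P(t).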

The phone treats volume and skin temperature as immovable and treats performance as the variable. No fan, so no eviction; the entire strategy is spreading and delay through graphite and vapor chambers, and the governor's job is to ration a budget that can never grow. The phone's ceiling is set the moment the industrial designer fixes the thickness and the comfort threshold.

The handheld treats sustained performance as the goal and spends volume, weight, noise, and battery to buy the cooling that makes it achievable. The fan converts the thermal problem from "where do I store this heat" into "how fast can I exhaust it", and that conversion is why a 30 watt handheld and a flagship phone built on comparable process nodes live in different performance universes despite similar peak silicon.

The REDMAGIC result is the proof sitting in the middle, because the moment a phone adopts the handheld's core decision and adds an internal fan, its sustained curve jumps from the 25 to 46 percent band up toward 80 percent. The chip, in other words, was never the variable; the cooling architecture was, the entire time.

Field notes

A few things I would take into any device project after staring at this:

Sustained performance is the only performance number that means anything for a sustained workload, and it is dominated by cooling architecture, not silicon. If a spec sheet only quotes peak, assume the chip cannot hold it.
Passive cooling raises the ceiling; it never removes it. Graphite spreads, vapor chambers buffer through phase change, and both buy time against a wall they cannot eliminate. Forced convection is the only thing on the table that changes which constraint is binding.
The cheapest thermal solution is a scoped workload. The budget ARM handhelds win their segment by never generating the heat in the first place, and the Switch shows that even a low-power target earns a fan once it has to hold performance. Pick the silicon to fit the thermal envelope before reaching for a cooling solution to rescue a chip that was never going to fit.
The software governor is where good hardware goes to die or gets rescued. Independent CPU and GPU control loops will antagonize each other under load; a cooperative, frame-aware governor is worth more than another graphite layer. Frame pacing is a thermal tool first and a smoothness tool second.

And the honest design move is the Steam Deck's slider. Every device is rationing a watt budget against a thermal limit whether or not it admits it. The handheld lets you see the budget while the phone decides for you, because it has no fan to make the decision cheap.