ARM Unveils ‘Lumex 2’ NPU Promising 5x On-Device LLM Speed for 2027 Phones
Industry 3 days ago · 5 min read

ARM has thrown down the gauntlet to the cloud. At a press briefing in Cambridge on Tuesday, the British chip designer unveiled Lumex 2, a next-generation neural processing unit (NPU) it claims will deliver up to five times the on-device large language model performance of its current design. The headline ambition is striking: ARM says smartphones built on the architecture, expected to ship in flagship handsets from 2027, will run 10-billion-parameter models entirely offline, with no round trip to a data centre required.

It is a pointed message aimed squarely at Google and Apple, whose flagship assistants still lean heavily on cloud infrastructure for their most demanding generative tasks. If ARM’s numbers hold up in shipping silicon, the calculus of where AI actually runs could shift decisively towards the device in your pocket.

What Lumex 2 actually promises

The Lumex 2 NPU is not a standalone chip but an architectural block that ARM licenses to partners such as Qualcomm, MediaTek and Samsung, who integrate it into their system-on-chip designs. According to ARM, the new design roughly triples raw throughput for matrix operations, while a redesigned memory subsystem eases the memory-bandwidth bottleneck that has historically crippled on-device inference.
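To see why memory bandwidth matters so much, consider a common rule of thumb: when a phone generates text one token at a time, it typically has to read every model weight from memory for each token, so bandwidth divided by model size puts a ceiling on speed. The sketch below works that through. The 60 GB/s bandwidth figure and the function name are illustrative assumptions, not ARM specifications, and the estimate ignores on-chip caches, activations and the KV cache.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float,
                                num_params: float,
                                bits_per_weight: int) -> float:
    """Upper bound on single-stream decode speed, assuming every
    weight is read from memory once per generated token."""
    # Weights only; 1 GB is taken as 1e9 bytes.
    weight_gb = num_params * bits_per_weight / 8 / 1e9
    return bandwidth_gb_s / weight_gb


ASSUMED_BANDWIDTH_GB_S = 60.0  # hypothetical figure, not from ARM
PARAMS = 10e9                  # the 10-billion-parameter size ARM cites

for bits in (16, 4):
    ceiling = decode_ceiling_tokens_per_s(ASSUMED_BANDWIDTH_GB_S, PARAMS, bits)
    print(f"{bits}-bit weights: at most ~{ceiling:.0f} tokens/s")
```

Under these assumptions the ceiling rises from about 3 to about 12 tokens per second as weights narrow from 16 bits to 4. That is why narrower weights and a faster memory path can matter as much as raw matrix throughput.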

The company is leaning hard on low-precision arithmetic, with native support for 4-bit and emerging 2-bit quantisation formats, which ARM says shrink model footprints without the catastrophic accuracy loss seen in earlier compression schemes. ARM also claims a 40% improvement in tokens-per-watt, the metric that ultimately determines whether a phone can sustain a long AI conversation without becoming a hand-warmer.
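Some back-of-the-envelope arithmetic shows what those figures imply. The sketch below estimates how much storage the weights of a 10-billion-parameter model need at different bit widths, and what a 40% tokens-per-watt gain means for energy per token. Two assumptions apply: the footprint counts weights only, leaving out quantisation scales, the KV cache and activations, and tokens-per-watt is read as tokens per unit of energy. The function names are illustrative.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def relative_energy_per_token(efficiency_gain: float) -> float:
    """Energy per token relative to the baseline, assuming tokens-per-watt
    means tokens per unit of energy (so energy per token is its inverse)."""
    return 1.0 / (1.0 + efficiency_gain)


PARAMS = 10e9  # the 10-billion-parameter size ARM cites

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit weights: {weight_footprint_gb(PARAMS, bits):.1f} GB")

ratio = relative_energy_per_token(0.40)  # ARM's claimed 40% improvement
print(f"Energy per token: {ratio:.1%} of baseline")
```

On this reading, 4-bit weights bring a 10B model down to roughly 5 GB and 2-bit weights to about 2.5 GB, against 20 GB at 16-bit. A 40% efficiency gain cuts energy per token by a little under 30%, not by 40%.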

“The interesting jump here isn’t peak performance, it’s sustained efficiency,” said Dr Priya Anand, a mobile silicon analyst at Wexford Research. “Anyone can post a big benchmark number for two seconds. Running a 10B model for a five-minute interaction without thermal throttling — that’s the genuinely hard problem, and that’s where ARM is staking its claim.”

A direct challenge to cloud-dependent assistants

The strategic framing is unmistakable. Google’s Gemini and Apple Intelligence both run smaller distilled models locally but escalate complex queries to the cloud, an approach that carries privacy, latency and cost implications. ARM is betting that the centre of gravity is moving the other way, towards models that live and run on the handset.

That bet is well-timed. European firms including Mistral have aggressively pushed compact, openly available models optimised for offline use, and a growing cohort of developers want inference that never leaves the device for regulatory and privacy reasons.

“On-device isn’t just a performance story, it’s a trust story,” said Marcus Feld, founder of edge-AI consultancy Northbridge Labs. “Enterprises in healthcare and finance have been waiting for hardware that lets them run capable models without sending sensitive data anywhere. Lumex 2, if it delivers, is the kind of platform that unlocks that.”

There are caveats. A 10B-parameter model, even heavily quantised, sits well below the frontier models that power the most impressive cloud experiences. The question is not whether a phone can match GPT-class systems — it cannot — but whether a locally run model is now good enough for the bulk of everyday tasks: summarising, drafting, translating and reasoning over personal data.

The UK’s edge-AI play

For ARM, freshly buoyant after its blockbuster return to public markets, Lumex 2 is also a statement about British technological relevance. The company designs the architectures underpinning the overwhelming majority of the world’s smartphones, and it is keen to position Cambridge as the hardware backbone of the coming edge-AI shift — even as the UK lacks a domestic equivalent to the hyperscale cloud giants.

  • Licensing reach: ARM’s designs already ship in billions of devices annually, giving any architectural win enormous distribution.
  • Ecosystem timing: The push aligns with a wave of open, compact models suited to offline deployment.
  • Sovereignty narrative: On-device AI dovetails with European appetite for reducing dependence on US cloud providers.

Sceptics note that ARM has every incentive to present rosy figures ahead of silicon that is still two years out, and that real-world performance will depend heavily on how partners implement the design. Promised efficiency gains have a habit of eroding by the time chips reach consumers.

The race ahead

ARM will not have the field to itself. Qualcomm and Apple are both investing aggressively in their own NPU designs, and the gap between announcement and shipping product gives rivals room to respond. Whether 2027 handsets actually deliver a fivefold leap will hinge on factors well beyond ARM’s control, including memory costs and the maturity of quantised models.

What this means: If Lumex 2 lives up to even part of its billing, the centre of AI computation could quietly migrate from the data centre to the handset, reshaping the economics of generative AI and handing privacy-conscious users genuine offline capability. For the UK, it is a rare chance to sit at the heart of a hardware shift rather than watch from the sidelines — provided the silicon, when it finally arrives in 2027, can match the ambition of Tuesday’s slides.
