Dataconomy
The Future is in Your Pocket: How to Move AI to Smartphones

Expert Explains How LLMs Can Be Mobile-Sized 

By Stewart Rogers
November 18, 2024
in Conversations, Artificial Intelligence

For years, the promise of truly intelligent, conversational AI has felt out of reach. We’ve marveled at the abilities of ChatGPT, Gemini, and other large language models (LLMs) – composing poems, writing code, translating languages – but these feats have always relied on the vast processing power of cloud GPUs. Now, a quiet revolution is brewing, aiming to bring these incredible capabilities directly to the device in your pocket: an LLM on your smartphone.

This shift isn’t just about convenience; it’s about privacy, efficiency, and unlocking a new world of personalized AI experiences. 

However, shrinking these massive LLMs to fit onto a device with limited memory and battery life presents a unique set of challenges. To understand this complex landscape, I spoke with Aleksei Naumov, Lead AI Research Engineer at Terra Quantum and a leading figure in the field of LLM compression.

Indeed, Naumov recently presented a paper on this subject – ‘TQCompressor: Improving Tensor Decomposition Methods in Neural Networks via Permutations’ – at the IEEE International Conference on Multimedia Information Processing and Retrieval (IEEE MIPR 2024), where researchers, scientists, and industry professionals gather to discuss the latest advances in multimedia technology. The work has been heralded as a significant innovation in neural network compression.

“The main challenge is, of course, the limited main memory (DRAM) available on smartphones,” Naumov said. “Most models cannot fit into the memory of a smartphone, making it impossible to run them.”

He points to Meta’s Llama 3.2-8B model as a prime example. 

“It requires approximately 15 GB of memory,” Naumov said. “However, the iPhone 16 only has 8 GB of DRAM, and the Google Pixel 9 Pro offers 16 GB. Furthermore, to operate these models efficiently, one actually needs even more memory – around 24 GB, which is offered by devices like the NVIDIA RTX 4090 GPU, starting at $1800.”
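The arithmetic behind those figures is easy to check: at FP16 precision each parameter occupies two bytes, so the weights alone of an 8-billion-parameter model fill roughly 16 GB, in line with the ~15 GB Naumov cites, and that is before activations and the KV cache push the practical requirement higher still:

```python
# Rough memory math behind the numbers quoted above.
params = 8_000_000_000        # an 8-billion-parameter model
bytes_per_param_fp16 = 2      # FP16 stores each weight in 2 bytes

weights_gb = params * bytes_per_param_fp16 / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~16 GB, well above an iPhone 16's 8 GB of DRAM
```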

This memory constraint isn’t just about storage; it directly impacts a phone’s battery life.

“The more memory a model requires, the faster it drains the battery,” Naumov said. “An 8-billion parameter LLM consumes about 0.8 joules per token. A fully charged iPhone, with approximately 50 kJ of energy, could only sustain this model for about two hours at a rate of 10 tokens per second, with every 64 tokens consuming around 0.2% of the battery.”
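The battery estimate can be sanity-checked with a quick back-of-the-envelope calculation using only the numbers Naumov quotes (0.8 J per token, 10 tokens per second, about 50 kJ per full charge):

```python
# Back-of-the-envelope check of the battery figures quoted above.
energy_per_token_j = 0.8     # energy per generated token
tokens_per_second = 10       # generation speed
battery_j = 50_000           # ~50 kJ in a fully charged iPhone

power_w = energy_per_token_j * tokens_per_second   # 8 W of sustained draw
runtime_h = battery_j / power_w / 3600             # seconds -> hours
print(f"{runtime_h:.1f} h")                        # ~1.7 h, i.e. "about two hours"
```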

So, how do we overcome these hurdles? Naumov highlights the importance of model compression techniques.

“To address this, we need to reduce model sizes,” Naumov said. “There are two primary approaches: reducing the number of parameters or decreasing the memory each parameter requires.”

He outlines strategies like distillation, pruning, and matrix decomposition to reduce the number of parameters and quantization to decrease each parameter’s memory footprint.

“By storing model parameters in INT8 instead of FP16, we can reduce memory consumption by about 50%,” Naumov said.
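The 50% saving follows directly from the storage formats: FP16 uses two bytes per weight, INT8 one. A minimal sketch of symmetric per-tensor quantization (the simplest INT8 scheme; production toolchains such as bitsandbytes use more refined variants) illustrates the trade-off between memory and precision:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map the largest |value| to 127."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP16 values from the INT8 codes."""
    return (q.astype(np.float32) * scale).astype(np.float16)

w = np.random.randn(4096).astype(np.float16)   # toy "weight tensor"
q, scale = quantize_int8(w)

print(w.nbytes, q.nbytes)   # 8192 vs 4096 bytes: INT8 halves the footprint
```

The dequantized values differ only slightly from the originals, which is why quantization is usually an acceptable trade for a 2x memory reduction.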

While Google’s Pixel devices, with their TensorFlow-optimized TPUs, seem like an ideal platform for running LLMs, Naumov cautions that they don’t solve the fundamental problem of memory limitations.

“While the Tensor Processing Units (TPUs) used in Google Pixel devices do offer improved performance when running AI models, which can lead to faster processing speeds or lower battery consumption, they do not resolve the fundamental issue of the sheer memory requirements of modern LLMs, which typically exceed smartphone memory capacities,” Naumov said.

The drive to bring LLMs to smartphones goes beyond mere technical ambition. It’s about reimagining our relationship with AI and addressing the limitations of cloud-based solutions.

“Leading models like ChatGPT-4 have over a trillion parameters,” Naumov said. “If we imagine a future where people depend heavily on LLMs for tasks like conversational interfaces or recommendation systems, it could mean about 5% of users’ daily time is spent interacting with these models. In this scenario, running GPT-4 would require deploying roughly 100 million H100 GPUs. The computational scale alone, not accounting for communication and data transmission overheads, would be equivalent to operating around 160 companies the size of Meta. This level of energy consumption and associated carbon emissions would pose significant environmental challenges.”

The vision is clear: a future where AI is seamlessly integrated into our everyday lives, providing personalized assistance without compromising privacy or draining our phone batteries.

“I foresee that many LLM applications currently relying on cloud computing will transition to local processing on users’ devices,” Naumov said. “This shift will be driven by further model downsizing and improvements in smartphone computational resources and efficiency.”

He paints a picture of a future where LLM capabilities become as commonplace and intuitive as auto-correct is today. That transition could unlock many exciting possibilities. With local LLMs, imagine enhanced privacy: your sensitive data never leaves your device.

Picture ubiquitous AI with LLM capabilities integrated into virtually every app, from messaging and email to productivity tools. Think of the convenience of offline functionality, allowing you to access AI assistance even without an internet connection. Envision personalized experiences where LLMs learn your preferences and habits to provide truly tailored support.

For developers eager to explore this frontier, Naumov offers some practical advice.

“First, I recommend selecting a model that best fits the intended application,” Naumov said. “Hugging Face is an excellent resource for this. Look for recent models with 1-3 billion parameters, as these are the only ones currently feasible for smartphones. Additionally, try to find quantized versions of these models on Hugging Face. The AI community typically publishes quantized versions of popular models there.”

He also suggests exploring tools like llama.cpp and bitsandbytes for model quantization and inference.

The journey to bring LLMs to smartphones is still in its early stages, but the potential is undeniable. As researchers like Aleksei Naumov continue to push the boundaries of what’s possible, we’re on the cusp of a new era in mobile AI, one where our smartphones become truly intelligent companions, capable of understanding and responding to our needs in ways we’ve only begun to imagine.

Tags: AI, LLM, Mobile, Smartphone

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.

  • About
  • Imprint
  • Contact
  • Legal & Privacy

Follow Us

  • News
    • Artificial Intelligence
    • Cybersecurity
    • DeFi & Blockchain
    • Finance
    • Gaming
    • Startups
    • Tech
  • Industry
  • Research
  • Resources
    • Articles
    • Guides
    • Case Studies
    • Glossary
    • Whitepapers
  • Newsletter
  • + More
    • Conversations
    • Events
    • About
      • About
      • Contact
      • Imprint
      • Legal & Privacy
      • Partner With Us
No Result
View All Result
Subscribe

This website uses cookies. By continuing to use this website you are giving consent to cookies being used. Visit our Privacy Policy.