Dataconomy

Apple researchers develop token-efficient AI for long-form video understanding

The core of this innovation is the SlowFast mechanism, which processes video through two parallel pathways simultaneously.

by Emre Çıtak
August 22, 2025
in Research

Researchers from Apple have introduced a new family of video large language models designed to be both efficient and accurate, particularly at smaller, mobile-friendly scales. The research, detailed in a paper titled “SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding,” presents an architecture that achieves state-of-the-art performance by balancing how many frames it processes against how many tokens it spends on each frame.

The ‘SlowFast’ approach to video analysis

A primary challenge for existing video AI models is managing the immense computational cost of processing long video sequences. These models face a difficult trade-off: either they process a large number of frames, which significantly increases the number of tokens and computational resources required, or they reduce the number of tokens per frame, which inevitably loses fine-grained detail. To solve this, the Apple researchers developed a model family called SlowFast-LLaVA-1.5, which uses a two-stream mechanism to analyze video content in a more balanced and efficient way.
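The trade-off described above comes down to simple arithmetic: the number of visual tokens handed to the language model is roughly the number of frames multiplied by the tokens per frame. The sketch below illustrates this under a fixed token budget. The budget and per-frame token counts are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
def total_tokens(num_frames: int, tokens_per_frame: int) -> int:
    """Visual tokens produced by a clip: frames times tokens per frame."""
    return num_frames * tokens_per_frame


def max_frames(token_budget: int, tokens_per_frame: int) -> int:
    """How many frames fit inside a fixed token budget."""
    return token_budget // tokens_per_frame


# Assumed numbers, chosen only to make the trade-off visible.
budget = 8_192

# Detailed frames: few of them fit, so temporal coverage is short.
print(max_frames(budget, tokens_per_frame=256))  # 32

# Coarse frames: many fit, but each one carries less spatial detail.
print(max_frames(budget, tokens_per_frame=64))   # 128
```

With the same budget, cutting the tokens per frame by a factor of four allows four times as many frames, which is exactly the tension a single-stream model has to resolve one way or the other.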

The core of this innovation is the SlowFast mechanism, which processes video through two parallel pathways simultaneously. The Slow pathway is designed to capture detailed spatial features—the “what” of the video. It operates at a low frame rate, analyzing fewer frames but in high resolution to understand the objects and semantics within the scene. In contrast, the Fast pathway is designed to capture motion cues—the “how” of the video. It operates at a high frame rate, processing many frames but with fewer tokens per frame, allowing it to focus on movement and long-range temporal context without a massive computational load. These two streams are then combined to give the language model a comprehensive yet token-efficient understanding of the video.
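A minimal sketch of the two-pathway idea follows, using plain Python lists in place of real vision-encoder features. It assumes average pooling to reduce tokens per frame, evenly spaced frame sampling for the Slow pathway, and simple concatenation to combine the streams. These choices, the function names, and all frame and grid sizes are assumptions made for illustration and may differ from the paper's actual implementation.

```python
from typing import List

Grid = List[List[float]]  # one frame's feature map; each cell stands for one token


def uniform_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly over the clip."""
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def avg_pool(grid: Grid, k: int) -> Grid:
    """Average-pool a square grid with a k x k window and stride k.

    Assumes the grid side length is divisible by k.
    """
    n = len(grid)
    return [
        [
            sum(grid[r + dr][c + dc] for dr in range(k) for dc in range(k)) / (k * k)
            for c in range(0, n, k)
        ]
        for r in range(0, n, k)
    ]


def flatten(grid: Grid) -> List[float]:
    return [value for row in grid for value in row]


def slowfast_tokens(frames: List[Grid], slow_frames: int,
                    slow_pool: int, fast_pool: int) -> List[float]:
    """Build one token sequence from a Slow and a Fast pathway."""
    # Slow pathway: few frames, light pooling, so more tokens per frame.
    slow: List[float] = []
    for i in uniform_indices(len(frames), slow_frames):
        slow.extend(flatten(avg_pool(frames[i], slow_pool)))

    # Fast pathway: every frame, heavy pooling, so few tokens per frame.
    fast: List[float] = []
    for frame in frames:
        fast.extend(flatten(avg_pool(frame, fast_pool)))

    # Combining by concatenation is an assumption of this sketch.
    return slow + fast


# Toy clip: 16 frames, each an 8 x 8 grid filled with the frame index.
clip = [[[float(t)] * 8 for _ in range(8)] for t in range(16)]
tokens = slowfast_tokens(clip, slow_frames=4, slow_pool=2, fast_pool=4)

print(len(tokens))   # 128 (4 frames x 16 tokens + 16 frames x 4 tokens)
print(16 * 8 * 8)    # 1024 tokens if every frame kept its full grid
```

In this toy setup the combined sequence is one eighth the size of the unpooled clip, while still containing every frame at coarse resolution and a handful of frames at finer resolution.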

A key focus of the research was developing smaller, more efficient models that could potentially be deployed on edge devices. The paper highlights the performance of its 1B and 3B parameter models, demonstrating that even these relatively small models can achieve state-of-the-art results. For instance, the SF-LLaVA-1.5-1B model surpassed a larger competitor, Qwen2-VL-2B, across multiple benchmarks. Similarly, the 3B model outperformed its competitor, Apollo-3B, on both general video and temporal reasoning tasks.

The larger 7B model also set new state-of-the-art scores on long-form video understanding benchmarks, achieving 62.5% on LongVideoBench and 71.5% on MLVU. The efficiency of the SlowFast mechanism was a primary driver of this performance. In a direct comparison, the Apple model processed twice as many frames (128) as a competing model while using only about 65% of the input tokens (9K vs. 14K), yet it achieved better results across nearly all benchmarks.
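The comparison above can be checked with a few lines of arithmetic. The sketch uses the rounded figures quoted in the article (9K and 14K tokens, 128 frames) and infers the competing model's frame count as 64 from the phrase "twice as many"; that inferred value and the variable names are assumptions.

```python
# Rounded figures from the article; the baseline frame count is inferred.
sf_frames, sf_tokens = 128, 9_000
baseline_frames, baseline_tokens = 64, 14_000

# Share of the baseline's input tokens used by the SlowFast model.
token_ratio = sf_tokens / baseline_tokens
print(f"{token_ratio:.0%}")  # 64%, in line with the article's "about 65%"

# Average tokens spent per frame by each model.
sf_per_frame = sf_tokens / sf_frames
baseline_per_frame = baseline_tokens / baseline_frames
print(sf_per_frame, baseline_per_frame)
```

On these rounded numbers, the SlowFast model spends roughly a third as many tokens per frame on average, which is how it covers more frames with a smaller input.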

In addition to performance, the researchers emphasized reproducibility. Unlike many state-of-the-art models that rely on large, internal datasets, SlowFast-LLaVA-1.5 was trained using a streamlined two-stage pipeline and exclusively on publicly available datasets. The first stage of training uses only images to give the model a strong foundation in general knowledge and reasoning. The second stage performs joint video-image training to learn temporal features while maintaining strong performance on still images. An ablation study confirmed the effectiveness of the combined SlowFast approach, showing that it outperforms models using only a Slow or Fast pathway individually. The study also demonstrated that the joint video-image training was a key factor in improving the model’s capabilities on both modalities.

Tags: AI, Apple
