New Apple paper reveals how AI can track your daily chores

The team utilized the Ego4D dataset to analyze twelve distinct activities like cooking and cycling.

by Kerem Gülen
November 23, 2025
in Research

Apple researchers published a study detailing how large language models (LLMs) can interpret audio and motion data to identify user activities, focusing on late multimodal sensor fusion for activity recognition.

The paper, titled “Using LLMs for Late Multimodal Sensor Fusion for Activity Recognition,” by Ilker Demirel, Karan Ketankumar Thakkar, Benjamin Elizalde, Miquel Espi Marques, Shirley Ren, and Jaya Narain, was accepted at the Learning from Time Series for Health workshop at NeurIPS 2025. This research explores integrating LLM analysis with traditional sensor data to enhance activity classification.

The researchers state, “Sensor data streams provide valuable information around activities and context for downstream applications, though integrating complementary information can be challenging. We show that large language models (LLMs) can be used for late fusion for activity classification from audio and motion time series data.” They curated a subset of data for diverse activity recognition from the Ego4D dataset, encompassing household activities and sports.
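
The article does not reproduce the paper's prompts, but the late-fusion idea can be sketched in a few lines: modality-specific models reduce a clip to text (an audio caption, audio event labels, an IMU activity prediction), and an LLM is asked to fuse those descriptions into one activity label. The snippet below is a minimal illustration under that assumption; the field values, prompt wording, and `build_fusion_prompt` helper are hypothetical, not Apple's implementation.

```python
# Minimal sketch of LLM-based late fusion for activity classification.
# All inputs below are illustrative placeholders; in the paper they come
# from audio models and an IMU-based motion model.

def build_fusion_prompt(audio_caption, audio_labels, imu_prediction, activities):
    """Fuse per-modality text outputs into one zero-shot classification prompt."""
    return (
        "Sensor-derived descriptions of a 20-second clip:\n"
        f"- Audio caption: {audio_caption}\n"
        f"- Audio event labels: {', '.join(audio_labels)}\n"
        f"- Motion (IMU) prediction: {imu_prediction}\n"
        f"Which one of these activities is most likely: {', '.join(activities)}?\n"
        "Answer with a single activity name."
    )

prompt = build_fusion_prompt(
    audio_caption="water running and dishes clinking",
    audio_labels=["running water", "dishes"],
    imu_prediction="standing with repetitive arm motion",
    activities=["cooking", "cycling"],  # the paper uses twelve classes
)
print(prompt)
# A real run would send `prompt` to an LLM such as Gemini-2.5-pro or
# Qwen-32B and parse the returned activity name.
```

Because the fusion happens in text, no shared audio-motion embedding space has to be trained, which is the point the authors make about limited aligned data.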

The evaluated LLMs achieved 12-class zero-shot and one-shot classification F1-scores significantly above chance, without any task-specific training. Zero-shot classification via LLM-based fusion of modality-specific model outputs enables multimodal temporal applications even when there is little aligned training data for learning a shared embedding space, and it permits deployment without the extra memory and computation that dedicated application-specific multimodal models would require.
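
To make "significantly above chance" concrete: with twelve balanced classes, a random guesser lands near a macro F1 of 1/12 ≈ 0.083. The simulation below (synthetic labels, not the paper's data) reproduces that baseline.

```python
# Estimate chance-level macro F1 for a balanced 12-class task using
# random predictions on synthetic labels (not the paper's data).
import random
from sklearn.metrics import f1_score

random.seed(0)
classes = list(range(12))
y_true = [random.choice(classes) for _ in range(10_000)]
y_pred = [random.choice(classes) for _ in range(10_000)]

print(f"chance-level macro F1: {f1_score(y_true, y_pred, average='macro'):.3f}")
# ≈ 0.083; the reported zero- and one-shot F1-scores sit well above this.
```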

The study highlights LLMs' ability to infer user activities from basic audio and motion signals, with accuracy improving when the model is given a single worked example (one-shot prompting). Crucially, the LLM was never fed raw audio. Instead, it received short text descriptions generated by audio models and by an IMU-based motion model, which tracks movement via accelerometer and gyroscope data.
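
One-shot prompting here just means prepending a single solved example to the fused text input before the query. A sketch of that wrapper, with invented example content:

```python
# Turn the zero-shot prompt into a one-shot prompt by prepending one
# labeled example; the example text is invented for illustration.
ONE_SHOT_EXAMPLE = (
    "Example:\n"
    "- Audio caption: rhythmic pedaling and wind noise\n"
    "- Motion (IMU) prediction: continuous cyclic leg movement\n"
    "Activity: cycling\n\n"
)

def make_one_shot(zero_shot_prompt: str) -> str:
    """One-shot = one worked example followed by the actual query."""
    return ONE_SHOT_EXAMPLE + zero_shot_prompt
```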

For the study, the researchers used Ego4D, a dataset featuring thousands of hours of first-person perspective media. They curated a subset of daily activities by searching its narrative descriptions, yielding 20-second samples from twelve high-level activities such as cooking and cycling.
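
Ego4D clips come with free-text narrations, and the article says the subset was built by searching narrative descriptions. The sketch below shows what such keyword filtering could look like; the record structure, field names, and keywords are assumptions, not Apple's curation code.

```python
# Hypothetical keyword filter over Ego4D-style narration records; the
# field names and keywords are assumptions for illustration.
narrations = [
    {"clip_id": "a1", "t_start": 12.0, "text": "C cooks food on the stove"},
    {"clip_id": "b2", "t_start": 40.5, "text": "C rides a bicycle uphill"},
]

def select_windows(records, keywords, window_s=20.0):
    """Return (clip_id, start, end) windows whose narration mentions a keyword."""
    return [
        (r["clip_id"], r["t_start"], r["t_start"] + window_s)
        for r in records
        if any(k in r["text"].lower() for k in keywords)
    ]

print(select_windows(narrations, keywords=["cook", "stove"]))
# -> [('a1', 12.0, 32.0)]  # a 20-second window, matching the paper's clip length
```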

These activities were chosen to cover household and fitness tasks and for their prevalence in the larger Ego4D dataset. Audio and motion data were first processed by smaller models to generate text captions and class predictions, and those outputs were then fed into different LLMs, specifically Gemini-2.5-pro and Qwen-32B, to assess activity-identification accuracy.

Apple compared model performance in two scenarios: a closed-set test where models chose from the 12 predefined activities, and an open-ended test without provided options. Various combinations of audio captions, audio labels, IMU activity prediction data, and extra context were used for each test.
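
The two conditions differ only in the prompt: the closed-set test enumerates the twelve candidate activities, while the open-ended test asks the model to name the activity with no options. A minimal sketch of the two templates (wording assumed for illustration):

```python
# The two evaluation conditions differ only in whether candidate
# activities are listed; both templates are illustrative assumptions.

def closed_set_prompt(fused_text: str, activities: list[str]) -> str:
    # Model must pick one of the 12 predefined activities.
    return f"{fused_text}\nChoose exactly one of: {', '.join(activities)}."

def open_ended_prompt(fused_text: str) -> str:
    # No options given; the model names the activity freely.
    return f"{fused_text}\nWhat activity is the person doing? Answer in a few words."
```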

The researchers noted that the results offer insights into combining multiple models for activity and health data. This approach is particularly beneficial when raw sensor data alone is insufficient to provide a clear picture of user activity. Apple also published supplemental materials, including Ego4D segment IDs, timestamps, prompts, and one-shot examples, to facilitate reproducibility for other researchers.


Tags: AI, Apple, Ego4D
