The Grammar of Movement: MIT Presents Video Recognition Programme

Hamed Pirsiavash, a postdoctoral scholar from MIT, is developing a new activity-recognition algorithm to identify what’s happening in video files. Pirsiavash and his former thesis advisor, Deva Ramanan of the University of California at Irvine, will present the video recognition programme at the Conference on Computer Vision and Pattern Recognition in June. In a similar fashion to the AlchemyVision Image-Processing software, Pirsiavash’s programme also draws on natural language processing techniques, in that it analyses small parts of a sequence to uncover what is happening in the larger context.

“One of the challenging problems they [NLP researchers] try to solve is, if you have a sentence, you want to basically parse the sentence, saying what is the subject, what is the verb, what is the adverb,” Pirsiavash says. “We see an analogy here, which is, if you have a complex action — like making tea or making coffee — that has some subactions, we can basically stitch together these subactions and look at each one as something like verb, adjective, and adverb.”

For each new action, Pirsiavash and Ramanan’s algorithm must learn a new ‘grammar’: the set of subactions that make up the whole. The training is not wholly unsupervised; the researchers feed the algorithm a set of videos depicting the same action and specify how many subactions it should identify, but not what the subactions are.
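To make that training setup concrete, here is a minimal sketch of learning subactions when only their number is given. It is not the researchers' method: it swaps in a simple k-means-style alternation, assumes each frame is summarised by a single number, and assumes every video has at least as many frames as subactions. The names `uniform_split`, `best_segmentation` and `learn_subactions` are hypothetical.

```python
from typing import List, Sequence, Tuple

INF = float("inf")


def uniform_split(n_frames: int, k: int) -> List[int]:
    """Initial guess: cut a video into k contiguous, roughly equal segments."""
    return [i * k // n_frames for i in range(n_frames)]


def best_segmentation(video: Sequence[float], protos: Sequence[float]) -> List[int]:
    """Label frames with subactions 0..k-1, in order, minimising squared error."""
    n, k = len(video), len(protos)
    cost = [[INF] * k for _ in range(n)]
    back = [[0] * k for _ in range(n)]
    cost[0][0] = (video[0] - protos[0]) ** 2  # every video starts in subaction 0
    for t in range(1, n):
        for j in range(k):
            err = (video[t] - protos[j]) ** 2
            stay = cost[t - 1][j]
            advance = cost[t - 1][j - 1] if j > 0 else INF
            if advance < stay:
                cost[t][j], back[t][j] = advance + err, j - 1
            else:
                cost[t][j], back[t][j] = stay + err, j
    labels = [k - 1]  # every video ends in the last subaction
    for t in range(n - 1, 0, -1):
        labels.append(back[t][labels[-1]])
    labels.reverse()
    return labels


def learn_subactions(
    videos: Sequence[Sequence[float]], k: int, iterations: int = 10
) -> Tuple[List[float], List[List[int]]]:
    """Alternate between re-estimating k prototypes and re-segmenting each video."""
    labelings = [uniform_split(len(v), k) for v in videos]
    protos = [0.0] * k
    for _ in range(iterations):
        for j in range(k):
            values = [
                x
                for video, labels in zip(videos, labelings)
                for x, label in zip(video, labels)
                if label == j
            ]
            protos[j] = sum(values) / len(values)
        labelings = [best_segmentation(v, protos) for v in videos]
    return protos, labelings


# Two toy "videos" of the same action; we say only that it has 3 subactions.
videos = [[0, 0, 0, 5, 5, 9, 9, 9], [0, 0, 5, 5, 5, 5, 9]]
protos, labelings = learn_subactions(videos, k=3)
```

The caller supplies the videos and the count `k`, mirroring the article's description; what each subaction looks like and where it starts and ends are both left for the procedure to work out.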

Several companies are working on video-processing programmes (including Dropcam, which is particularly interested in distinguishing normal from anomalous actions), but Pirsiavash and Ramanan’s algorithm has several advantages. First, the time it takes to analyse a video scales linearly with the video’s length: a video 10 times as long takes 10 times as long to process (rather than 1,000 times longer, as was the case with previous algorithms). Second, because it understands subactions, it can identify partially completed actions and doesn’t have to wait until the end of the video clip to deliver results. Third, the amount of memory required to run the algorithm is fixed; it doesn’t require any more space to process longer or larger clips.
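The three properties can be illustrated with a toy streaming parser. This is a sketch under stated assumptions, not the published algorithm: it uses a plain left-to-right dynamic programme over a fixed sequence of subactions, and it assumes some upstream model (not shown) supplies one score per subaction for every frame. The name `stream_parse` is hypothetical.

```python
from typing import Iterable, Iterator, Sequence

NEG_INF = float("-inf")


def stream_parse(
    frame_scores: Iterable[Sequence[float]], num_subactions: int
) -> Iterator[int]:
    """After each frame, yield the subaction most likely to be in progress.

    Only one list of `num_subactions` running totals is kept, so memory does
    not grow with video length, and each frame costs the same amount of work.
    """
    best = [NEG_INF] * num_subactions
    first_frame = True
    for scores in frame_scores:
        if first_frame:
            # The action must begin with its first subaction.
            new_best = [NEG_INF] * num_subactions
            new_best[0] = scores[0]
            first_frame = False
        else:
            new_best = []
            for k in range(num_subactions):
                stay = best[k]  # same subaction as the previous frame
                advance = best[k - 1] if k > 0 else NEG_INF  # just moved on
                new_best.append(scores[k] + max(stay, advance))
        best = new_best
        # Report progress immediately, without waiting for the clip to end.
        yield max(range(num_subactions), key=best.__getitem__)


# Hypothetical per-frame scores for a three-subaction action.
frames = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
progress = list(stream_parse(iter(frames), 3))
```

Because the parser consumes an iterator and emits a result per frame, a caller could stop early on a partially completed action, for example as soon as the second subaction is reported.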

Looking forward, Pirsiavash is particularly excited about possible medical applications of the programme. For instance, the researchers might be able to teach it the grammar of properly and improperly executed physical therapy exercises, or to distinguish whether a patient has remembered or forgotten to take their medicine.

(Image source: MIT website)


Tags: Machine Learning, MIT, Natural Language Processing, surveillance, Visual computing


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
