Hamed Pirsiavash, a postdoctoral scholar at MIT, is developing a new activity-recognition algorithm that identifies what’s happening in video files. Pirsiavash and his former thesis advisor, Deva Ramanan of the University of California at Irvine, will present the video-recognition programme at the Conference on Computer Vision and Pattern Recognition in June. Like the AlchemyVision image-processing software, Pirsiavash’s programme draws on natural language processing techniques: it analyses small parts of a sequence to work out what is happening in the larger context.
“One of the challenging problems they [NLP researchers] try to solve is, if you have a sentence, you want to basically parse the sentence, saying what is the subject, what is the verb, what is the adverb,” Pirsiavash says. “We see an analogy here, which is, if you have a complex action — like making tea or making coffee — that has some subactions, we can basically stitch together these subactions and look at each one as something like verb, adjective, and adverb.”
For each new action, Pirsiavash and Ramanan’s algorithm must learn a new ‘grammar’: the set of subactions that make up the whole. The algorithm is not wholly unsupervised. The researchers feed it a set of videos depicting the same action and specify how many subactions it should identify, but not what those subactions are.
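The article doesn’t say how the subactions are learned. As a rough illustration of what “fix the number of subactions, but not what they are” can mean, the sketch below uses a swapped-in and much simpler technique, not the researchers’ method: each example video is split into exactly `k` contiguous segments by classic optimal-segmentation dynamic programming, and the segment averages are pooled across videos into `k` subaction prototypes. It assumes, purely for illustration, that each frame has been reduced to a single number; the names `split_into_k`, `learn_subactions` and `segment_cost` are hypothetical.

```python
from typing import List, Sequence


def segment_cost(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Sum of squared deviations from the mean for frames i..j-1."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n


def split_into_k(frames: Sequence[float], k: int) -> List[int]:
    """Return the start index of each of k contiguous segments that
    minimise total within-segment squared error (hypothetical helper)."""
    t = len(frames)
    if t < k:
        raise ValueError("need at least k frames")
    # Prefix sums let each segment's cost be computed in O(1)
    prefix, prefix_sq = [0.0], [0.0]
    for x in frames:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)
    inf = float("inf")
    # best[m][j]: lowest cost of splitting frames[:j] into m segments
    best = [[inf] * (t + 1) for _ in range(k + 1)]
    back = [[0] * (t + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, t + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < best[m][j]:
                    best[m][j], back[m][j] = c, i
    # Follow back-pointers from the end to recover segment start indices
    starts, j = [], t
    for m in range(k, 0, -1):
        i = back[m][j]
        starts.append(i)
        j = i
    return starts[::-1]


def learn_subactions(videos: Sequence[Sequence[float]], k: int) -> List[float]:
    """Average the mean value of the m-th segment over all example videos,
    giving one prototype per subaction (hypothetical helper)."""
    sums = [0.0] * k
    for frames in videos:
        starts = split_into_k(frames, k) + [len(frames)]
        for m in range(k):
            seg = frames[starts[m]:starts[m + 1]]
            sums[m] += sum(seg) / len(seg)
    return [s / len(videos) for s in sums]
```

Only `k` is supplied; the boundaries and the content of each subaction come out of the data. For two toy videos whose frames step through three levels at different speeds, `learn_subactions(videos, 3)` recovers one prototype per level even though the segment lengths differ between the videos.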
Although several companies are working on video-processing programmes (including Dropcam, which is particularly interested in distinguishing normal from anomalous actions), Pirsiavash and Ramanan’s algorithm has several advantages. First, the time it takes to analyse a video scales linearly with the video’s length: a video 10 times as long takes 10 times as long to process, rather than 1,000 times as long, as was the case with previous algorithms. Second, because it understands subactions, it can identify partially completed actions and doesn’t have to wait until the end of a video clip to deliver results. Third, the amount of memory the algorithm requires is fixed; it needs no extra space to process longer or larger clips.
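The three advantages fit together naturally in a left-to-right parser that processes one frame at a time. The sketch below is not the researchers’ grammar parser; it is a deliberately simpler stand-in, online Viterbi decoding over a fixed chain of `k` subactions, chosen to show how linear time, fixed memory and early results can coexist. It assumes, hypothetically, that each subaction is summarised by a single prototype value (such as those from the previous sketch) and that a frame’s score for a subaction is the negative squared distance to that prototype. The class name `StreamingChainParser` and its `push` method are illustrative only.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


class StreamingChainParser:
    """Online Viterbi over a left-to-right chain of k subactions.

    Hypothetical stand-in: each subaction is one prototype value and a
    frame's score for subaction m is the negative squared distance to it.
    The only state kept is k running scores, so memory does not grow with
    video length, and each frame costs O(k) work (O(T * k) in total).
    """

    def __init__(self, prototypes: List[float]):
        self.prototypes = list(prototypes)
        self.scores = [NEG_INF] * len(self.prototypes)
        self.started = False

    def push(self, frame: float) -> Tuple[int, float]:
        """Consume one frame; return (index of the subaction the best
        partial parse is currently in, that parse's score)."""
        local = [-(frame - p) ** 2 for p in self.prototypes]
        if not self.started:
            # A parse must begin in the first subaction
            new = [NEG_INF] * len(local)
            new[0] = local[0]
            self.started = True
        else:
            new = []
            for m, s in enumerate(local):
                stay = self.scores[m]  # remain in subaction m
                advance = self.scores[m - 1] if m > 0 else NEG_INF  # move on from m-1
                new.append(max(stay, advance) + s)
        self.scores = new  # overwrite: no per-frame history is stored
        best = max(range(len(new)), key=lambda m: new[m])
        return best, new[best]
```

Feeding frames `0, 0, 5, 5, 10` to a parser built with prototypes `[0.0, 5.0, 10.0]` reports subactions `0, 0, 1, 1, 2` as the frames arrive, so a caller can see how far through the action the video has progressed without waiting for the clip to end.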
Looking ahead, Pirsiavash is particularly excited about possible medical applications of the programme. For instance, the researchers might be able to teach it the grammar of properly and improperly executed physical therapy exercises, or to distinguish whether a patient has remembered or forgotten to take their medicine.