Researchers at Microsoft Research Asia have discovered a solution to the excruciatingly slow object detection that is characteristic to existing “deep convolutional neural networks” (CNNs) which has been published in Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, a research paper written by Kaiming He and Jian Sun, along with a couple of academics serving internships at the Asia lab: Xiangyu Zhang of Xi’an Jiaotong University and Shaoqing Ren of the University of Science and Technology of China.

“Image recognition involves two core tasks: image classification and object detection,” Mr. He explains. “In image classification, the computer is taught to recognize object categories, such as “person,” “cat,” “dog,” or “bike,” while in object detection, the computer needs to provide the precise positions of the objects in the image.” The second task, Sun adds, is the more difficult of the two. “We need,” he says, “to answer ‘what and where’ for one or more objects in an image.”

Image recognition has gained rapidly from the use of deep neural networks and deep learning, along with the availability of prodigious data sets. Here, particularly, such networks are called CNNs, inspired by biological processes of the human brain, explains Microsoft Research.

However the algorithms are too slow for object detection in practice having to be applied thousands of times on a single image, for detecting a few objects. The new solution speeds up the process by almost 100 times, with impeccable accuracy.

They outline a new network structure that uses “spatial pyramid pooling” (SPP) technique to generate a descriptor from a region of any size.

The researchers, although quite proud of their breakthrough, believe that the field needs further exploration. “One of the important next steps,” Mr. Sun notes, “is to obtain much larger and richer training data. That will significantly impact the research in this direction.”

“Our work is the fastest deep-learning system for accurate object detection,” Mr. He said. “The speed is getting very close to the requirement for consumer usage.”

Read more here.

(Image source: Microsoft)

Previous post

Neglected Machine Learning Ideas

Next post

Logi Analytics Transforms Data Discovery with New Version of Logi Vision