Researchers at Microsoft in Beijing may have developed the first computer vision system to surpass human capabilities in classifying objects defined in the ImageNet 2012 classification dataset, according to a recently published paper.
“To our knowledge, our result is the first to surpass human-level performance…on this visual recognition challenge,” the paper says.
The deep learning-based system, a Parametric Rectified Linear Unit (PReLU) network, achieved a 4.94% top-5 test error rate, compared with 5.98% for Baidu’s system and 6.66% for GoogLeNet, the ILSVRC 2014 winner. Humans have an error rate of 5.1%.
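For readers unfamiliar with the unit the network is named after, here is a minimal sketch of a Parametric Rectified Linear Unit in plain Python. It is an illustration only, not the researchers' implementation: the function names `prelu`, `prelu_grad_a` and `prelu_layer` are made up for this example, and the default slope of 0.25 is simply a starting value chosen here. The idea is that positive inputs pass through unchanged, while negative inputs are scaled by a slope `a` that is learned during training rather than fixed in advance.

```python
def prelu(x, a=0.25):
    """Parametric ReLU: x for positive inputs, a * x otherwise.

    The slope `a` is a trainable parameter; 0.25 is only an
    illustrative starting value for this sketch.
    """
    return x if x > 0 else a * x


def prelu_grad_a(x):
    """Derivative of prelu(x, a) with respect to the slope a.

    It is 0 for positive inputs and x otherwise, which is what
    lets the slope be updated by gradient descent.
    """
    return 0.0 if x > 0 else x


def prelu_layer(values, a=0.25):
    """Apply PReLU elementwise to a list of activations."""
    return [prelu(v, a) for v in values]
```

With `a` set to 0 this reduces to the ordinary rectified linear unit, and with a small fixed `a` it behaves like a "leaky" rectifier; the difference in the parametric version is that the slope is adjusted along with the rest of the network's weights.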
In the paper, titled Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, the researchers point out that the experiments were carried out on the 1000-class ImageNet 2012 dataset, which contains about 1.2 million training images, 50,000 validation images, and 100,000 test images with no published labels.
“While our algorithm produces a superior result on this particular dataset, this does not indicate that machine vision outperforms human vision on object recognition in general…On recognizing elementary object categories…machines still have obvious errors in cases that are trivial for humans. Nevertheless, we believe our results show the tremendous potential of machine algorithms to match human-level performance for many visual recognition tasks,” the paper’s authors clarify.
The paper was co-written by Kaiming He, a researcher in Microsoft Research Asia’s Visual Computing Group, along with academic interns Xiangyu Zhang and Shaoqing Ren and principal researcher Jian Sun of Microsoft Research.
(Image credit: Mike Mozart, via Flickr)