MIT researchers say they have developed a new method for teaching robots new skills, one that could help them perform manual tasks more effectively.
Picture a warehouse robot that picks mugs off a shelf and places them in boxes for shipment as e-commerce orders pour in. Everything is going smoothly until the warehouse changes its operations and the robot must grasp taller, narrower mugs that are stored upside down.
The NDF model makes it possible to teach robots new skills
It is not only time-consuming and laborious to teach robots new skills this way; the retraining itself can also be risky. The new mugs must be hand-labeled by humans to teach the robot how to grasp them correctly, and the whole procedure must then be repeated.
A new technique developed by the MIT researchers, however, needs only a handful of human demonstrations. The machine-learning approach lets a robot pick up and place objects it has never encountered before, in poses it has never seen, and the robot can be ready to perform a brand-new pick-and-place task within 10 to 15 minutes.
The method relies on a neural network designed specifically to reconstruct three-dimensional shapes. After just a few demonstrations, the system uses what the network has learned about 3D geometry to handle new objects that are similar to the ones seen in those demonstrations.
The researchers showed that, with only 10 demonstrations, their system can quickly and reliably grasp never-before-seen mugs, bowls, and bottles in arbitrary poses.
“Our major contribution is the general ability to much more efficiently provide new skills to robots that need to operate in more unstructured environments where there could be a lot of variability. The concept of generalization by construction is a fascinating capability because this problem is typically so much harder,” says Anthony Simeonov, a graduate student in electrical engineering and computer science (EECS) and co-lead author of the paper.
Simeonov wrote the paper with co-lead author Yilun Du, an EECS graduate student; Andrea Tagliasacchi, a staff research scientist at Google Brain; Joshua B. Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Alberto Rodriguez, the Class of 1957 Associate Professor in the Department of Mechanical Engineering; and senior authors Pulkit Agrawal, a professor in CSAIL, and Vincent Sitzmann, an incoming assistant professor in EECS. The findings will be presented at the International Conference on Robotics and Automation.
Enhancing machine learning with a neural network model
A machine-learning system may be trained to pick up a specific object, but if that object is lying on its side, the robot perceives it as a completely new situation. This is one reason machine-learning systems struggle to generalize to new object orientations.
To address this problem, the researchers developed a new kind of neural network model, a Neural Descriptor Field (NDF), that learns the 3D shape of a category of objects. The model computes the geometric representation of a specific object from a 3D point cloud, a set of data points, or coordinate triplets, in three dimensions.
The data points can be acquired from a depth camera that measures the distance between the object and a viewpoint. Although the network was trained on a large dataset of simulated 3D shapes, it can be applied directly to real-world objects.
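To make the idea of a point cloud concrete, here is a minimal sketch of how a depth image can be back-projected into 3D points under a standard pinhole camera model. The function name and the intrinsics fx, fy, cx, cy are illustrative assumptions, not the researchers' actual pipeline.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into an N x 3 point cloud.

    depth  : H x W array of distances along the camera's z-axis
    fx, fy : focal lengths of an assumed pinhole camera (pixels)
    cx, cy : principal point of that camera (pixels)
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    x = (u - cx) * depth / fx                       # back-project x
    y = (v - cy) * depth / fy                       # back-project y
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                 # drop pixels with no depth reading

# Example: a flat synthetic depth image, 0.5 m everywhere
cloud = depth_to_point_cloud(np.full((480, 640), 0.5), fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(cloud.shape)  # (307200, 3)
```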
The team exploited a property of the NDF called equivariance: if the model is shown an upright mug and then the same mug lying on its side, it understands that the second mug is the same object, just rotated.
“This equivariance is what allows us to much more effectively handle cases where the object you observe is in some arbitrary orientation,” Simeonov explains.
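As a rough illustration of what that equivariance means, the sketch below uses a toy stand-in for a descriptor network (distance to the point cloud's centroid) so the property can be checked numerically. The function descriptor_field is hypothetical; a real NDF is a trained neural network, but the rotate-everything-together test is the same idea.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def descriptor_field(point_cloud, query_points):
    """Hypothetical stand-in for a trained NDF: maps each query point to a
    descriptor conditioned on the observed point cloud. Here the 'descriptor'
    is just the distance to the cloud's centroid, purely for illustration."""
    center = point_cloud.mean(axis=0)
    return np.linalg.norm(query_points - center, axis=1, keepdims=True)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(500, 3))   # stand-in for an observed mug point cloud
queries = rng.normal(size=(10, 3))  # stand-in for query points near the mug

# Rotate the whole scene: the cloud and the query points move together.
R = Rotation.from_euler("xyz", [90.0, 0.0, 30.0], degrees=True).as_matrix()
desc_upright = descriptor_field(cloud, queries)
desc_rotated = descriptor_field(cloud @ R.T, queries @ R.T)

# Equivariance: corresponding points get the same descriptors either way.
print(np.allclose(desc_upright, desc_rotated))  # True
```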
As the NDF improves its ability to reconstruct shapes, it also learns to associate related parts of similar objects. It learns, for instance, that the handles of mugs correspond to one another, even when some mugs are taller or shorter than others, or have smaller or longer handles.
“If you wanted to do this with another approach, you’d have to hand-label all the parts. Instead, our approach automatically discovers these parts from the shape reconstruction,” Du says.
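Using the same hypothetical descriptor_field stand-in from the previous sketch, a part labeled on one mug could in principle be located on another mug by searching for the point whose descriptor is closest. This is only a sketch of the correspondence idea, not the authors' implementation.

```python
import numpy as np

def descriptor_field(point_cloud, query_points):
    # Same toy stand-in as above: distance to the cloud's centroid.
    center = point_cloud.mean(axis=0)
    return np.linalg.norm(query_points - center, axis=1, keepdims=True)

def find_corresponding_point(ref_cloud, ref_point, new_cloud):
    """Locate the point on new_cloud whose descriptor best matches the
    descriptor of ref_point on ref_cloud (e.g. a point on a mug's handle)."""
    ref_desc = descriptor_field(ref_cloud, ref_point[None, :])  # 1 x D descriptor
    new_descs = descriptor_field(new_cloud, new_cloud)          # one descriptor per cloud point
    distances = np.linalg.norm(new_descs - ref_desc, axis=1)    # descriptor-space distances
    return new_cloud[np.argmin(distances)]                      # best-matching 3D point

# Example with random stand-in clouds
rng = np.random.default_rng(1)
ref_cloud, new_cloud = rng.normal(size=(500, 3)), rng.normal(size=(400, 3))
print(find_corresponding_point(ref_cloud, ref_cloud[0], new_cloud))
```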
The researchers use this trained NDF model to teach a robot a new skill with only a few physical demonstrations. They place the robot's hand on the rim of a bowl or mug, for example, and record the positions of its fingertips.
“Because the NDF has learned so much about 3D geometry and how to reconstruct shapes, it can infer the structure of a new shape, which enables the system to transfer the demonstrations to new objects in arbitrary poses,” Du says.
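The transfer step can be thought of as searching for a gripper pose on the new object whose descriptors match those recorded at the demonstration. The sketch below replaces the method's actual continuous pose optimization with a crude random search, again using the toy descriptor_field stand-in; every name and the search strategy here are assumptions for illustration only.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def descriptor_field(point_cloud, query_points):
    # Same toy stand-in as in the earlier sketches.
    center = point_cloud.mean(axis=0)
    return np.linalg.norm(query_points - center, axis=1, keepdims=True)

def transfer_grasp(demo_cloud, demo_gripper_points, new_cloud, n_candidates=1000, seed=0):
    """Choose the candidate gripper pose on the new object whose query-point
    descriptors best match those recorded at the demonstration grasp."""
    rng = np.random.default_rng(seed)
    target_desc = descriptor_field(demo_cloud, demo_gripper_points)  # descriptors at the demo grasp
    best_pose, best_energy = None, np.inf
    for _ in range(n_candidates):
        # Random candidate pose: a rotation plus a translation onto the new object.
        R = Rotation.from_euler("xyz", rng.uniform(0, 360, 3), degrees=True).as_matrix()
        t = new_cloud[rng.integers(len(new_cloud))]
        candidate_points = demo_gripper_points @ R.T + t
        energy = np.linalg.norm(descriptor_field(new_cloud, candidate_points) - target_desc)
        if energy < best_energy:
            best_pose, best_energy = (R, t), energy
    return best_pose
```

In the actual system, descriptors come from the trained network rather than a toy function, and the best grasp is found by optimizing over the full 6-DoF pose instead of sampling at random.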
How successful is the NDF model?
To test their approach, the researchers built a proof of concept showing that the method works in real-world settings, evaluating it both in simulation and on a real robotic arm with mugs, bowls, and bottles as objects. A trial counts as a success when the robot grasps a new object and places it in a target location, such as hanging a mug on a rack. On pick-and-place tasks with new objects in new orientations, the best baseline achieved a success rate of only 45 percent.
Most of the baselines rely on 2D image information rather than 3D geometric data, which makes it harder for those methods to incorporate equivariance. That difference in underlying principle is one reason the NDF approach performed so well.
While the researchers were pleased with the results, their method only works for the category of objects it was trained on. A robot taught to pick up mugs cannot pick up boxes or headphones, since those objects have geometric features too different from what the network was trained on.
“In the future, scaling it up to many categories or completely letting go of the notion of the category altogether would be ideal,” Simeonov explains.
They also want to apply the technique to nonrigid objects and, in the longer term, enable the system to perform pick-and-place tasks when the target region changes.
This research was funded in part by the Defense Advanced Research Projects Agency, Singapore Defense Science and Technology Agency, and National Science Foundation.