MIT researchers developed a new speech-to-reality system that combines 3D generative AI with robotic assembly to fabricate objects on demand. The system has produced items such as furniture in as little as five minutes.
This AI-driven workflow allows users to provide spoken input to a robotic arm, effectively “speaking objects into existence.” The technology leverages natural language processing, 3D generative AI, and robotic assembly to streamline the manufacturing process.
Alexander Htet Kyaw, an MIT graduate student and Morningside Academy for Design (MAD) fellow, stated, “We’re connecting natural language processing, 3D generative AI, and robotic assembly.” He added that these rapidly advancing research areas have not previously been combined to create physical objects from a simple speech prompt.
The system receives spoken commands, such as “I want a simple stool,” and then constructs objects from modular components. So far, the researchers have used the system to build stools, shelves, chairs, a small table, and decorative forms including a dog statue.
The speech-to-reality system processes user requests through several stages; a code sketch of the voxelization and connectivity steps follows the list:
- Speech interpretation: A large language model interprets the user’s spoken request.
- 3D generative AI: The system creates a digital mesh representation of the desired object.
- Voxelization algorithm: The 3D mesh is discretized into cube-shaped units that correspond to the modular assembly components.
- Geometric processing: The AI-generated assembly is modified to account for real-world fabrication constraints, such as component count, overhangs, and geometric connectivity.
- Assembly sequence and path planning: The system creates a feasible assembly sequence and automated path planning for the robotic arm.
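The article does not include the team’s code, but the voxelization and connectivity stages are straightforward to make concrete. The Python sketch below is a minimal illustration under our own assumptions, not the authors’ implementation: it uses the trimesh library’s voxelized() and fill() helpers to turn a generated mesh into a boolean occupancy grid (MODULE_EDGE is a hypothetical cube size), and a breadth-first search for the geometric-connectivity constraint.

```python
from collections import deque

import numpy as np
import trimesh

MODULE_EDGE = 0.05  # hypothetical cube edge length in mesh units


def voxelize(mesh_path: str, pitch: float = MODULE_EDGE) -> np.ndarray:
    """Discretize an AI-generated mesh into a boolean occupancy grid,
    one cell per modular cube, using trimesh's voxelizer."""
    mesh = trimesh.load(mesh_path, force="mesh")
    return mesh.voxelized(pitch=pitch).fill().matrix


def is_connected(occ: np.ndarray) -> bool:
    """Geometric-connectivity check: every occupied cell must be
    reachable from every other through shared faces (BFS over the
    six face neighbors)."""
    cells = {tuple(map(int, c)) for c in np.argwhere(occ)}
    if not cells:
        return True
    seen = set()
    queue = deque([next(iter(cells))])
    while queue:
        cell = queue.popleft()
        if cell in seen:
            continue
        seen.add(cell)
        x, y, z = cell
        for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            neighbor = (x + dx, y + dy, z + dz)
            if neighbor in cells and neighbor not in seen:
                queue.append(neighbor)
    return len(seen) == len(cells)
```

A full pipeline would then repair the grid, for example trimming unsupported overhangs and capping the component count, before handing it to the sequencer.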
Unlike 3D printing, which often takes hours or days, this system completes object construction within minutes. It also makes design and manufacturing more accessible to individuals without expertise in 3D modeling or robotic programming.
Kyaw developed the initial system while taking Professor Neil Gershenfeld’s course, “How to Make Almost Anything.” He continued the project at the MIT Center for Bits and Atoms (CBA), collaborating with graduate students Se Hwan Jeon of the Department of Mechanical Engineering and Miana Smith of CBA.
The team plans to improve the weight-bearing capacity of furniture by implementing more robust connections between modular cubes, moving beyond current magnetic connections. Smith noted, “We’ve also developed pipelines for converting voxel structures into feasible assembly sequences for small, distributed mobile robots, which could help translate this work to structures at any size scale.”
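To make Smith’s point concrete: one minimal reading of “converting voxel structures into feasible assembly sequences” is a bottom-up ordering in which each cube is placed on the ground plane or on an already-placed cube. The Python sketch below is a heuristic of our own devising, not the team’s planner, which must also handle reachability and collision-free paths for the robot.

```python
import numpy as np


def assembly_sequence(occ: np.ndarray) -> list:
    """Greedy bottom-up ordering of a boolean voxel grid.

    A cube may be placed once it sits on the ground plane (z == 0)
    or the cell directly beneath it is already placed. Illustrative
    heuristic only; a real planner must also consider the robot's
    reach and collision-free motion.
    """
    remaining = {tuple(map(int, c)) for c in np.argwhere(occ)}
    placed = set()
    order = []
    while remaining:
        ready = [v for v in remaining
                 if v[2] == 0 or (v[0], v[1], v[2] - 1) in placed]
        if not ready:
            # Leftover cubes are unsupported overhangs: the geometric
            # processing stage would modify the design to remove them.
            raise ValueError(f"{len(remaining)} unsupported cube(s)")
        for v in sorted(ready, key=lambda v: v[2]):  # lowest layer first
            placed.add(v)
            order.append(v)
        remaining.difference_update(ready)
    return order
```

For instance, `assembly_sequence(np.ones((2, 2, 2), dtype=bool))` returns the four bottom-layer cubes before the top four; a cantilevered cube with nothing beneath it raises the error instead, flagging exactly the overhangs the geometric-processing stage is meant to remove.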
The use of modular components aims to reduce manufacturing waste by enabling disassembly and reassembly into new objects. Kyaw is also working to integrate gesture recognition and augmented reality into the system, combining both speech and gestural control for enhanced interaction.
The team presented their paper, “Speech to Reality: On-Demand Production using Natural Language, 3D Generative AI, and Discrete Robotic Assembly,” at the Association for Computing Machinery (ACM) Symposium on Computational Fabrication (SCF ’25) at MIT on November 21.