The rise of embedded ML is transforming how devices interact with the world, pushing the boundaries of what’s possible with limited resources. These applications, from smart wearables to industrial sensors, demand a delicate balance between performance, power consumption, and privacy.
Vladislav Agafonov, a machine learning expert at Meta Reality Labs UK (formerly Oculus VR), intimately understands these challenges.
“Embedded Machine Learning is both fascinating and challenging because we’re running deep learning models on devices with very limited memory and processor power,” Agafonov said.
Chief among those challenges, according to Agafonov, is optimizing models to fit the constrained compute and memory these devices offer.
“The most persistent challenge is balancing model accuracy with limited on-chip memory and constrained processing power,” Agafonov said.
To address this, techniques like quantization and pruning are crucial. Quantization reduces the number of bits used to store model weights, often from 32-bit floats to 8-bit integers or fewer, cutting weight memory by a factor of four or more. Pruning, on the other hand, removes connections that contribute little to the network's output, shrinking the model's size and speeding up inference.
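To make this concrete, here is a minimal PyTorch sketch of both techniques. The toy model, the 50% pruning ratio, and the dynamic-quantization setup are illustrative choices for demonstration, not a prescription; real deployments tune these per layer.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small hypothetical model standing in for a real embedded workload.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Pruning: zero out the 50% of weights with the smallest magnitude
# in each Linear layer, then bake the result into the weight tensor.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")

# Quantization: store weights as 8-bit integers instead of 32-bit
# floats, cutting weight memory roughly 4x for the affected layers.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # torch.Size([1, 10]), run with int8 weights
```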
“I also pay attention to operation fusion, which means merging multiple steps in the computation to avoid storing big intermediate results in memory,” Agafonov said. “Similarly, using Direct Memory Access (DMA) can let sensor data flow straight into the computation engine without extra copies, helping reduce latency.”
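DMA is a hardware facility, but operation fusion can be illustrated in software. The sketch below uses PyTorch's module-fusion utility to fold a hypothetical convolution, batch norm, and ReLU into a single operation, so the intermediate results between them never need to be materialized in memory.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # A toy conv -> batch-norm -> ReLU block, a common fusion target.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

model = ConvBlock().eval()  # batch-norm folding requires eval mode

# Merge the three modules into one fused op; the batch-norm
# parameters are folded into the convolution's weights.
fused = torch.ao.quantization.fuse_modules(
    model, [["conv", "bn", "relu"]]
)

x = torch.randn(1, 3, 32, 32)
assert torch.allclose(model(x), fused(x), atol=1e-5)
```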
By meticulously profiling each step, measuring cycles, memory footprint, and power consumption, engineers can focus optimization where it matters most and fit sophisticated models into just a few hundred kilobytes of memory.
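On-target profilers are still needed for cycle counts and power, but a first-order estimate of memory footprint and latency can be gathered on a development machine. A rough sketch, where the helper function and the stand-in model are hypothetical:

```python
import time
import torch

def profile(model, example_input, runs=100):
    """Rough host-side profile: weight memory and average latency."""
    # Memory footprint of the weights alone, in kilobytes.
    param_bytes = sum(
        p.numel() * p.element_size() for p in model.parameters()
    )
    # Warm up once, then time repeated forward passes.
    with torch.no_grad():
        model(example_input)
        start = time.perf_counter()
        for _ in range(runs):
            model(example_input)
        latency_ms = (time.perf_counter() - start) / runs * 1e3
    print(f"weights: {param_bytes / 1024:.1f} KB, "
          f"latency: {latency_ms:.3f} ms/inference")

model = torch.nn.Linear(128, 10)
profile(model, torch.randn(1, 128))
```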
Hardware Acceleration and Software Optimization
Hardware acceleration is another critical component of embedded ML. Specialized chips like Neural Processing Units (NPUs) and Tensor Processing Units (TPUs) are built for the massively parallel matrix arithmetic at the heart of neural networks, drastically speeding up inference while minimizing power usage.
“Hardware acceleration is absolutely key to running sophisticated ML models on embedded devices,” Agafonov said. “But as these chips evolve, software optimization remains just as important.”
Frameworks like ExecuTorch aim to simplify the development process by handling low-level details, such as mapping workloads to different accelerators and managing memory efficiently.
“Instead of spending hours trying to hand-optimize every part of your code for each new chip, you can rely on the framework to do the heavy lifting,” Agafonov said.
This allows developers to focus on the machine learning models themselves, rather than the intricacies of hardware optimization.
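In broad strokes, the export flow looks like the sketch below, which follows ExecuTorch's published workflow of capturing a model as a graph, lowering it to an edge dialect, and serializing a program the on-device runtime can load. The toy model is hypothetical, and exact API names can shift between releases.

```python
import torch
from executorch.exir import to_edge

# A toy model standing in for a real on-device workload.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
).eval()
example_inputs = (torch.randn(1, 128),)

# Capture the model, lower it to ExecuTorch's edge dialect, and
# serialize a program for the on-device runtime.
exported = torch.export.export(model, example_inputs)
program = to_edge(exported).to_executorch()

with open("model.pte", "wb") as f:
    f.write(program.buffer)
```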
Privacy and Federated Learning
Privacy is a growing concern, and embedded ML offers the advantage of local data processing.
“One of the big reasons embedded ML is so valuable is that data can be processed right on the device, which reduces or even eliminates the need to send sensitive information over a network,” Agafonov said.
Federated learning takes this concept further, allowing devices to train models locally and share only aggregated updates with a central server.
“Instead of gathering everyone’s data in a central database, each device trains the model independently using its own local information,” Agafonov said. “Then, it only sends back an ‘update’ or a summary of what it learned – not the raw data itself.”
This approach enhances privacy by preventing the transmission of raw user data, particularly important in sensitive applications like health and personal wearables.
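The heart of that idea, federated averaging, is compact enough to sketch. In this toy version, built around a hypothetical linear model with plain gradient steps, each device computes an update against its private data and the server only ever sees the averaged weights:

```python
import torch

def local_update(global_weights, x, y, lr=0.1):
    """One device: train briefly on private data, return new weights."""
    w = global_weights.clone().requires_grad_(True)
    loss = ((x @ w - y) ** 2).mean()
    loss.backward()
    return (w - lr * w.grad).detach()  # only this leaves the device

def federated_round(global_weights, device_data):
    """Server: average the updates; raw (x, y) data never arrives."""
    updates = [local_update(global_weights, x, y) for x, y in device_data]
    return torch.stack(updates).mean(dim=0)

# Three hypothetical devices, each with its own private dataset.
torch.manual_seed(0)
devices = [(torch.randn(20, 5), torch.randn(20)) for _ in range(3)]
w = torch.zeros(5)
for _ in range(10):
    w = federated_round(w, devices)
print(w)
```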
The Rise of TinyML
TinyML, the application of machine learning on extremely resource-constrained devices like microcontrollers, is gaining momentum.
“Think of a small chip with only a few hundred kilobytes of memory that still needs to handle tasks like classification or detection without draining a battery in the process,” Agafonov said.
Environmental monitoring and industrial predictive maintenance are prime examples.
“Small, battery-powered sensors can detect specific animal sounds or changes in air quality, then transmit meaningful alerts without wasting power on constant data streaming,” Agafonov said. “In industry, microcontrollers can detect early signs of machinery failure by monitoring vibrations or temperature spikes, helping prevent costly breakdowns.”
The growth of TinyML is driven by advancements in hardware and software. Microcontrollers now include specialized processing blocks, and lightweight ML frameworks simplify model optimization and deployment.
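One widely used path is TensorFlow Lite for Microcontrollers. The sketch below shows the usual first step: converting a small hypothetical Keras model with default optimizations, which quantize the weights, and checking that the result fits a microcontroller-scale budget. The layer sizes are illustrative only.

```python
import tensorflow as tf

# A tiny hypothetical classifier, sized for a microcontroller.
inputs = tf.keras.Input(shape=(32,))
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(4, activation="softmax")(hidden)
model = tf.keras.Model(inputs, outputs)

# Convert with default optimizations (weight quantization), the
# usual first step before deploying with TensorFlow Lite Micro.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
print(f"model size: {len(tflite_model) / 1024:.1f} KB")
```

From there, the flatbuffer is typically embedded in firmware as a C array (for example via xxd -i model.tflite) and executed by the TFLite Micro interpreter on the device itself.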
Immersive Experiences and Future Trends
At Meta Reality Labs, embedded ML is being used to enhance immersive experiences.
“We’re leveraging embedded ML to make immersive experiences more natural and responsive – think quick gesture recognition on a wristband that lets you control AR or VR interfaces without bulky controllers,” Agafonov said.
However, technical issues remain. “One significant hurdle is balancing power consumption with the need for near-instant inference,” Agafonov said. “Another is ensuring the models remain accurate under any conditions.”
Looking ahead, Agafonov sees several key trends shaping the future of embedded ML. The growing adoption of TinyML and ML-enabled microcontrollers, the expansion of hardware acceleration with specialized ML chips, and the increasing use of federated learning for privacy-preserving data processing are all poised to drive innovation in this field.
As embedded ML continues to evolve, the ability to balance power, privacy, and performance will be crucial in unlocking its full potential.