In a discreet move, Apple’s research team has published a paper detailing the company’s work on MM1, a family of multimodal large language models. The models are designed for applications including natural language inference, image captioning, and visual question answering. The publication suggests that Apple, traditionally reticent about its AI work while competitors hailed AI as the future of technology, is catching up and may be positioned to set the pace in the industry.
What is Apple MM1?
“In this work, we discuss building performant Multimodal Large Language Models (MLLMs). We demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art few-shot results across multiple benchmarks, compared to other published pre-training results,” the document reads.
The paper describes MM1 as a family of multimodal models with up to 30 billion parameters. It reports state-of-the-art pre-training results and competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. According to the team at Apple, multimodal large language models (MLLMs) are the next step beyond traditional LLMs because they can reason over images as well as text.
The researchers at Apple believe they have reached an important milestone in training models to interpret both images and text. They expect the lessons from their work to help the community build models that handle increasingly large datasets more efficiently and reliably. Despite the promising results shared in the paper, practical testing of Apple MM1 has yet to happen, as the model itself has not been opened up for external evaluation.
The future of Apple’s venture into large language models, particularly MM1, hangs in the balance amid speculation that the company is developing an LLM framework internally dubbed “Ajax” as part of a reported $1 billion investment in AI research and development. Adding fuel to the fire, reports have circulated that Apple acquired the startup DarwinAI earlier this year, a move purportedly aimed at bolstering these efforts.
Apple’s CEO, Tim Cook, broke the company’s year-long silence on its AI ambitions during a post-earnings call in February, stating:
“We view AI and machine learning as fundamental technologies, and they’re integral to virtually every product that we ship. We’re excited to share the details of our ongoing work in that space later this year.”
Moreover, Apple recently showcased the AI capabilities of its new M3 MacBook Air, hinting at the significant role AI will play in its future offerings. The company also disbanded Project Titan, its electric car effort, last month and redirected its focus towards growing areas like artificial intelligence, signaling a recalibration of its innovation priorities.