Meta Platforms announced SAM 3 and SAM 3D, two new open-source computer-vision models in its Segment Anything Collection that enable text-based object detection and three-dimensional reconstruction for video editing and content creation.
Unlike earlier models in the collection, which required manually selecting objects with points or boxes, SAM 3 detects and segments objects from natural-language prompts. For instance, SAM 3 identifies every occurrence of objects matching descriptions like “yellow school bus” or “people sitting down, but not wearing a red baseball cap,” as detailed in Meta’s announcement. Prompts can also encode exclusions or conditions, allowing precise targeting of specific instances within an image or video.
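As a rough illustration of what such a concept-prompt workflow looks like in code, the sketch below shows the shape of the interface. Note that segment_by_text, Detection, and the mask layout are hypothetical stand-ins chosen for clarity; they are not Meta’s published SAM 3 API.

```python
# Minimal sketch of a concept-prompt segmentation workflow.
# segment_by_text and Detection are hypothetical stand-ins,
# not Meta's released SAM 3 interface.
from dataclasses import dataclass

import numpy as np


@dataclass
class Detection:
    mask: np.ndarray  # boolean instance mask, shape (H, W)
    score: float      # confidence that this instance matches the concept
    label: str        # the text concept that produced the match


def segment_by_text(image: np.ndarray, prompt: str) -> list[Detection]:
    """Stand-in for a SAM 3 inference call: return every instance
    in `image` that matches the natural-language `prompt`."""
    raise NotImplementedError("swap in the released SAM 3 inference code")


# Usage sketch, including an exclusion-style prompt:
# image = np.asarray(Image.open("street.jpg"))
# prompt = "people sitting down, but not wearing a red baseball cap"
# for det in segment_by_text(image, prompt):
#     print(f"{det.label}: score={det.score:.2f}, {det.mask.sum()} pixels")
```

The key design point is that a single text prompt can return many instance masks at once, rather than one mask per manual click.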
SAM 3D generates three-dimensional reconstructions of objects, people, and entire scenes from a single two-dimensional image. The reconstruction recovers depth and spatial structure that a static photo does not directly provide, enabling applications that require volumetric understanding.
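The sketch below illustrates the kind of interface single-image reconstruction implies: one photo in, explicit geometry out. Again, reconstruct_3d and the Mesh container are illustrative assumptions, not SAM 3D’s actual inference code.

```python
# Hypothetical single-image 3D reconstruction interface; Mesh and
# reconstruct_3d are illustrative stand-ins, not SAM 3D's actual API.
from dataclasses import dataclass

import numpy as np


@dataclass
class Mesh:
    vertices: np.ndarray  # (V, 3) xyz positions recovered from one photo
    faces: np.ndarray     # (F, 3) vertex indices forming triangles


def reconstruct_3d(image: np.ndarray) -> Mesh:
    """Stand-in for a SAM 3D call: infer geometry from a single 2D image."""
    raise NotImplementedError("swap in the released SAM 3D inference code")


# Usage: the mesh exposes the spatial extent a flat photo lacks.
# mesh = reconstruct_3d(np.asarray(Image.open("chair.jpg")))
# extent = mesh.vertices.max(axis=0) - mesh.vertices.min(axis=0)  # w, h, d
```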
Performance metrics for SAM 3 include a zero-shot mask average precision of 47.0 on the LVIS benchmark, a 22 percent improvement over prior systems, according to Meta’s research paper. The model runs at roughly 30 milliseconds per frame on an H200 GPU and handles more than 100 objects simultaneously, supporting real-time processing in demanding scenarios.
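The latency figure translates directly into frame rate. The short calculation below uses only the number reported in the paper and shows why this qualifies as real-time.

```python
# Frame rate implied by the reported per-frame latency on an H200 GPU.
latency_s = 0.030        # ~30 ms per frame, as reported in Meta's paper
fps = 1.0 / latency_s    # ~33 frames per second
print(f"{fps:.1f} fps")  # above typical 24-30 fps video playback rates
```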
“SAM 3 overcomes this limitation, accepting a much larger range of text prompts,” Meta stated in its announcement, referring to the narrow prompt vocabularies of earlier segmentation models. To help developers adapt the model, Meta has partnered with Roboflow to provide tools for data annotation, fine-tuning, and deployment tailored to specific use cases, streamlining customization for industries that rely on computer vision.
Both models are accessible through Meta’s Segment Anything Playground, a platform designed for users without advanced technical skills. Meta is releasing the SAM 3 model weights along with evaluation benchmarks and the associated research papers. For SAM 3D, the company is sharing model checkpoints and inference code with the research community, promoting further academic and experimental development.
In production, SAM 3 is integrated into Meta’s Edits video-creation application and the Vibes platform, where it drives effects that let creators modify designated objects in a video without affecting the surrounding content. Separately, SAM 3D powers the “View in Room” feature on Facebook Marketplace, letting buyers place virtual furniture and home-decor items in their own space to preview them before purchase.