Microsoft Announces Magma Foundation Model That Can Complete Multimodal Agentic Tasks

February 22, 2025

Microsoft researchers on Wednesday announced Magma, a new foundation model that can perform agentic functions. The artificial intelligence (AI) model is pre-trained on large datasets spanning text, images, videos, and spatial data. The Redmond-based tech giant describes Magma as an extension of vision-language (VL) models: it can not only understand multimodal information but also plan and act on it. These agentic capabilities let the model be used for a wide range of tasks, including computer vision, user interface (UI) navigation, and robot manipulation.

Microsoft Announces Magma Foundation Model

In a GitHub post, Microsoft researchers detailed the new Magma foundation model. Foundation models are large AI models pre-trained on broad data so that they can serve as a base for many downstream tasks, and they often become the baseline for other models in a series. What sets Magma apart is the wide range of datasets it is pre-trained on.

The researchers stated that Magma's base architecture is Meta's Llama 3 AI model. On top of that, Magma is equipped with the ability to plan and act in the visual-spatial world, which lets it not only generate outputs like a chatbot but also execute actions.

When paired with camera sensors, it can act as a computer vision chatbot that offers information about the world it sees. Magma can also control a device's UI. More interestingly, it can use its agentic capabilities to control robots and complete complex tasks.

The researchers attribute these capabilities to the diverse dataset along with two technical components: Set-of-Mark and Trace-of-Mark. Set-of-Mark enables action grounding in images, videos, and spatial data by having the model predict numeric marks, for example on buttons or robot arms, in image space. Trace-of-Mark exposes the model to temporal video dynamics by training it to predict how those marks will move across future frames before it takes action. Together, these help the model develop a strong spatial understanding.
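To make the two ideas more concrete, here is a minimal, hypothetical Python sketch of the bookkeeping around them. It is not Microsoft's implementation: the names `Box`, `assign_marks`, `ground_action` and `trace_targets` are invented for illustration, and the model's prediction is simply passed in as a number. Set-of-Mark is modelled as numbering candidate regions and mapping a predicted mark back to a click point; Trace-of-Mark is modelled as slicing a tracked mark's future positions to use as prediction targets.

```python
from dataclasses import dataclass

# A 2D point in image (pixel) space.
Point = tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of one candidate element (e.g. a UI button), in pixels."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> Point:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def assign_marks(boxes: list[Box]) -> dict[int, Box]:
    """Set-of-Mark style labelling (sketch): give each candidate region a numeric mark 1, 2, 3, ..."""
    return {i + 1: box for i, box in enumerate(boxes)}


def ground_action(marks: dict[int, Box], predicted_mark: int) -> Point:
    """Map the mark a model predicted back to a concrete click point (the box centre)."""
    if predicted_mark not in marks:
        raise ValueError(f"model predicted unknown mark {predicted_mark}")
    return marks[predicted_mark].center()


def trace_targets(
    tracks: dict[int, list[Point]], t: int, horizon: int
) -> dict[int, list[Point]]:
    """Trace-of-Mark style targets (sketch): where each mark moves in the `horizon` frames after frame t.

    Marks whose track is too short to cover the whole horizon are skipped.
    """
    return {
        mark: points[t + 1 : t + 1 + horizon]
        for mark, points in tracks.items()
        if len(points) > t + horizon
    }


if __name__ == "__main__":
    # Two hypothetical buttons detected in a screenshot.
    marks = assign_marks([Box(10, 10, 50, 30), Box(60, 10, 120, 30)])
    # Pretend the model answered "act on mark 2".
    print(ground_action(marks, 2))  # (90.0, 20.0)

    # A hypothetical tracked point (e.g. a gripper tip) over four frames.
    tracks = {1: [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]}
    print(trace_targets(tracks, t=0, horizon=2))  # {1: [(1.0, 1.0), (2.0, 2.0)]}
```

In this framing, the model's output is a small integer or a short list of points rather than free-form pixel coordinates, so the same format can describe a button on a screen or a robot arm in a video frame.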

Microsoft researchers also shared benchmark scores for the AI model from internal testing. According to the company, Magma achieved competitive scores across all the agentic evaluation tests, outperforming models from OpenAI, Alibaba, and Google. At the time of writing, Microsoft has not publicly released Magma.
