Microsoft’s new AI agent can control software and robots

On Wednesday, Microsoft Research introduced Magma, an integrated AI foundation model that combines visual and language processing to control software interfaces and robotic systems. If the results hold up outside of Microsoft’s internal testing, it could mark a meaningful step forward for an all-purpose multimodal AI that can operate interactively in both real and digital spaces.

Microsoft claims that Magma is the first AI model that not only processes multimodal data (like text, images, and video) but can also natively act upon it, whether that’s navigating a user interface or manipulating physical objects. The project is a collaboration between researchers at Microsoft, KAIST, the University of Maryland, the University of Wisconsin-Madison, and the University of Washington.

We’ve seen other large language model-based robotics projects like Google’s PaLM-E and RT-2 or Microsoft’s ChatGPT for Robotics that use LLMs as an interface. However, unlike many prior multimodal AI systems that require separate models for perception and control, Magma integrates these abilities into a single foundation model.

A combined graphic showing several capabilities of the Magma model.

Credit: Microsoft Research

Microsoft is positioning Magma as a step toward agentic AI, meaning a system that can autonomously craft plans and perform multistep tasks on a human’s behalf rather than just answering questions about what it sees.

“Given a described goal,” Microsoft writes in its research paper, “Magma is able to formulate plans and execute actions to achieve it. By effectively transferring knowledge from freely available visual and language data, Magma bridges verbal, spatial, and temporal intelligence to navigate complex tasks and settings.”

Microsoft is not alone in its pursuit of agentic AI. OpenAI has been experimenting with AI agents through projects like Operator that can perform UI tasks in a web browser, and Google has explored multiple agentic projects with Gemini 2.0.

Spatial intelligence

While Magma builds on Transformer-based LLM technology that feeds training tokens into a neural network, it differs from traditional vision-language models (like GPT-4V, for example) by going beyond what the researchers call “verbal intelligence” to also include “spatial intelligence” (planning and action execution). By training on a mix of images, videos, robotics data, and UI interactions, Microsoft claims that Magma is a true multimodal agent rather than just a perceptual model.
