Robotics is advancing rapidly, but most robots still face a fundamental limitation: deciding precisely what action to take and where to carry it out.
Microsoft, together with a consortium of academic researchers, has introduced a new benchmark, GroundedPlanBench, which aims to address this challenge and bring robot intelligence closer to efficient, context-aware decision-making.
In conventional robotic systems, decision-making is split into two stages. First, a vision-language model generates a plan in natural language. Then, a separate system translates that plan into physical actions. This fragmented approach causes frequent errors: because the plan and the execution are disconnected, mistakes in one stage carry over to the next.
Typical errors include confusion about which object to manipulate or the invention of unnecessary steps. For example, a robot asked to discard paper cups may fail to identify which cup to pick up, or may perform actions that were never requested. These failures are aggravated in cluttered environments, where objects are similar or numerous.
GroundedPlanBench: A New Standard for Improving Decision-Making
To address this challenge, Microsoft and its partners developed GroundedPlanBench, a benchmark that evaluates whether AI models can plan tasks while accurately identifying where each action should be performed.
Unlike traditional evaluations that rely on text alone, this benchmark links each action to a specific location in an image. Actions such as grabbing, placing, opening, or closing are tied to particular objects or positions, forcing the AI to ground its decisions in the real physical environment.
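To make the idea concrete, a spatially grounded plan step can be thought of as an action paired with an object and an image location. The field names below are assumptions for illustration only; the benchmark's actual schema is not described in this article.

```python
from dataclasses import dataclass

@dataclass
class GroundedStep:
    """One step of a grounded plan: an action tied to a place in the image.
    This schema is a hypothetical sketch, not the benchmark's real format."""
    action: str                     # e.g. "pick", "place", "open", "close"
    target_object: str              # the object the action applies to
    location: tuple[float, float]   # (x, y) point in the image, normalized to [0, 1]

# Example: discarding a paper cup, as in the scenario described above.
plan = [
    GroundedStep("pick", "paper cup", (0.62, 0.41)),
    GroundedStep("place", "trash bin", (0.15, 0.78)),
]
```

Linking every step to coordinates like this is what lets an evaluator check not only that the model chose the right action, but that it pointed at the right place.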
The benchmark includes more than a thousand tasks based on real robot interactions. Some instructions are direct, such as placing a spoon on a plate; others are open-ended, such as tidying a table. This variety is crucial, because robots often fail when instructions are underspecified.
In one experiment, a robot had to place four napkins on a sofa. Because the instruction lacked specificity, the system repeated the action on the same napkin, even with seemingly more precise descriptions such as "upper left napkin". This shows that ambiguous language remains an obstacle to the reliable execution of complex tasks.
Learning based on real tasks
To improve decision-making capabilities, the team developed a training method called Video-to-Spatially Grounded Planning (V2GP). This system analyzes videos of robots performing tasks, detects interactions with objects, identifies those objects, and tracks their locations, generating structured plans that link each action to a specific point.
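The pipeline just described can be sketched as a simple transformation from detected interaction events to grounded plan steps. Everything below is a minimal, self-contained illustration: the event format and field names are invented for this sketch, and the real method uses learned detectors and trackers over actual video.

```python
def video_to_grounded_plan(interaction_events):
    """Convert detected object interactions into a grounded plan:
    one step per interaction, each tied to a tracked image location.
    Illustrative sketch only; not the published V2GP implementation."""
    plan = []
    for event in interaction_events:
        plan.append({
            "action": event["kind"],         # what the robot did, e.g. "grasp"
            "object": event["object"],       # recognized object label
            "point": event["last_seen_at"],  # tracked (x, y) location
        })
    return plan

# Synthetic events standing in for a real detection/tracking stage.
events = [
    {"kind": "grasp", "object": "spoon", "last_seen_at": (0.40, 0.55)},
    {"kind": "release", "object": "plate", "last_seen_at": (0.70, 0.50)},
]
grounded = video_to_grounded_plan(events)
```

The key design choice this mirrors is that grounding happens during plan construction, not afterward: each step is born already attached to a location, rather than being localized by a second system.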
Using this approach, researchers generated more than 40,000 "grounded" plans, ranging from simple actions to complex sequences of up to 26 steps. The models trained with this method demonstrated a better ability to choose appropriate actions and associate them with the correct objects, as well as reduce repetitive errors such as acting multiple times on the same element.
A Paradigm Shift for Robotics
Despite these advances, challenges persist, especially in long-horizon tasks and with indirect instructions. The researchers note that models must be able to reason over extended sequences and maintain coherence across multiple steps. When the new approach was compared with traditional systems, the latter tended to assign multiple actions to the same object or location, especially when instructions were ambiguous.
Integrating planning and localization into a single process reduces these mismatches and allows for more precise decisions. The Microsoft team suggests that future research could combine this method with predictive models capable of anticipating the consequences of each action, helping robots avoid errors in real time.
The study's conclusions point to a clear direction for the future of robotics: systems that jointly reason about action and location are more likely to operate successfully in real environments. This work represents a key step toward robots that can decide and act reliably in everyday tasks, bringing them closer to truly applied artificial intelligence.