Tuesday, April 29, 2025

OpenAI launches Operator, an AI agent that can perform tasks on the web

On Thursday, OpenAI released a research preview of “Operator,” a web automation tool that uses a new model called Computer-Using Agent (CUA) to control a web browser through a visual interface. The system performs tasks by viewing and interacting with on-screen elements such as buttons and text fields, similar to how a human would.

Operator is available today to subscribers of the $200-per-month ChatGPT Pro plan at operator.chatgpt.com. The company plans to expand access to Plus, Team, and Enterprise users later. OpenAI intends to integrate these capabilities directly into ChatGPT and then release CUA through its API for developers.

Operator watches on-screen content in its virtual environment while using an internal browser, and it executes tasks through simulated keyboard and mouse inputs. The Computer-Using Agent processes screenshots of the browser interface to understand the browser's state and then decides what to click, type, and scroll based on its observations.

OpenAI's launch follows other tech companies as they push into what are often called “agentic” AI systems, which can take actions on a user's behalf. Google announced Project Mariner in December 2024, which performs automated tasks through the Chrome browser, and two months earlier, in October 2024, Anthropic launched a web automation tool called “Computer Use,” aimed at developers, that can control a user's mouse cursor and take actions on a computer.

“The Operator interface looks very similar to Anthropic's Claude Computer Use demo from October,” wrote researcher Simon Willison on his blog, “even down to the interface with a chat panel on the left and a visible interface being interacted with on the right.”

An Operator demonstration video created by OpenAI.

Look and take action

To use a browser as a human would, the Computer-Using Agent works in multiple steps. First, it captures screenshots to monitor its progress, then analyzes those images (using GPT-4o's vision capabilities with additional reinforcement learning) to process raw pixel data. Next, it determines what actions to take and then performs virtual inputs to control the browser. This iterative loop design reportedly allows the system to recover from errors and handle complex tasks across different applications.
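The capture, analyze, decide, and act cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions, not OpenAI's implementation: `FakeBrowser`, `Action`, `decide`, and `run_agent` are hypothetical names, the "screenshot" is a plain dictionary rather than pixel data, and a hard-coded toy policy stands in for the vision model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # hypothetical element name
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for the virtual browser."""
    fields: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would receive an image; here it is a state snapshot.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Simulated keyboard and mouse input.
        self.log.append(action.kind)
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            if not self.fields.get("query"):
                raise ValueError("cannot submit an empty form")
            self.submitted = True


def decide(observation: dict) -> Action:
    """Toy policy standing in for the model's analysis of a screenshot."""
    if observation["submitted"]:
        return Action("done")
    if not observation["fields"].get("query"):
        return Action("type", target="query", text="flights to Lisbon")
    return Action("click", target="submit")


def run_agent(browser: FakeBrowser,
              policy: Callable[[dict], Action],
              max_steps: int = 10) -> bool:
    for _ in range(max_steps):
        observation = browser.screenshot()  # capture
        action = policy(observation)        # analyze and decide
        if action.kind == "done":
            return True
        try:
            browser.apply(action)           # act
        except ValueError:
            # A failed action is not fatal: the next screenshot
            # gives the policy a chance to choose differently.
            continue
    return False


browser = FakeBrowser()
finished = run_agent(browser, decide)
```

Because every iteration starts from a fresh observation, a failed action only costs one step, which is the property the article attributes to the iterative design.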
