Anthropic Unveils Claude 3.5 AI with Desktop Control Abilities

Anthropic has launched Claude 3.5 Sonnet with a new "Computer Use" API, allowing the AI to autonomously control desktop applications.
Despite its potential to automate tasks, the model is still experimental, with limitations in performing some actions like scrolling and zooming.

Anthropic has launched an upgraded version of its Claude 3.5 Sonnet model, featuring a revolutionary "Computer Use" API that allows the AI to autonomously control desktop applications.

Released on October 22, this feature enables Claude to imitate human interaction with a computer by controlling the mouse, clicking buttons, and typing text, performing tasks like web browsing, data transfers, and more.

Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use.

Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text. pic.twitter.com/ZlywNPVIJP
— Anthropic (@AnthropicAI) October 22, 2024

This new feature aligns with Anthropic's vision of creating AI capable of automating various office tasks.

“We trained Claude to see what’s happening on a screen and then use the software tools available to carry out tasks,” explained Anthropic.

“When a developer tasks Claude with using a piece of computer software and gives it the necessary access, Claude looks at screenshots of what’s visible to the user, then counts how many pixels vertically or horizontally it needs to move a cursor in order to click in the correct place.”

Developers can access this feature via Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI platform. However, the model is still experimental, with limitations like difficulty with scrolling and zooming. Anthropic has advised developers to begin with low-risk tasks as they refine the model.

In a competitive AI landscape, Anthropic’s 3.5 Sonnet competes with similar technologies from companies like OpenAI and Salesforce. Anthropic claims that their model is “stronger and more robust,” performing better in tasks like coding, while also self-correcting when it encounters obstacles. However, the AI remains imperfect. In tests involving basic tasks like modifying a flight reservation, it completed less than half successfully.

Despite these challenges, Anthropic remains optimistic. The company is taking precautions to prevent misuse, including not training Claude on users' screenshots and prompts, and retaining captured screenshots for 30 days to monitor activity. Addressing potential risks, Anthropic stated, "There are no foolproof methods... we will continuously evaluate and iterate on our safety measures.”

Edited by Harshajit Sarmah