Anthropic has introduced a feature for its Claude 3.5 Sonnet model that lets the AI operate a computer on its own. The capability, named “computer use,” is available to developers in public beta. With it, Claude can navigate the screen, move the cursor, click buttons, and type text, much as a person would.
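To make the idea concrete, here is a minimal sketch of how a developer's program might carry out the kinds of actions described above (moving the cursor, clicking, typing). The action names, the dictionary fields, and the `FakeDesktop` class are illustrative assumptions, not Anthropic's actual schema, and nothing here touches a real screen.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a desktop: it only records what it is asked to do."""
    cursor: tuple = (0, 0)
    clicks: list = field(default_factory=list)
    typed: list = field(default_factory=list)

    def apply(self, action: dict) -> None:
        # Dispatch on an assumed "action" field; real schemas may differ.
        kind = action["action"]
        if kind == "mouse_move":
            self.cursor = tuple(action["coordinate"])
        elif kind == "left_click":
            self.clicks.append(self.cursor)
        elif kind == "type":
            self.typed.append(action["text"])
        else:
            raise ValueError(f"unsupported action: {kind}")


# A made-up sequence of steps a model might request.
desktop = FakeDesktop()
for step in [
    {"action": "mouse_move", "coordinate": [400, 300]},
    {"action": "left_click"},
    {"action": "type", "text": "hello"},
]:
    desktop.apply(step)
```

In a real integration, each step would presumably come from the model and be executed against an actual desktop, with a fresh screenshot sent back after each action.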
The feature puts Claude in competition with similar technologies from major players in the AI field. Microsoft’s Copilot Vision, OpenAI’s desktop application for ChatGPT, and Google’s Gemini app on Android have all demonstrated the ability to interpret what is on a screen. However, none of these tools is yet widely available in a form that can carry out tasks independently on a user’s computer. Rabbit promised similar capabilities for its R1 device but has yet to deliver them.
Anthropic has emphasized that the “computer use” feature is still in its experimental phase, acknowledging that it can be “cumbersome and error-prone.” The company aims to gather feedback from developers during this beta period, with the expectation that the feature will see significant improvements over time.
Developers have noted some limitations in Claude’s current capabilities. It can manage basic tasks but struggles with more complex actions such as dragging and zooming. In addition, Claude perceives the screen by taking a series of screenshots and piecing them together, so it can miss short-lived actions or notifications that appear and disappear between captures.
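The screenshot limitation is essentially a sampling problem, which a small simulation can illustrate. The sketch below assumes screenshots are taken at a fixed interval; the 2-second interval, the event timings, and the function names are hypothetical and chosen only to show the effect.

```python
import math


def screenshot_times(interval_s: float, total_s: float) -> list:
    """Instants at which a screenshot is taken, starting at t = 0."""
    count = math.floor(total_s / interval_s) + 1
    return [i * interval_s for i in range(count)]


def is_observed(event_start_s: float, event_duration_s: float, capture_times: list) -> bool:
    """True if at least one screenshot falls while the event is on screen."""
    event_end_s = event_start_s + event_duration_s
    return any(event_start_s <= t < event_end_s for t in capture_times)


# Assumed capture schedule: one screenshot every 2 seconds for 10 seconds.
captures = screenshot_times(interval_s=2.0, total_s=10.0)

# A notification visible from 2.5 s to 3.5 s falls between captures at 2 s and 4 s.
toast_seen = is_observed(2.5, 1.0, captures)

# A dialog visible from 2.5 s to 5.5 s is still on screen at the 4 s capture.
dialog_seen = is_observed(2.5, 3.0, captures)
```

Under these assumptions the brief notification is never captured while the longer-lived dialog is, which matches the behavior described above.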
Anthropic has also built in safeguards to keep Claude away from sensitive activities. The AI is designed to avoid actions related to election content, and it is restricted from generating or posting social media content, registering web domains, or interacting with government websites.
Alongside the new “computer use” feature, Anthropic has reported that the upgraded Claude 3.5 Sonnet model improves on its predecessor across a range of benchmarks, most notably in agentic coding and tool use. Its accuracy on the SWE-bench Verified coding benchmark rose from 33.4% to 49.0%, which Anthropic says is higher than any other publicly available model, including those designed specifically for coding. On TAU-bench, an agentic tool use assessment, its scores improved from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the more challenging airline domain.
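The benchmark gains can be read two ways: as an absolute change in percentage points, or as an improvement relative to the earlier score. The short calculation below uses only the figures quoted above; the helper name `gain` is ours.

```python
# (before, after) scores in percent, as quoted in the article.
scores = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def gain(before: float, after: float) -> tuple:
    """Return (percentage-point change, relative change in percent), rounded to 0.1."""
    points = round(after - before, 1)
    relative = round((after - before) / before * 100, 1)
    return points, relative


gains = {name: gain(before, after) for name, (before, after) in scores.items()}
for name, (points, relative) in gains.items():
    print(f"{name}: +{points} points ({relative}% relative)")
```

The distinction matters when comparing the three results: the airline score rises by 10.0 points from a low base, so its relative improvement is larger than the retail domain's 6.6-point gain.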
Overall, Anthropic’s updates mark a notable advance in how AI models can interact with and control computing environments. As the technology matures, it could make everyday work on a computer more seamless and efficient.