Claw-some AI Agent Testing
Percentage of tasks completed successfully across standardized OpenClaw agent tests
Scores are graded via automated checks and an LLM judge. How we benchmark · View all tasks
moonshotai/kimi-k2.5
anthropic/claude-opus-4.6
qwen/qwen3.5-397b-a17b
z-ai/glm-5
x-ai/grok-4.1-fast
minimax/minimax-m2.5
anthropic/claude-sonnet-4.5
qwen/qwen3.5-35b-a3b
openai/gpt-5.4
qwen/qwen3.5-plus-02-15
minimax/minimax-m2.1
openai/gpt-5-mini
anthropic/claude-sonnet-4.6
nvidia/nemotron-3-super-120b-a12b:free
anthropic/claude-haiku-4.5
qwen/qwen3.5-27b
stepfun/step-3.5-flash
qwen/qwen3.5-122b-a10b
google/gemini-3.1-pro-preview
qwen/qwen3-max-thinking
openai/gpt-4o-mini
z-ai/glm-4.5-air
mistralai/devstral-2512
deepseek/deepseek-v3.2
anthropic/claude-opus-4.5
google/gemini-3-flash-preview
openai/gpt-4o
google/gemini-2.5-pro
google/gemini-2.5-flash
arcee-ai/trinity-large-preview:free
openai/gpt-5-nano
mistralai/mistral-large-2512
openai/gpt-oss-20b
deepseek/deepseek-chat
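
The headline metric is described as the percentage of tasks completed successfully. As a rough illustration only, the sketch below shows how such a percentage could be computed from per-task pass/fail outcomes. The `TaskResult` type, the `success_rate` function, and the task IDs are hypothetical names made up for this example; the page does not say how automated checks and the LLM judge are combined into a verdict, so the code simply takes a final pass/fail flag per task as given.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record of one graded task (not the benchmark's real schema)."""
    task_id: str
    passed: bool  # final verdict, however the grader arrived at it


def success_rate(results: list[TaskResult]) -> float:
    """Return the percentage of tasks marked as passed."""
    if not results:
        raise ValueError("no task results to score")
    passed = sum(1 for r in results if r.passed)
    return 100.0 * passed / len(results)


# Made-up example data: three of four tasks pass.
example = [
    TaskResult("task-a", True),
    TaskResult("task-b", True),
    TaskResult("task-c", False),
    TaskResult("task-d", True),
]
print(f"{success_rate(example):.1f}%")
```

A model's score under this assumption is just the passed count over the total task count, so scores are only comparable between models run on the same task set.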