Claw-some AI Agent Testing
Percentage of tasks completed successfully across standardized OpenClaw agent tests
Scores are graded via automated checks and an LLM judge.
How we benchmark · View all tasks
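As a rough illustration of how a "percentage of tasks completed successfully" could be computed, here is a minimal Python sketch. The `TaskResult` type, its field names, and the rule that a task counts as successful only when both the automated checks and the LLM judge pass are assumptions made for this example; the page does not specify how the two grading signals are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskResult:
    """Hypothetical record of one graded task (not the benchmark's real schema)."""
    task_id: str
    passed_checks: bool  # outcome of the automated checks
    judge_passed: bool   # outcome of the LLM judge


def success_rate(results: List[TaskResult]) -> float:
    """Return the percentage of tasks counted as successful.

    Assumption: a task is successful only if BOTH grading signals pass.
    An empty result list yields 0.0 to avoid dividing by zero.
    """
    if not results:
        return 0.0
    completed = sum(1 for r in results if r.passed_checks and r.judge_passed)
    return 100.0 * completed / len(results)
```

For example, a model that passes both graders on one of two tasks would score 50.0 under this assumed rule.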
anthropic/claude-sonnet-4.6
anthropic/claude-opus-4.6
openai/gpt-5.4
nvidia/nemotron-3-super-120b-a12b
anthropic/claude-opus-4.5
moonshotai/kimi-k2.5
qwen/qwen3.5-122b-a10b
qwen/qwen3.5-plus-02-15
z-ai/glm-5
anthropic/claude-sonnet-4.5
minimax/minimax-m2.1
deepseek/deepseek-v3.2
qwen/qwen3.5-397b-a17b
stepfun/step-3.5-flash
google/gemini-3.1-pro-preview
anthropic/claude-sonnet-4
minimax/minimax-m2.5
z-ai/glm-4.5-air
anthropic/claude-haiku-4.5
qwen/qwen3-coder-next
mistralai/devstral-2512
openai/gpt-5-mini
arcee-ai/trinity-large-preview:free
qwen/qwen3.5-35b-a3b
x-ai/grok-4.1-fast
qwen/qwen3-max-thinking
google/gemini-3-flash-preview
openai/gpt-4o-mini
qwen/qwen3.5-27b
google/gemini-3-pro-preview
deepseek/deepseek-chat
openai/gpt-5-nano
openai/gpt-4o
mistralai/mistral-large-2512
google/gemini-2.5-flash
google/gemini-2.5-pro
openai/gpt-oss-20b
openai/gpt-oss-120b
qwen/qwen-2.5-7b-instruct

This leaderboard is for entertainment purposes only and should not be relied upon for making critical decisions.