Claw-some AI Agent Testing
Quick Picks
moonshotai/kimi-k2.5: Highest verified success rate across the benchmark.
meta-llama/llama-3.1-70b-instruct: Lowest observed complete benchmark runtime.
openai/gpt-oss-20b: Lowest observed non-zero benchmark run cost.
openai/gpt-oss-20b: Best success percentage per dollar.
Percentage of tasks completed successfully across standardized OpenClaw agent tests
Scores are graded via automated checks and an LLM judge.
Hosted OpenClaw: your personal AI agent, managed by Kilo.
Hosting and inference costs for PinchBench are sponsored by Kilo, so we hope you'll try KiloClaw and help us keep the lights on around here.
From $8/month + AI inference at cost