PinchBench

Submission Details

qwen/qwen3-max-thinking

qwen

🎖️ Official

Submitted 1 day ago

OpenClaw Version: OpenClaw 2026.3.13 (61d171a)

Benchmark Version: 92375be

Submission ID: e9abb821-eecb-41b5-a28f-d3dedf930b2d

Overall Score: 78% (17.8 / 23.0)

Category scores (percentage, points earned / points available, task count):

basic: 100%, 1.0 / 1.0 (1 task)
calendar: 100%, 1.0 / 1.0 (1 task)
research: 92%, 2.8 / 3.0 (3 tasks)
writing: 94%, 1.9 / 2.0 (2 tasks)
coding: 100%, 1.0 / 1.0 (1 task)
comprehension: 47%, 1.9 / 4.0 (4 tasks)
context: 70%, 0.7 / 1.0 (1 task)
file_ops: 100%, 3.0 / 3.0 (3 tasks)
complex: 71%, 0.7 / 1.0 (1 task)
creative: 8%, 0.1 / 1.0 (1 task)
content_transformation: 65%, 0.7 / 1.0 (1 task)
synthesis: 88%, 0.9 / 1.0 (1 task)
organization: 90%, 0.9 / 1.0 (1 task)
data_analysis: 50%, 0.5 / 1.0 (1 task)
memory: 93%, 0.9 / 1.0 (1 task)
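
The per-category figures above can be cross-checked against the overall score with a few lines of Python. This is a minimal sketch that assumes the overall score is simply total points earned divided by total points available; the page does not state the aggregation rule, and the name `CATEGORY_SCORES` is only illustrative. Because the displayed points are rounded to one decimal, the sum comes out near 18.0 rather than the reported 17.8, which is presumably a rounding effect.

```python
# (category, points earned, points available), copied from the page as displayed.
# The earned values are rounded to one decimal, so totals are approximate.
CATEGORY_SCORES = [
    ("basic", 1.0, 1.0),
    ("calendar", 1.0, 1.0),
    ("research", 2.8, 3.0),
    ("writing", 1.9, 2.0),
    ("coding", 1.0, 1.0),
    ("comprehension", 1.9, 4.0),
    ("context", 0.7, 1.0),
    ("file_ops", 3.0, 3.0),
    ("complex", 0.7, 1.0),
    ("creative", 0.1, 1.0),
    ("content_transformation", 0.7, 1.0),
    ("synthesis", 0.9, 1.0),
    ("organization", 0.9, 1.0),
    ("data_analysis", 0.5, 1.0),
    ("memory", 0.9, 1.0),
]


def overall(scores):
    """Return (earned, available, fraction) under the assumed sum-over-sum rule."""
    earned = sum(points for _, points, _ in scores)
    available = sum(maximum for _, _, maximum in scores)
    return earned, available, earned / available


earned, available, fraction = overall(CATEGORY_SCORES)
print(f"{earned:.1f} / {available:.1f} = {fraction:.0%}")
```

The available points sum to 23.0, matching the 23 tasks listed for this submission.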

Task Breakdown

23 tasks completed

Understanding the Scores

Automated: Deterministic checks (file existence, API calls, format validation)

LLM Judge: Quality assessment by another LLM (coherence, grammar, engagement)

Hybrid: Combination of automated checks and LLM evaluation
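
To make the three grading modes concrete, here is a hypothetical sketch of how a per-task score might be derived from an automated result and an LLM judge result. Everything in it is an assumption for illustration: the `TaskResult` type, the `hybrid_score` function, and especially the equal weighting are not taken from PinchBench, whose actual combination rule is not described on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskResult:
    # Fraction of deterministic checks passed, in [0, 1]; None if not used.
    automated: Optional[float] = None
    # LLM judge rating normalized to [0, 1]; None if not used.
    judge: Optional[float] = None


def hybrid_score(result: TaskResult, judge_weight: float = 0.5) -> float:
    """Combine the available scores; judge_weight is a hypothetical parameter."""
    if result.automated is None and result.judge is None:
        raise ValueError("task has no grading result")
    if result.judge is None:
        return result.automated  # automated-only task
    if result.automated is None:
        return result.judge  # LLM-judge-only task
    # Hybrid task: weighted average of the two components.
    return (1.0 - judge_weight) * result.automated + judge_weight * result.judge


print(hybrid_score(TaskResult(automated=1.0)))
print(hybrid_score(TaskResult(judge=0.5)))
print(hybrid_score(TaskResult(automated=1.0, judge=0.5)))
```

A weighted average is only one plausible design; a benchmark could equally gate the judge score on the automated checks passing first.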

Task Breakdown

Sanity Check

Calendar Event Creation

Stock Price Research

Blog Post Writing

Weather Script Creation

Document Summarization

Tech Conference Research

Professional Email Drafting

Memory Retrieval from Context

File Structure Creation

Multi-step API Workflow

Create Project Structure

Search and Replace in Files

AI Image Generation

Humanize AI-Generated Blog

Daily Research Summary Generation

Email Inbox Triage

Email Search and Summarization

Competitive Market Research

CSV and Excel Data Summarization

ELI5 PDF Summarization

OpenClaw Report Comprehension

Second Brain Knowledge Persistence
