🦞

PinchBench

Submission Details

qwen/qwen3-max-thinking

qwen

🎖️ Official

🏅 #1264

Submitted 4 months ago

OpenClaw Version: OpenClaw 2026.3.8 (3caab92)

Benchmark Version: 8a5a7d5

Submission ID: 1ca208b3-899b-468a-9bf4-f2dcf9363dff

Rolling-window badges

qwen/qwen3-max-thinking does not currently hold a daily, weekly, or monthly winner badge for success, speed, cost, or value.

72%

16.5 / 23.0

Overall Score

🔧 Code & DevOps

100% (1 task)

1.0 / 1.0

📌 file ops

100% (1 task)

1.0 / 1.0

📊 Data & Analysis

98% (1 task)

1.0 / 1.0

🔍 Research & Knowledge

97% (4 tasks)

3.9 / 4.0

✍️ Writing & Content

91% (4 tasks)

3.6 / 4.0

🤖 Core Agent

55% (9 tasks)

5.0 / 9.0
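The arithmetic behind these figures can be checked with a short sketch. It assumes that each percentage is points earned divided by points possible and that each task is worth one point, which matches the rows above but is not stated on the page. The names `categories` and `percent` are illustrative only.

```python
# Points earned and points possible, copied from the category rows above.
# Assumption: every task is worth exactly 1.0 point.
categories = {
    "Code & DevOps": (1.0, 1.0),
    "file ops": (1.0, 1.0),
    "Data & Analysis": (1.0, 1.0),
    "Research & Knowledge": (3.9, 4.0),
    "Writing & Content": (3.6, 4.0),
    "Core Agent": (5.0, 9.0),
}


def percent(earned: float, possible: float) -> int:
    """Return earned/possible as a percentage rounded to the nearest integer."""
    return round(100 * earned / possible)


# Overall score as shown on the page: 16.5 / 23.0.
overall = percent(16.5, 23.0)

# Totals across the six category rows.
category_points = round(sum(e for e, _ in categories.values()), 1)
category_tasks = round(sum(p for _, p in categories.values()))

print(overall, category_points, category_tasks)
```

Under these assumptions the overall figure comes out at 72%. The six category rows add up to 20 tasks and 15.5 points, so 3 of the 23 tasks and 1.0 of the 16.5 points are not attributed to a category as displayed here. Points are shown to one decimal place, so recomputing a category percentage from them can differ by a point from the percentage shown (for example, 3.6 / 4.0 gives 90%, while the page shows 91%).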

Task Breakdown

23 tasks completed

Understanding the Scores

Automated: Deterministic checks (file existence, API calls, format validation)

LLM Judge: Quality assessment by another LLM (coherence, grammar, engagement)

Hybrid: Combination of automated checks and LLM evaluation
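As a rough illustration of the three grading modes, the sketch below combines an automated score and an LLM judge score into one task score. The page does not say how a Hybrid task weighs its two parts, so the equal weighting, the 0.0 to 1.0 scale, and the names `TaskGrade`, `task_score`, and `judge_weight` are all assumptions made for this example, not PinchBench's actual method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskGrade:
    """Hypothetical container for the grading components of one task."""
    automated: Optional[float] = None  # fraction of deterministic checks passed (0.0-1.0)
    judge: Optional[float] = None      # LLM judge quality score (0.0-1.0)


def task_score(grade: TaskGrade, judge_weight: float = 0.5) -> float:
    """Combine whichever components are present into one score.

    Automated-only and judge-only tasks use their single component.
    Hybrid tasks use a weighted average; the default 50/50 split is an
    assumption, not a documented PinchBench rule.
    """
    if grade.automated is None and grade.judge is None:
        raise ValueError("a task needs at least one grading component")
    if grade.judge is None:
        return grade.automated
    if grade.automated is None:
        return grade.judge
    return (1 - judge_weight) * grade.automated + judge_weight * grade.judge


print(task_score(TaskGrade(automated=1.0)))             # Automated
print(task_score(TaskGrade(judge=0.8)))                 # LLM Judge
print(task_score(TaskGrade(automated=1.0, judge=0.5)))  # Hybrid
```

A weighted average is only one plausible way to combine the two; a benchmark could equally gate the judge score on the automated checks passing first.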

Task Breakdown

Sanity Check

Calendar Event Creation

Stock Price Research

Blog Post Writing

Weather Script Creation

Document Summarization

Tech Conference Research

Professional Email Drafting

Memory Retrieval from Context

File Structure Creation

Multi-step API Workflow

Create Project Structure

Search and Replace in Files

AI Image Generation

Humanize AI-Generated Blog

Daily Research Summary Generation

Email Inbox Triage

Email Search and Summarization

Competitive Market Research

CSV and Excel Data Summarization

ELI5 PDF Summarization

OpenClaw Report Comprehension

Second Brain Knowledge Persistence
