Comprehensive Leaderboard for Research AI Systems
DeepScholar-Bench provides a live benchmark for evaluating generative research synthesis systems. Its benchmark dataset is generated from recent ArXiv papers, requiring systems to produce a related work section by retrieving, synthesizing, and citing sources from the web. The benchmark provides holistic evaluation across three critical capabilities of generative research synthesis: knowledge synthesis, retrieval quality, and verifiability.
The table below compares systems across all metrics:
🧠 Knowledge Synthesis · 🔍 Retrieval Quality · ✅ Verifiability

| System | Pipeline | Model | 🧠 Organization | 🧠 Nugget Coverage | 🔍 Relevance Rate | 🔍 Document Importance | 🔍 Reference Coverage | ✅ Citation Precision | ✅ Claim Coverage |
|---|---|---|---|---|---|---|---|---|---|
| OpenAI DeepResearch | Closed | o3 | 0.857 | 0.392 | 0.629 | 0.176 | 0.228 | 0.399 | 0.138 |
| Search AI (o3) | Open | o3 | 0.849 | 0.348 | 0.610 | 0.036 | 0.217 | 0.425 | 0.495 |
| Search AI (Gemini-2.5-pro) | Open | Gemini-2.5-pro | 0.706 | 0.277 | 0.583 | 0.014 | 0.091 | 0.415 | 0.398 |
| Search AI (Claude-opus-4) | Open | Claude-opus-4 | 0.698 | 0.307 | 0.583 | 0.012 | 0.173 | 0.701 | 0.760 |
| Search AI (GPT-4.1) | Open | GPT-4.1 | 0.556 | 0.265 | 0.490 | 0.013 | 0.068 | 0.498 | 0.470 |
| Search AI (Llama-4-Scout) | Open | Llama-4-Scout | 0.151 | 0.193 | 0.445 | 0.013 | 0.067 | 0.316 | 0.368 |
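If you want to slice these results outside the web UI, the sketch below loads the rows above into a pandas DataFrame and ranks systems by a chosen metric. This is an illustrative snippet only, not part of DeepScholar-Bench's official tooling; the column names and the assumption that higher scores are better on every metric are ours.

```python
# Illustrative sketch (not official DeepScholar-Bench tooling): load the
# leaderboard rows above and compare systems across the metric groups.
import pandas as pd

COLUMNS = [
    "system", "pipeline", "model",
    "organization", "nugget_coverage",                               # Knowledge Synthesis
    "relevance_rate", "document_importance", "reference_coverage",   # Retrieval Quality
    "citation_precision", "claim_coverage",                          # Verifiability
]

ROWS = [
    ("OpenAI DeepResearch",        "Closed", "o3",             0.857, 0.392, 0.629, 0.176, 0.228, 0.399, 0.138),
    ("Search AI (o3)",             "Open",   "o3",             0.849, 0.348, 0.610, 0.036, 0.217, 0.425, 0.495),
    ("Search AI (Gemini-2.5-pro)", "Open",   "Gemini-2.5-pro", 0.706, 0.277, 0.583, 0.014, 0.091, 0.415, 0.398),
    ("Search AI (Claude-opus-4)",  "Open",   "Claude-opus-4",  0.698, 0.307, 0.583, 0.012, 0.173, 0.701, 0.760),
    ("Search AI (GPT-4.1)",        "Open",   "GPT-4.1",        0.556, 0.265, 0.490, 0.013, 0.068, 0.498, 0.470),
    ("Search AI (Llama-4-Scout)",  "Open",   "Llama-4-Scout",  0.151, 0.193, 0.445, 0.013, 0.067, 0.316, 0.368),
]

df = pd.DataFrame(ROWS, columns=COLUMNS)

# Rank systems by one metric (assuming higher is better).
print(df.sort_values("citation_precision", ascending=False)[["system", "citation_precision"]])

# Compare two systems side by side across every metric.
subset = df[df["system"].isin(["OpenAI DeepResearch", "Search AI (Claude-opus-4)"])]
print(subset.set_index("system").T)
```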
If you'd like to submit your solution to the DeepScholar-Bench leaderboard, please use our Google Form:
For questions and inquiries, please contact: lianapat@stanford.edu or negara@berkeley.edu