PaperBench: Evaluating AI’s Ability to Replicate AI Research



We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.


