  1. SWE-bench Leaderboards

    Aug 3, 2025 · SWE-bench Bash Only uses the SWE-bench Verified dataset with the mini-SWE-agent environment for all models [Post]. SWE-bench Lite is a subset curated for less costly evaluation …

  2. Overview - SWE-bench

    You can find the full leaderboard at swebench.com! 📋 Overview: SWE-bench provides: Real-world GitHub issues - Evaluate LLMs on actual software engineering tasks; Reproducible evaluation - Docker …

  3. SWE-bench

    SWE-bench tests AI systems' ability to solve GitHub issues. We collect 2,294 task instances by crawling Pull Requests and Issues from 12 popular Python repositories. Each instance is based on a pull …

  4. SWE-bench Bash Only

    Aug 3, 2025 · SWE-bench Bash Only uses the SWE-bench Verified dataset with the mini-SWE-agent environment for all models [Post]. SWE-bench Lite is a subset curated for less costly evaluation …

  5. FAQ - SWE-bench

    You can also set --cache_level=env and --clean=True when running swebench.harness.run_evaluation to only dynamically remove instance images after they are used.

  6. SWE-bench Multilingual

    Originally posted as a blog post on Kabir's website. Summary: This post introduces SWE-bench Multilingual, a new benchmark in the SWE-bench family designed to evaluate the software …

  7. SWE-bench Results Viewer

    Select the split & model below to get automated analyses of the model's performance on the SWE-bench split.

  8. SWE-bench Multimodal

    Citation: If you use SWE-bench Multimodal in your research, please cite our paper:

  9. Installation - SWE-bench

    This will install the package in development mode, allowing you to make changes to the code if needed. Install dependencies for dataset generation or RAG inference: To install the dependencies for dataset …

  10. SWE-bench Lite

    Removed instances that create or remove files. Removed instances that contain tests with error message checks. Finally, sampled 300 test instances and 23 development instances from the …
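
The FAQ snippet in result 5 mentions passing `--cache_level=env` and `--clean=True` to `swebench.harness.run_evaluation`. As a rough illustration, the sketch below only assembles such a command line and prints it; it does not run the harness. The helper name `build_eval_command` is hypothetical, and the `--predictions_path` and `--run_id` arguments (with their example values) are assumptions that do not appear in the snippet, so check them against the harness's own help output before relying on them.

```python
import shlex


def build_eval_command(predictions_path, run_id, cache_level="env", clean=True):
    """Assemble (but do not execute) an argv list for the evaluation harness.

    Hypothetical helper for illustration only.
    """
    return [
        "python", "-m", "swebench.harness.run_evaluation",
        # Assumed arguments, not taken from the FAQ snippet:
        "--predictions_path", predictions_path,
        "--run_id", run_id,
        # Flags quoted in the FAQ snippet:
        f"--cache_level={cache_level}",
        f"--clean={clean}",
    ]


if __name__ == "__main__":
    # Example values are placeholders.
    cmd = build_eval_command("preds.json", "demo-run")
    # Print a shell-quoted version so it can be inspected before running.
    print(shlex.join(cmd))
```

Building the argument list separately from executing it makes it easy to review the flags first; according to the snippet, this combination removes instance images dynamically after they are used.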