MultiPL-E

A multi-programming language benchmark for LLMs

Stars: 184

MultiPL-E is a system for translating unit test-driven neural code generation benchmarks to new languages. It is part of the BigCode Code Generation LM Harness, where it can be used to evaluate Code LLMs on the translated benchmarks. Successive versions have fixed bugs in prompts and test cases and added new languages, giving a scalable, polyglot approach to benchmarking neural code generation. A tutorial covers direct usage, and the dataset of translated prompts is available on the Hugging Face Hub.

README:

Multi-Programming Language Evaluation of Large Language Models of Code (MultiPL-E)

MultiPL-E is a system for translating unit test-driven neural code generation benchmarks to new languages. We have used MultiPL-E to translate two popular Python benchmarks (HumanEval and MBPP) to 18 other programming languages.
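To make "translating a unit test-driven benchmark" concrete, here is a toy sketch of the idea: it rewrites a Python assertion of the form `assert candidate(...) == value` into a Lua assertion. It is not MultiPL-E's actual translator. The function names `python_value_to_lua` and `translate_assert` are hypothetical, and the choice of `lu.assertEquals` (from the luaunit library) as the target assertion is an assumption made for illustration.

```python
import ast


def python_value_to_lua(node: ast.expr) -> str:
    """Render a small subset of Python literals as Lua source text."""
    if isinstance(node, ast.Constant):
        value = node.value
        # bool must be checked before int, since bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if value is None:
            return "nil"
        if isinstance(value, str):
            escaped = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            return '"' + escaped + '"'
        if isinstance(value, (int, float)):
            return repr(value)
    # Negative numbers parse as a unary minus applied to a constant.
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return "-" + python_value_to_lua(node.operand)
    # Python lists and tuples both become Lua table constructors.
    if isinstance(node, (ast.List, ast.Tuple)):
        return "{" + ", ".join(python_value_to_lua(e) for e in node.elts) + "}"
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def translate_assert(line: str, assert_fn: str = "lu.assertEquals") -> str:
    """Translate `assert f(args...) == expected` into a Lua assertion call.

    `assert_fn` is an assumed name for the target test helper.
    """
    stmt = ast.parse(line.strip()).body[0]
    if not (
        isinstance(stmt, ast.Assert)
        and isinstance(stmt.test, ast.Compare)
        and len(stmt.test.ops) == 1
        and isinstance(stmt.test.ops[0], ast.Eq)
    ):
        raise ValueError("expected a single `assert <call> == <value>`")
    call = stmt.test.left
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError("left-hand side must be a plain function call")
    args = ", ".join(python_value_to_lua(a) for a in call.args)
    expected = python_value_to_lua(stmt.test.comparators[0])
    return f"{assert_fn}({call.func.id}({args}), {expected})"


if __name__ == "__main__":
    print(translate_assert("assert candidate([1, 2, 3], True) == 'ok'"))
```

A real translator also has to map function signatures, type annotations, and docstrings into each target language; this sketch covers only literal values in test assertions.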

Versions

  • Version 3.0

    • The changelog is now maintained on the dataset page: https://huggingface.co/datasets/nuprl/MultiPL-E
    • The dataset was versioned at 3.0, so the software version is bumped to 3.0 to stay in sync.
    • Several new programming languages have been published in the dataset. The following are not included at this time: Dafny, Coq, Lean, Luau, and MATLAB.
  • Version 0.5.0: Instruction-following support and new languages

    • New languages: Luau, Elixir, Lean, Coq, Dafny
    • Support for instruction-following prompts
    • vLLM support for faster evaluation
  • Version 0.4.0: QoL improvements and new languages

    • New languages: OCaml, MATLAB
    • Using .jsonl instead of .json for prompts
    • Several bugfixes to prompts
  • Version 0.3.0: used to evaluate StarCoder

    • This version corrects several bugs in prompts and test cases that resulted in lower pass@k rates for some of the statically typed languages. The most significant difference is that the pass@k for Java increases by about 2% on HumanEval.
  • Version 0.2.0: used to evaluate SantaCoder
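The version notes above refer to pass@k rates. As a minimal sketch, assuming the commonly used unbiased estimator 1 - C(n-c, k) / C(n, k), where n completions are sampled per problem and c of them pass the unit tests, the metric can be computed as follows. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from the MultiPL-E code base.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: completions sampled for the problem
    c: completions that passed all unit tests
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k completions failed,
    # in which case every size-k subset contains a passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Three problems with 10 samples each and 3, 0, and 10 passing samples.
    print(mean_pass_at_k([(10, 3), (10, 0), (10, 10)], k=1))
```

With k = 1 the estimate for a single problem reduces to c / n, the fraction of sampled completions that pass.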
