LLM-Eval

A list of standard benchmarks for LLM evaluation

datasets/ARC-Challenge

Public

A dataset from the Allen Institute for AI consisting of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. This dataset contains the Challenge Set of questions.

Natural Language Processing, Question Answering

859 kB

Updated: 1 year ago

datasets/mmlu

Public

Measuring Massive Multitask Language Understanding | ICLR 2021

10 MB

Updated: 1 year ago

openai/gsm8k

Public

Natural Language Processing

8.6 MB

Updated: 10 months ago

lighteval/MATH

Public

Empty

Updated: 10 months ago

datasets/ARC-Easy

Public

Natural Language Processing, Question Answering

1.5 MB

Updated: 1 year ago

datasets/openai_humaneval

Public

Natural Language Processing

90.4 kB

Updated: 10 months ago