
LLM-Eval

A list of standard benchmarks for LLM evaluation

A dataset from the Allen Institute for AI consisting of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. This dataset contains the Challenge Set of questions.

Measuring Massive Multitask Language Understanding | ICLR 2021

10 MB
53
Updated: 8 months ago
0

8.6 MB
46
Updated: 4 months ago

Empty
2
Updated: 5 months ago

A dataset from the Allen Institute for AI consisting of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. This dataset contains the Easy Set of questions.

90.4 kB
21
Updated: 5 months ago