datasets
Organization Account
datasets's Repositories
Displaying Page 15 of 15 (150 total Repositories)
mbpp
Public
Updated: 5 months ago

The benchmark consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry level programmers, covering programming fundamentals, standard library functionality, and so on. Each problem consists of a task description, code solution and 3 automated test cases. As described in the paper, a subset of the data has been hand-verified by us.

GSM8K
Public
Updated: 5 months ago

GSM8K (Grade School Math 8K) is a dataset of 8.5K high quality linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.

BoolQ
Public
Updated: 5 months ago

BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring ---they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.

Updated: 5 months ago

CommonsenseQA is a new multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers . It contains 12,102 questions with one correct answer and four distractor answers. The dataset is provided in two major training/validation/testing set splits: "Random split" which is the main evaluation split, and "Question token split", see paper for details.

Updated: 5 months ago

OpenBookQA aims to promote research in advanced question-answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension.

SIQA
Public
Updated: 5 months ago

Social Interaction QA (SIQA) is a question-answering benchmark for testing social commonsense intelligence. Contrary to many prior benchmarks that focus on physical or taxonomic knowledge, Social IQa focuses on reasoning about people’s actions and their social implications. For example, given an action like "Jesse saw a concert" and a question like "Why did Jesse do this?", humans can easily infer that Jesse wanted "to see their favorite performer" or "to enjoy the music", and not "to see what's happening inside" or "to see if it works". The actions in Social IQa span a wide variety of social situations, and answer candidates contain both human-curated answers and adversarially-filtered machine-generated candidates. Social IQa contains over 37,000 QA pairs for evaluating models’ abilities to reason about the social implications of everyday events and situations.

Updated: 5 months ago

A dataset from the Allen Institute of AI consisting of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset the Challenging Set of questions.

ARC-Easy
Public
Updated: 5 months ago

A dataset from the Allen Institute of AI consisting of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset the Easy Set.

food101
Public
Updated: 5 months ago

Food 101 Dataset

5.1 gb
101K23
60
Updated: 1 year ago

COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:

20.6 gb
2123K1
00