datasets Repositories

Organization Account

Displaying Page 15 of 18 (179 total Repositories)

cats_vs_dogs

Public

Computer Vision, Image Classification

578.3 MB

Updated: 11 months ago

MBTA-Bus-Arrival-Departure-Times-2022

Public

This dataset contains the arrival and departure events for buses up to the most recent completed month of 2022. Due to data collection issues, data is not guaranteed to be complete for any stop or date.

3.8 GB

Updated: 1 year ago

BabyLM_2024

Public

BabyLM Challenge 2024: sample-efficient pretraining on a developmentally plausible corpus.

418.7 MB

Updated: 1 year ago

babi_qa

Public

The QA bAbI tasks are a set of proxy tasks that evaluate reading comprehension via question answering.

2.8 MB

Updated: 1 year ago

bookcorpus

Public

Natural Language Processing, Language Modeling

3 GB

Updated: 1 year ago

universal_dependencies

Public

Natural Language Processing

898.7 kB

Updated: 1 year ago

idl-wds

Public

3.2 GB

Updated: 1 year ago

pdfa-eng-words

Public

3.9 GB

Updated: 1 year ago

arxiv_papers

Public

A dataset of arXiv papers to build on for fine-tuning an LLM.

35.7 GB

Updated: 1 year ago

Pexels

Public

13.5 GB

Updated: 1 year ago