datasets
Organization Account
Repositories owned by datasets (181 total)
Public
6

Dataset containing RGB images paired with depth maps.

7.6 GB
83K12
Updated: 10 months ago

1.8 MB
23
Updated: 10 months ago
Public
1

This is a cleaned version of the original Alpaca Dataset released by Stanford.

42.7 MB
11
Updated: 11 months ago

A cleaned version of the HuggingFaceH4/ultrafeedback_binarized dataset that contains only the chosen and rejected samples.

204.5 MB
11
Updated: 11 months ago

Short abstracts from Wikipedia pages.

806.6 MB
1
Updated: 11 months ago

Question, context, and answer triples, each labeled as one of three cases: the answer is in the context, the answer is not in the context, or the question does not make sense to ask.

310 MB
1
Updated: 11 months ago

A growing, diverse dataset of text for AI models to graze on and learn new information from. Like a pasture in the wild, it combines many sources. All the data is stored in Arrow format, so it is easy to access randomly and to stream.

43.8 GB
1201
Updated: 11 months ago

LLaVA Visual Instruct 150K is a set of GPT-generated multimodal instruction-following data. It was constructed for visual instruction tuning and for building large multimodal models toward GPT-4-level vision/language capability.

13.3 GB
1181K
Updated: 11 months ago
Public
3

Wikipedia dataset of 6.4 million cleaned articles, which can be streamed from Apache Arrow files.

20.4 GB
651
Updated: 1 year ago
Public
1

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.