Featured Datasets

Training a model on the MedQuAD dataset

113.6 mb
44
Updated: 2 weeks ago

59.1 gb
1651
Updated: 4 days ago

A Dataset for VQA on Document Images.

2.5 gb
17K11
Updated: 5 months ago

Based on https://huggingface.co/datasets/Norod78/simpsons-blip-captions

240.3 mb
23.1K4
Updated: 1 year ago

Based on https://huggingface.co/datasets/Norod78/simpsons-blip-captions

240.3 mb
43.1K2
Updated: 1 year ago

4.1 gb
63
Updated: 4 weeks ago

50.5 mb
1
Updated: 1 month ago

687 mb
51
Updated: 1 month ago

A demo repository of notebooks.

307.6 mb
92
Updated: 12 hours ago

3 gb
345K11
Updated: 1 month ago
Featured Collections

Some of the Oxen team's favorite collections.

LLM-SFT

Interesting datasets for supervised fine-tuning (SFT) of language models.

a collection by ox

Visual LLMs

Datasets for image understanding with large language models.

a collection by datasets

LLM-Feedback

Datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO.

a collection by ox

LLM-Eval

A list of standard benchmarks for LLM evaluation.

a collection by ox

Multimodal

Datasets that span modalities: combinations of text, image, audio, video, etc.

a collection by ox
