Explore Repository Collections

Interesting datasets for supervised fine-tuning (SFT) of language models.

a collection by ox

This collection contains datasets for image understanding with large language models.

a collection by datasets

Datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO.

a collection by ox

A list of standard benchmarks for LLM evaluation.

a collection by ox

A list of datasets that cross modalities, combining text, image, audio, video, etc.

a collection by ox

A set of datasets useful for coding LLMs, drawn mainly from the OpenCoder paper: https://arxiv.org/abs/2411.04905

a collection by ox

This collection has no description.

a collection by Salihi

This collection has no description.

a collection by oxbot