
Language Modeling Datasets

Language modeling is a natural language processing task: given the preceding words in a sequence, predict the next word. It is one of the core tasks in natural language processing and the foundation for building Large Language Models (LLMs).
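
To make the "predict the next word" idea concrete, here is a minimal sketch of a count-based bigram model. It is only an illustration: the function names (`train_bigram`, `predict_next`, `next_word_prob`) and the toy corpus are hypothetical, and real LLMs use neural networks over subword tokens rather than raw word counts. The model assumes the next word depends only on the single preceding word and estimates P(next | previous) from how often each pair occurs.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each preceding word (hypothetical helper)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent successor of `word`, or None if `word` was never seen."""
    followers = counts.get(word)  # .get avoids inserting unseen keys into the defaultdict
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def next_word_prob(counts, word, candidate):
    """Estimate P(candidate | word) as a relative frequency; 0.0 for unseen contexts."""
    followers = counts.get(word)
    if not followers:
        return 0.0
    return followers[candidate] / sum(followers.values())


if __name__ == "__main__":
    # Toy corpus (assumed for illustration), split on whitespace into word tokens.
    tokens = "the cat sat on the mat . the cat ate the fish .".split()
    model = train_bigram(tokens)
    print(predict_next(model, "the"))           # cat
    print(next_word_prob(model, "the", "cat"))  # 0.5
```

In the toy corpus "the" is followed by "cat" twice, "mat" once and "fish" once, so the model predicts "cat" with an estimated probability of 2/4 = 0.5. Datasets like the ones listed here supply far larger token streams for training and evaluating such models.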


ox/reddit_tifu

Natural Language Processing, Language Modeling

670.6 MB

Updated: 2 years ago

datasets/ptb_text_only

Natural Language Processing, Language Modeling

3.5 MB

Updated: 10 months ago

datasets/silicone

Natural Language Processing, Text Classification, Language Modeling

24.8 MB

Updated: 10 months ago

datasets/kilt_tasks

Natural Language Processing, Text Classification, Language Modeling, Question Answering

1.1 GB

Updated: 10 months ago

datasets/WikiText

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.

Natural Language Processing, Language Modeling

316.2 MB

Updated: 1 year ago