Language Modeling Datasets
Language modeling is a natural language processing (NLP) task in which a model predicts the next word in a sequence, given the words that precede it. It is a core NLP task and the foundation for building Large Language Models (LLMs).
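As a rough illustration of next-word prediction, the sketch below builds a bigram model that counts which word follows which and predicts the most frequent follower. It is a toy example only: the function names `train_bigram` and `predict_next` and the sample sentence are made up for illustration, and real LLMs use neural networks trained on far larger corpora rather than simple counts.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each word, how often each other word directly follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, split on whitespace (an assumption; real
# pipelines use proper tokenizers).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))  # prints "cat" ("cat" follows "the" twice, "mat" once)
```

The same idea scales to real datasets by replacing the toy corpus with tokenized text, although count-based models of this kind cannot generalize to word pairs they have never seen.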
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.