Wikipedia dataset of cleaned articles. It contains 6.4 million articles, which can be streamed from Apache Arrow files.