Wednesday, May 12, 2021

We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone.

 https://commoncrawl.org/

An interesting open dataset of web crawl data, sitting out on Amazon S3.
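For anyone who wants to poke at it, here is a minimal sketch of how the file listings for a single crawl might be located. It only builds the addresses and does not download anything. The bucket name `commoncrawl`, the `crawl-data/<crawl-id>/warc.paths.gz` layout, the `data.commoncrawl.org` host, and the example crawl ID `CC-MAIN-2021-17` are assumptions based on Common Crawl's published conventions, not something stated in this post; the helper names are hypothetical. Check the site for the current details.

```python
# Sketch only: builds addresses for a crawl's WARC file listing.
# Assumed (not from this post): bucket name, key layout, HTTPS host, crawl ID.

BUCKET = "commoncrawl"               # assumed public S3 bucket name
HTTPS_HOST = "data.commoncrawl.org"  # assumed HTTPS front end for the same data


def warc_paths_key(crawl_id: str) -> str:
    """Return the object key of the gzipped list of WARC files for a crawl."""
    return f"crawl-data/{crawl_id}/warc.paths.gz"


def s3_uri(key: str, bucket: str = BUCKET) -> str:
    """Return an s3:// URI for use with S3 tooling."""
    return f"s3://{bucket}/{key}"


def https_url(key: str, host: str = HTTPS_HOST) -> str:
    """Return a plain HTTPS URL for the same object."""
    return f"https://{host}/{key}"


if __name__ == "__main__":
    key = warc_paths_key("CC-MAIN-2021-17")  # illustrative crawl ID
    print(s3_uri(key))
    print(https_url(key))
```

The listing file, once fetched and unzipped, should contain one object key per line, and each of those keys can be passed through the same two helpers.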