I had grand plans. This was supposed to be a cool-looking visualization. Over the course of 2011 I downloaded nearly a terabyte of raw Wikipedia page access stats, and I recently had a Python script running around the clock for three weeks crunching the data. This was going to be cool. But I ran out of time to complete the project.
So rather than have this data go to waste, I’ll offer it up for whoever wants it. Maybe you can think of something interesting to do with it?
So what do I have? I have hour-by-hour data for the top three most-requested pages in the English version of Wikipedia. It is in a CSV file which you can download (wikipedia-hours.zip). I filtered out any requests to Wikipedia "Special", "User", "Template" or similar utility pages, so what is left is just requests to Wikipedia articles.
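For anyone who wants to redo this from the raw dumps, the filter-and-rank step could look roughly like the following. This is a minimal sketch, not the script I actually ran: it assumes the raw hourly pagecounts files contain space-separated lines of the form `project title count bytes`, and the function name `top_articles` and the full prefix list are hypothetical (the only namespaces named above are "Special", "User" and "Template"; the rest stand in for "similar").

```python
import heapq
from typing import Iterable, List, Tuple
from urllib.parse import unquote

# Hypothetical list of non-article namespace prefixes. "Special", "User" and
# "Template" come from the post; the others are illustrative guesses.
EXCLUDED_PREFIXES = (
    "Special:", "User:", "User_talk:", "Template:", "Talk:",
    "Wikipedia:", "File:", "Category:", "Help:", "Portal:",
)


def top_articles(lines: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most-requested English Wikipedia articles for one hour.

    Assumes each line looks like 'project title count bytes'.
    """
    counts = []
    for line in lines:
        parts = line.split(" ")
        if len(parts) != 4:
            continue  # skip malformed lines
        project, title, count, _size = parts
        if project != "en" or not count.isdigit():
            continue  # keep only well-formed English Wikipedia entries
        title = unquote(title)  # titles may be URL-encoded, e.g. Special%3ASearch
        if title.startswith(EXCLUDED_PREFIXES):
            continue  # drop utility pages, keep articles
        counts.append((title, int(count)))
    # nlargest returns the n highest counts, sorted from most to least requested
    return heapq.nlargest(n, counts, key=lambda item: item[1])
```

Running this once per hourly file and writing each hour's result as a CSV row would give a table shaped like the one in the download, though the actual file's columns may differ.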
And for those of you who do not do CSV, I'm including here a table showing the data for January. Some quick observations:
- Some of the hits are obviously driven by breaking news, such as the shooting of Gabrielle Giffords or the death of Jack LaLanne.
- Others are driven by the date, especially holidays or other remembrances.
- Some are driven by other media, especially television.
- Some are inexplicable unless you look at the Google Doodle history page. For example, Google highlighted Cézanne's 172nd birthday on its homepage on January 19th.
- Others defy explanation entirely. For example, why so much interest in User Datagram Protocol (UDP)?
In any case, take it and enjoy. Let me know if you do anything interesting with it.