Page view statistics for Wikimedia projects
(For up-to-date information (outages, ...) about this dataset, please consult the dataset's wiki page.)
Pagecount files per year
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
What are the page view statistics files and what do they contain?
Each request for a page, whether for editing or reading, and whether for a "special page" such as a
log of actions generated on the fly or for an article from Wikipedia or one of the other projects,
reaches one of our squid caching hosts. The request is then sent via UDP to a filter, which discards
requests from our internal hosts as well as requests for wikis that aren't among our general projects.
This filter writes out the project name, the size of the page requested, and the title of the page requested.
Here are a few sample lines from one file:
fr.b Special:Recherche/Achille_Baraguey_d%5C%27Hilliers 1 624
fr.b Special:Recherche/Acteurs_et_actrices_N 1 739
fr.b Special:Recherche/Agrippa_d/%27Aubign%C3%A9 1 743
fr.b Special:Recherche/All_Mixed_Up 1 730
fr.b Special:Recherche/Andr%C3%A9_Gazut.html 1 737
In the above, the first column "fr.b" is the project name. The following abbreviations are used:
wikibooks: ".b"
wiktionary: ".d"
wikimedia: ".m"
wikipedia mobile: ".mw"
wikinews: ".n"
wikiquote: ".q"
wikisource: ".s"
wikiversity: ".v"
mediawiki: ".w"
Project names without a period and a following suffix are Wikipedia projects.
The second column is the title of the page retrieved, the third column is the number of requests,
and the fourth column is the size of the content returned, in bytes.
These are hourly statistics, so in the line
en Main_Page 242332 4737756101
we see that the main page of the English-language Wikipedia was requested over 240 thousand times
during that hour.
These are not unique visits.
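The column layout described above can be sketched in Python as follows. This is a minimal illustration rather than an official parser: the names `PROJECT_SUFFIXES`, `PageCount`, and `parse_pagecount_line` are made up for this example, the suffix table is copied from the abbreviation list above, and the sketch assumes columns are separated by single spaces and that the last two columns are always integers.

```python
from typing import NamedTuple

# Suffix-to-project table, copied from the abbreviation list above.
PROJECT_SUFFIXES = {
    "b": "wikibooks",
    "d": "wiktionary",
    "m": "wikimedia",
    "mw": "wikipedia mobile",
    "n": "wikinews",
    "q": "wikiquote",
    "s": "wikisource",
    "v": "wikiversity",
    "w": "mediawiki",
}


class PageCount(NamedTuple):
    """One pagecount line (hypothetical container for this example)."""
    prefix: str          # part of the project name before the period, e.g. "fr"
    project: str         # expanded project family, e.g. "wikibooks"
    title: str           # page title exactly as it appears in the file
    requests: int        # number of (non-unique) requests in the hour
    bytes_returned: int  # size of the content returned


def parse_pagecount_line(line: str) -> PageCount:
    """Split one line into its four columns and expand the project suffix."""
    # Assumption: single-space separators; the last two columns are integers.
    code, rest = line.rstrip("\n").split(" ", 1)
    title, requests, size = rest.rsplit(" ", 2)
    prefix, dot, suffix = code.partition(".")
    if dot:
        project = PROJECT_SUFFIXES.get(suffix, "unknown")
    else:
        # No period and suffix: a Wikipedia project.
        project = "wikipedia"
    return PageCount(prefix, project, title, int(requests), int(size))


if __name__ == "__main__":
    print(parse_pagecount_line("en Main_Page 242332 4737756101"))
    print(parse_pagecount_line("fr.b Special:Recherche/All_Mixed_Up 1 730"))
```

Titles are left percent-encoded here; decoding them (for example with `urllib.parse.unquote`) is a separate step that this sketch does not attempt.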
In some directories you will see files which have names starting with "projectcount". These are
total views per hour per project, generated by summing up the entries in the pagecount files.
The first entry in a line is the project name, the second is the number of non-unique views, and the
third is the total number of bytes transferred.
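As a rough illustration of that summing step, the sketch below adds up the request and byte columns of pagecount lines per project. It is not the code that generates the real projectcount files; the function name `sum_by_project` is hypothetical, and the sketch assumes whitespace-separated lines with exactly four columns, skipping anything else.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sum_by_project(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {project name: (total views, total bytes)} for pagecount lines."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # assumption: lines without four columns are ignored
        project, _title, requests, size = parts
        totals[project][0] += int(requests)
        totals[project][1] += int(size)
    return {project: (views, size) for project, (views, size) in totals.items()}


if __name__ == "__main__":
    sample = [
        "fr.b Special:Recherche/Acteurs_et_actrices_N 1 739",
        "fr.b Special:Recherche/All_Mixed_Up 1 730",
        "en Main_Page 242332 4737756101",
    ]
    for project, (views, size) in sum_by_project(sample).items():
        # Three entries per project: name, non-unique views, bytes transferred.
        print(project, views, size)
```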
Who came up with this stuff anyways? (Alternatively, who can I nag about it?)
Domas Mituzas, a long-time volunteer db admin for WMF, started generating these statistics in 2007.
Some of the older files (from 2010 through at least mid-2011) are also available at the Internet Archive thanks to Federico Leva.
The dataset is currently (2015) maintained by the Analytics team.
Up to 2015, the dataset was produced by Webstatscollector.
From 2015 onwards, the dataset is produced by stripping extra information from Pagecounts-all-sites.