T193296 Consolidate and improve data usage documentation for WMF-generated data
Closed, Resolved
Public
Assigned To
TBurmeister
Authored By
nshahquinn-wmf
Apr 27 2018, 10:48 PM
2018-04-27 22:48:36 (UTC+0)
Tags
Documentation (Organize or Reorganize Technical Documentation)
Wikimania-Hackathon-2018 (Project)
Product-Analytics (Backlog)
Tech-Docs-Team (Done)
Goal
Referenced Files
None
Subscribers
Aklapper
BTullis
JFishback_WMF
KCVelaga
Lea_WMDE
nshahquinn-wmf
ppelberg
Description
This task covers multiple workstreams related to improving data access documentation for WMF-generated data. I will add subtasks as I define the sub-projects and their priorities.
Terminology
Data usage documentation refers to technical documents for consumers of Wikimedia data. This content helps users understand how to connect their data tasks and research goals to specific datasets.
Dataset documentation is technical content and metadata that describes individual datasets. Dataset documentation informs users about attributes of individual datasets and their relationships to other datasets.
Data consumers: anyone who uses (or could use!) data produced by WMF. Data consumers have differing access to datasets depending on their affiliation.
Data producers: anyone who publishes data for or about wiki projects. (Documentation work here for the WMF Tech Docs Team primarily involves WMF staff and collaborators, but we may also want to provide guidelines for documenting datasets and analyses that data consumers create using WMF-generated data.)
Key user journeys
Users need data usage documentation and/or dataset documentation at various stages in their journey, depending on their goals, experience level, and other criteria. The primary high-level user journeys I've identified so far for data docs are:
Explore datasets
Find datasets for my task
Decide between datasets
Work with a specific dataset
Publish derived datasets and analyses
Project plans and details
More detailed info is in the project doc (a Google Doc for now; content will move on-wiki when it's more stable).
History of this phab task
The original issue highlighted by this phab task was "make it easier to figure out how to access the various sources of raw data (public and private) about Wikimedia projects and what the policies and procedures around using them are." That is one (big) piece of the puzzle, but the overall picture is larger, so I'm expanding this task to cover that broader scope.
Links referenced in original phab task:
meta:Research:FAQ
meta:Research:Data
Main entry point for public data, but very out of date.
meta:Research:Data/Dashboards
wikitech:Analytics/Data access
Main entry point for internal/private/production cluster data.
meta:Statistics
office:Data access guidelines
Guides written by particular teams:
meta:Discovery/Analytics
Wikimedia DE analyst document
Readers team docs about data access (focusing on Hive queries and EventLogging queries)
Some more inspiration:
mw:Wikimedia Discovery/Team/Analyst onboarding
others?
Proposals and plans referenced in original phab task:
meta:Research:Data
Continues as the main entry point for public data, with a pointer to the private data entry point.
meta:Research:Private data
New main entry point for private data. Content moved here from wikitech:Analytics/Data access. Explains what you might use private data for, how you would get access, and why that's hard to do.
Should the main organizing principle be the topic of the data (e.g. editing patterns or article content) or the access method (e.g. the API or the dumps)?
11/15/23: Replaced the term "data access" with "data usage" because "data access" is too closely associated with permissions, while the scope of "data usage" extends far beyond that.
Related Objects
Task Graph
Status | Assigned | Task
Resolved | TBurmeister | T193296: Consolidate and improve data usage documentation for WMF-generated data
Resolved | nshahquinn-wmf | T217787: Create a dedicated page for information about the Analytics MediaWiki replicas
Resolved | JFishback_WMF | T219542: Make data access guidelines public
Resolved | Aklapper | T293685: Review data landing page
Resolved | TBurmeister | T343146: Create an Introduction to Wikimedia open data
Resolved | TBurmeister | T359568: Publish Intro to Analyzing Wikimedia Content
Declined | TBurmeister | T359570: Publish Intro to Analyzing Wikimedia Traffic
Declined | TBurmeister | T359572: Publish Intro to Analyzing Wikimedia Contributions & Contributors
Open | None | T123989: Consolidate mw:Analytics/Metric definitions into Research namespace on meta
Declined | TBurmeister | T353280: Create introductory docs for research and data tools and techniques
Resolved | TBurmeister | T353283: Update Research portal pages/nav to link to new Research:Data_introduction on Meta
Mentioned In
T349103: Define dataset documentation strategy
T329550: Create user-focused Spark SQL documentation
T348037: Dumps documentation: revise and improve landing pages and navigation
T312997: Assess data access doc collections
T312996: Assess data dumps collection
T312995: Assess Research:Data collection
T238687: Analytics & Data information Documentation
T193269: Onboard Morten Warncke-Wang to the Product Analytics team
Mentioned Here
T343146: Create an Introduction to Wikimedia open data
T293685: Review data landing page
Event Timeline
Restricted Application added a subscriber: Aklapper.
Apr 27 2018, 10:48 PM
2018-04-27 22:48:36 (UTC+0)
nshahquinn-wmf mentioned this in T193269: Onboard Morten Warncke-Wang to the Product Analytics team.
Apr 28 2018, 12:00 AM
2018-04-28 00:00:56 (UTC+0)
Aklapper added a project: Documentation.
Apr 28 2018, 2:41 PM
2018-04-28 14:41:51 (UTC+0)
MBinder_WMF triaged this task as High priority.
May 3 2018, 8:08 PM
2018-05-03 20:08:10 (UTC+0)
MBinder_WMF lowered the priority of this task from High to Medium.
MBinder_WMF moved this task from Triage to Backlog on the Product-Analytics board.
Tbayer subscribed.
May 3 2018, 8:16 PM
2018-05-03 20:16:40 (UTC+0)
Vvjjkkii renamed this task from "Consolidate data analyst documentation" to "z1daaaaaaa".
Jul 1 2018, 1:13 AM
2018-07-01 01:13:53 (UTC+0)
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii added projects: CheckUser, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), Tamil-Sites, Gamepress, Hashtags, Jade, KartoEditor, Language-2018-Apr-June, New-Editor-Experiences, Mail, TCB-Team (now WMDE-TechWish).
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
Mainframe98 renamed this task from "z1daaaaaaa" to "Consolidate data analyst documentation".
Jul 1 2018, 9:10 AM
2018-07-01 09:10:53 (UTC+0)
Mainframe98 lowered the priority of this task from High to Medium.
Mainframe98 removed projects: TCB-Team (now WMDE-TechWish), Mail, New-Editor-Experiences, Language-2018-Apr-June, KartoEditor, Jade, Hashtags, Gamepress, Tamil-Sites, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), CheckUser.
Mainframe98 updated the task description.
Mainframe98 added a subscriber: Aklapper.
nshahquinn-wmf renamed this task from "Consolidate data analyst documentation" to "Consolidate data analysis documentation".
Jul 18 2018, 8:56 AM
2018-07-18 08:56:12 (UTC+0)
nshahquinn-wmf added a project: Wikimania-Hackathon-2018.
nshahquinn-wmf updated the task description.
nshahquinn-wmf renamed this task from "Consolidate data analysis documentation" to "Consolidate data access documentation".
Jul 18 2018, 9:15 AM
2018-07-18 09:15:16 (UTC+0)
nshahquinn-wmf updated the task description.
nshahquinn-wmf updated the task description.
Jul 18 2018, 9:24 AM
2018-07-18 09:24:44 (UTC+0)
nshahquinn-wmf updated the task description.
Jul 18 2018, 9:27 AM
2018-07-18 09:27:59 (UTC+0)
nshahquinn-wmf added a subscriber: Lea_WMDE.
Jul 25 2018, 2:03 PM
2018-07-25 14:03:25 (UTC+0)
I started working on this at the Hackathon and created a draft of an updated public data portal, reorganized around the type of data (e.g. editing metadata) rather than the data source (e.g. EventStreams).
I also connected with @Lea_WMDE and discussed data analysts at Wikimedia Deutschland; there's only one contractor currently, but hopefully there will be more soon, so that would be another opportunity for clear, well-organized documentation to deliver value.
nshahquinn-wmf claimed this task.
Jul 25 2018, 2:04 PM
2018-07-25 14:04:03 (UTC+0)
nshahquinn-wmf removed nshahquinn-wmf as the assignee of this task.
nshahquinn-wmf moved this task from Backlog to Doing on the Product-Analytics board.
nshahquinn-wmf claimed this task.
Jul 26 2018, 6:41 PM
2018-07-26 18:41:14 (UTC+0)
nshahquinn-wmf moved this task from Doing to Next Up on the Product-Analytics board.
srodlund moved this task from Backlog to Organize or Reorganize Technical Documentation on the Documentation board.
Aug 14 2018, 11:22 PM
2018-08-14 23:22:27 (UTC+0)
nshahquinn-wmf moved this task from Next Up to Backlog on the Product-Analytics board.
Aug 24 2018, 7:28 PM
2018-08-24 19:28:48 (UTC+0)
nshahquinn-wmf updated the task description.
Nov 19 2018, 10:59 PM
2018-11-19 22:59:42 (UTC+0)
nshahquinn-wmf updated the task description.
Mar 6 2019, 9:07 PM
2019-03-06 21:07:16 (UTC+0)
ppelberg subscribed.
Mar 11 2019, 6:39 AM
2019-03-11 06:39:44 (UTC+0)
JFishback_WMF subscribed.
May 9 2019, 2:55 PM
2019-05-09 14:55:24 (UTC+0)
Rfarrand moved this task from Backlog to Project on the Wikimania-Hackathon-2018 board.
Aug 13 2019, 4:09 PM
2019-08-13 16:09:08 (UTC+0)
nshahquinn-wmf closed subtask T217787: Create a dedicated page for information about the Analytics MediaWiki replicas as Resolved.
Sep 26 2019, 5:55 PM
2019-09-26 17:55:22 (UTC+0)
mmodell edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
Oct 16 2019, 5:47 PM
2019-10-16 17:47:43 (UTC+0)
mmodell edited projects, added Product-Analytics; removed Product-Analytics (Kanban).
Oct 16 2019, 5:51 PM
2019-10-16 17:51:18 (UTC+0)
Nuria closed subtask T219542: Make data access guidelines public as Resolved.
Nov 7 2019, 8:21 PM
2019-11-07 20:21:37 (UTC+0)
Mayakp.wiki mentioned this in T238687: Analytics & Data information Documentation.
Nov 26 2019, 5:56 PM
2019-11-26 17:56:38 (UTC+0)
nshahquinn-wmf removed nshahquinn-wmf as the assignee of this task.
Jul 20 2020, 4:44 PM
2020-07-20 16:44:03 (UTC+0)
JFishback_WMF added a project: Privacy Engineering.
Jul 20 2020, 10:37 PM
2020-07-20 22:37:41 (UTC+0)
JFishback_WMF moved this task from Incoming to Watching on the Privacy Engineering board.
Aklapper added a project: Wikimedia-Developer-Portal.
Dec 17 2021, 3:47 PM
2021-12-17 15:47:15 (UTC+0)
Aklapper moved this task from Inbox to Tangents on the Wikimedia-Developer-Portal board.
Aklapper added a subtask: T293685: Review data landing page.
May 31 2022, 9:17 PM
2022-05-31 21:17:17 (UTC+0)
Aklapper removed a project: Wikimedia-Developer-Portal.
Aug 24 2022, 7:16 PM
2022-08-24 19:16:14 (UTC+0)
meta:Research:Data - edited as part of T293685: Review data landing page
meta:Research:Data/Dashboards - edited in
Aklapper closed subtask T293685: Review data landing page as Resolved.
Aug 24 2022, 7:54 PM
2022-08-24 19:54:23 (UTC+0)
TBurmeister mentioned this in T312995: Assess Research:Data collection.
Aug 25 2022, 3:18 PM
2022-08-25 15:18:28 (UTC+0)
TBurmeister mentioned this in T312996: Assess data dumps collection.
TBurmeister subscribed.
TBurmeister added a comment.
Oct 20 2022, 3:47 PM
2022-10-20 15:47:52 (UTC+0)
Just a note that as of 2022 October the Data Engineering team is working on moving content from
subpages to
nshahquinn-wmf added a project: Movement-Insights.
Jun 1 2023, 6:40 PM
2023-06-01 18:40:22 (UTC+0)
nshahquinn-wmf removed a project: Movement-Insights.
Jun 27 2023, 6:18 PM
2023-06-27 18:18:11 (UTC+0)
TBurmeister renamed this task from "Consolidate data access documentation" to "Consolidate and improve data access documentation".
Jul 31 2023, 4:44 PM
2023-07-31 16:44:26 (UTC+0)
TBurmeister changed the task status from Open to In Progress.
TBurmeister claimed this task.
TBurmeister added a project: Tech-Docs-Team.
Jul 31 2023, 4:59 PM
2023-07-31 16:59:44 (UTC+0)
TBurmeister moved this task from Backlog to Next on the Tech-Docs-Team board.
apaskulin moved this task from Next to Active projects on the Tech-Docs-Team board.
Aug 2 2023, 2:22 PM
2023-08-02 14:22:52 (UTC+0)
TBurmeister mentioned this in T312997: Assess data access doc collections.
Sep 18 2023, 5:07 PM
2023-09-18 17:07:56 (UTC+0)
TBurmeister added a project: Goal.
Sep 28 2023, 2:21 PM
2023-09-28 14:21:51 (UTC+0)
TBurmeister mentioned this in T348037: Dumps documentation: revise and improve landing pages and navigation.
Oct 3 2023, 5:58 PM
2023-10-03 17:58:31 (UTC+0)
TBurmeister mentioned this in T329550: Create user-focused Spark SQL documentation.
Oct 5 2023, 6:20 PM
2023-10-05 18:20:31 (UTC+0)
BTullis subscribed.
Oct 5 2023, 10:56 PM
2023-10-05 22:56:17 (UTC+0)
TBurmeister renamed this task from "Consolidate and improve data access documentation" to "Consolidate and improve data access documentation for WMF-generated data".
Oct 17 2023, 2:19 PM
2023-10-17 14:19:52 (UTC+0)
TBurmeister removed a project: Privacy Engineering.
TBurmeister updated the task description.
TBurmeister mentioned this in T349103: Define dataset documentation strategy.
Oct 17 2023, 2:36 PM
2023-10-17 14:36:08 (UTC+0)
TBurmeister added a comment.
Oct 27 2023, 4:37 PM
2023-10-27 16:37:12 (UTC+0)
Status update: I'm in the research and information-gathering phase, building my understanding of this space and meeting with subject matter experts to try to narrow down priority focus areas so that I can scope project work for this and coming quarters.
In the past week I had 3 meetings with people from Data Platform Eng, Product Analytics and Data Products; next week I have two more meetings scheduled.
I read various documents written by data consumers, like this article and this PDF guide written by a Wikimedian in 2012, which, though old, still provides a useful conceptual framework and ideas for how to structure content that introduces data consumers to this topic area.
I read many wiki pages and project docs, in an attempt to get up to speed on the current status of APP work and other ongoing projects.
I learned about webrequests and how the pageviews public dataset is generated, and I started modeling and auditing the documentation for this dataset and its sources.
I learned about the data model behind some of the major tables written by MediaWiki, and started a list of important concepts that data access docs should cover for those data sources.
Goals for next week:
Finalize meetings with project owners / subject area experts
Get up to speed on the status of Commons Impact Metrics work and potential areas of documentation impact in that project
Identify focus areas for tech docs project work in Q2-Q4 and start scoping specific project tasks and milestones.
Learn about our other major public datasets and how they are generated
Continue gathering data consumer use cases and examples of analysis tasks to inform future information design work
TBurmeister added a comment.
Nov 9 2023, 7:54 PM
2023-11-09 19:54:24 (UTC+0)
Work in this area will proceed in collaboration with the Research and Data Platform Engineering teams as we create new content to help researchers navigate our data landscape, while coordinating that work with changes to the underlying data infrastructure and its documentation strategy. Details will be worked out in the coming weeks, but at minimum this will include:
Revising and updating content on
Work on
KCVelaga_WMF subscribed.
Nov 13 2023, 8:46 AM
2023-11-13 08:46:58 (UTC+0)
TBurmeister renamed this task from "Consolidate and improve data access documentation for WMF-generated data" to "Consolidate and improve data usage documentation for WMF-generated data".
Nov 15 2023, 9:40 PM
2023-11-15 21:40:29 (UTC+0)
TBurmeister updated the task description.
TBurmeister added a subscriber: KCVelaga.
Nov 30 2023, 6:33 PM
2023-11-30 18:33:51 (UTC+0)
I've started a draft on-wiki that attempts to start integrating some of the Research-focused learning goals and data user journeys I identified into an outline. Will continue to build on this as we figure out how to structure the content, i.e. as a set of wiki pages and/or a revised version of the Research:Data portal, or something else still TBD:
@KCVelaga also has a draft page for dataset-specific content, but we still need to strategize about what info needs to be presented where before we can get a good sense of the "how to present it":
TBurmeister changed the status of subtask T343146: Create an Introduction to Wikimedia open data from Open to In Progress.
Nov 30 2023, 9:12 PM
2023-11-30 21:12:44 (UTC+0)
KCVelaga_WMF unsubscribed.
Dec 5 2023, 3:54 PM
2023-12-05 15:54:53 (UTC+0)
TBurmeister changed the status of subtask T353280: Create introductory docs for research and data tools and techniques from Open to In Progress.
Jun 17 2024, 3:04 PM
2024-06-17 15:04:53 (UTC+0)
TBurmeister closed subtask T353280: Create introductory docs for research and data tools and techniques as Declined.
Jun 27 2024, 7:36 PM
2024-06-27 19:36:52 (UTC+0)
TBurmeister closed subtask T343146: Create an Introduction to Wikimedia open data as Resolved.
TBurmeister closed subtask T353283: Update Research portal pages/nav to link to new Research:Data_introduction on Meta as Resolved.
Jun 27 2024, 7:40 PM
2024-06-27 19:40:38 (UTC+0)
TBurmeister closed this task as Resolved.
Jul 1 2024, 4:45 PM
2024-07-01 16:45:55 (UTC+0)
Resolving, as planned FY23-24 work on this task is now complete (see updates in subtasks).
TBurmeister moved this task from Active projects to Done on the Tech-Docs-Team board.
Jul 1 2024, 4:46 PM
2024-07-01 16:46:13 (UTC+0)
apaskulin awarded a token.
Jul 1 2024, 4:49 PM
2024-07-01 16:49:15 (UTC+0)