MediaWiki On Kubernetes - Wikitech
MediaWiki-on-Kubernetes (or mw-on-k8s for short) is an initiative to transition the MediaWiki at WMF deployment from dedicated Application servers to Kubernetes. This page contains information about what is changing as part of this transition.
Motivation
"This migration will unleash the ability to deploy multiple versions of code simultaneously. It will also help enhance platform capabilities to build dockerized, isolated environments for coding, testing and even production debugging.
MediaWiki on Kubernetes will allow us to deprecate and eventually remove a lot of our in-house developed code. Another benefit is that we will be able to react better to sudden traffic spikes, such as those caused by newsworthy events, as flexing capacity up and down is a matter of a configuration change. This enables efficient placement of workloads, packing them in a more environmentally friendly way and increasing hardware utilization."
External traffic
As of today, 100% of all external traffic is served by MediaWiki on k8s:
- mw-api-ext serves external API traffic
- mw-web serves external browser-like traffic
Exceptions:
- dumps
Internal traffic
mw-api-int serves all internal API traffic.
Jobs
Jobqueue
- mw-jobrunner handles jobqueue jobs
Videoscaling
- mw-videoscaling, the mercurius deployment of MediaWiki on k8s, in conjunction with shellbox-video, handles videoscaling
Maintenance
Two CLI deployments of the mediawiki chart are in the process of replacing the maintenance hosts.
- The mw-script deployment handles maintenance scripts via the mwscript-k8s wrapper
- The mw-cron deployment handles periodic jobs
Progress can be followed through this Phabricator task.
Server groups
MediaWiki on Kubernetes is deployed to both the main Eqiad and Codfw datacenters, in the wikikube Kubernetes clusters. While the situation is in constant evolution, we currently have the following server groups:

mw-on-k8s deployments (deployment, port, description):
- mw-debug (port 4444): For external requests with X-Wikimedia-Debug, like the old mwdebug VMs. Accessible using the k8s-mwdebug option of the WikimediaDebug browser extension.
- mw-debug-next (port 4453): Like mw-debug, but dedicated to testing major changes (e.g., a new version of PHP).
- mw-web (port 4450): For external requests from web browsers (via the CDN).
- mw-web-next (port 4454): Like mw-web, but dedicated to supporting progressive migrations of external web traffic (e.g., migrating to a new PHP version).
- mw-api-ext (port 4447): For external requests to the API (via the CDN).
- mw-api-ext-next (port 4455): Like mw-api-ext, but dedicated to supporting progressive migrations of external API traffic (e.g., migrating to a new PHP version).
- mw-api-int (port 4446): For internal requests to the API (from other services).
- mw-jobrunner (port 4448): For internal requests from the JobQueue runners (except videoscaling jobs).
- mw-parsoid (port 4452): For requests that used to be processed by the parsoid-php cluster.
- mw-misc (port 30443, ingress): Miscellaneous MediaWiki installs.
- mw-wikifunctions (port 4451): Dedicated Wikifunctions deployment.
- mw-experimental (port 4456): mw-experimental deployment.
- mw-script (no port): Manual maintenance script deployment.
- mw-cron (no port): Periodic jobs deployment.
Support deployments
There are two support deployments necessary for MediaWiki on Kubernetes to function correctly. Changes to these deployments are not handled by scap and need to be deployed manually with helmfile. The deployment charts are located at /srv/deployment-charts/helmfile.d/services/.
- mw-mcrouter is deployed in a separate namespace. This is a DaemonSet running mcrouter. Deploy changes using helmfile -e $datacenter -i apply in the corresponding directory.
- statsd-exporter is deployed as a separate release in most MediaWiki on Kubernetes namespaces. It ensures that metrics emitted in statsd format by MediaWiki can be scraped by Prometheus. Configuration is common and lives in the _mediawiki-common_ directory of deployment-charts. Deploy changes using helmfile -e $datacenter -l name=prometheus -i apply in each of the MediaWiki on Kubernetes namespaces.
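Since the statsd-exporter release has to be applied namespace by namespace, a small loop saves typing. A minimal sketch, assuming the service directories under /srv/deployment-charts/helmfile.d/services/ are named after the namespaces; the namespace list below is illustrative, not exhaustive, and the echo makes this a dry run:

```shell
# Dry run: print the helmfile command for each namespace.
# The namespace list is illustrative; drop 'echo' to actually apply.
datacenter=eqiad
for ns in mw-web mw-api-ext mw-api-int mw-jobrunner; do
  echo "(cd /srv/deployment-charts/helmfile.d/services/$ns && helmfile -e $datacenter -l name=prometheus -i apply)"
done
```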
What is in a MediaWiki pod
Each MediaWiki pod in Kubernetes contains 8 containers:
- mediawiki-main-tls-proxy - running Envoy, as service mesh and TLS terminator.
- mediawiki-main-httpd - running the Apache httpd daemon.
- mediawiki-main-app - running the PHP daemon.
- mediawiki-main-mcrouter - running mcrouter.
- mediawiki-main-rsyslog - running rsyslog, to collect MediaWiki logs (for Logstash) and Apache access logs.
- mediawiki-main-{php-fpm,mcrouter,httpd}-exporter - the Prometheus exporters for PHP, mcrouter and Apache httpd.
For more detailed information please see our MediaWiki On Kubernetes/How it works page.
How to manage changes to the infrastructure
Given we are in a transition phase between the old Puppet-managed systems and MediaWiki running on Kubernetes, we have tried to keep things shared as much as possible. This means that deploying Puppet changes for app servers influences future mw-on-k8s deploys. However, Puppet merely prepares the host where the Kubernetes images are built; it does not perform a Kubernetes deployment. Applying infra-level changes to Kubernetes services and doing a code deployment are exactly the same procedure, but we need additional care when merging infrastructure changes.
Things that propagate from Puppet to MediaWiki-on-Kubernetes include:
- The list of logging brokers, and the udp2log host.
- The list of service proxy endpoints to offer, and the list of all available ones (out of service::catalog).
- The list of MediaWiki sites and Apache configuration parameters (e.g. which domain names for Apache vhosts), but not the Apache config template itself!
- The list of memcached servers.
- The GeoIP and GeoIPInfo data.
So, whenever you want to change any of the above things, you will need to:
- Check what your change would modify on a role::deployment_server::kubernetes host. If it changes a file under /etc/helmfile-defaults/mediawiki, then the following applies to your change.
- Only merge the change during a "MediaWiki infrastructure" window routinely scheduled on the Deployments calendar, or otherwise well outside of any MediaWiki code deployment window. This is done to allow both SREs and MW developers/deployers to monitor their deployments independently and avoid unexpected consequences.
- After the change is merged, ensure a Puppet run happens on the MediaWiki deployment server, then proceed to re-deploy all MediaWiki service groups on Kubernetes.
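One way to perform the first check is to look, after a Puppet run on the deployment server, for recently modified files under the directory mentioned above. A minimal sketch (the one-hour window is arbitrary):

```shell
# List mw-on-k8s input files modified in the last hour on the
# deployment server; an empty result means Puppet changed nothing here.
dir=/etc/helmfile-defaults/mediawiki
find "$dir" -type f -newermt '1 hour ago'
```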
How to deploy MediaWiki on Kubernetes
If you are looking to deploy a mediawiki-config change or a MediaWiki code update, you are probably looking for scap backport.
Add a mw-on-k8s deployment to scap
In order for scap to deploy to your mw-on-k8s deployment, an entry must be added to the profile::kubernetes::deployment_server::mediawiki::release::mw_releases configuration in Puppet's hieradata/role/common/deployment_server/kubernetes.yaml. Assuming that you have a single primary release named main and a canaries-stage release named canary, this would look like:
namespace: mw-mydeployment
releases:
  main: {}
  canary:
    stage: canaries
mw_flavour: "publish"
web_flavour: "webserver"

If you have no canary release, simply elide it from the releases map.
For a detailed explanation of the available configuration fields and the values they accept, see the DeploymentsConfig docstring.
Add a mw-on-k8s deployment to the cookbooks
Add the new mediawiki deployment to the list(s) of mediawiki services in cookbooks/sre/switchdc/mediawiki/__init__.py.
Manual deployment
scap sync-world --stop-before-sync rebuilds the image and updates the Helmfile release files, so you can call helmfile apply afterwards to manually update the releases.
Automatic deployment
scap sync-world --k8s-only rebuilds the image and only triggers the deployment to mw-on-k8s, leaving the regular Apache targets untouched.
Note:
- The --pause-after-testserver-sync flag will pause deployment after mw-debug is updated and wait for confirmation. If you are deploying changes that warrant manual verification against mw-debug, consider using this.
- scap will not display helmfile diffs before deployment by default. If you are deploying helmfile changes that need careful sequencing, consider either verifying these manually using helmfile diff, or using scap's --k8s-confirm-diffs option.
Full image rebuild and deployment
This can be useful especially when we need to update a base image:
scap sync-world --k8s-only --k8s-confirm-diff -D full_image_build:true
No image build deployment (helmfile only)
The scap way
user@deploy:~$ scap sync-world --k8s-only --k8s-confirm-diff -Dbuild_mw_container_image:False
For changes that need extra manual verification (e.g., Apache configuration) or sequencing, see the notes in Automatic deployment above. Your scap invocation should probably look something like this:
user@deploy:~$ scap sync-world --k8s-only --k8s-confirm-diff --pause-after-testserver-sync -Dbuild_mw_container_image:False
The bash way
This should be limited to deploying one or two MediaWiki-on-Kubernetes deployments. Deploying to all of them should be done with scap.
Deploy mw-debug
user@deploy:~$ cd /srv/deployment-charts/helmfile.d/services/mw-debug
user@deploy:/srv/deployment-charts/helmfile.d/services/mw-debug$ helmfile -e codfw -i apply --context && helmfile -e eqiad -i apply --context
Deploy a couple of MediaWiki-on-Kubernetes deployments
user@deploy:~$ cd /srv/deployment-charts/helmfile.d/services
user@deploy:/srv/deployment-charts/helmfile.d/services$ for i in mw-web mw-api-int mw-api-ext mw-wikifunctions; do (cd "$i" && helmfile -e codfw -i apply --context && helmfile -e eqiad -i apply --context); done
Image rebuild only
If you only need to rebuild the cli image, note that THIS WILL NOT DEPLOY IT to mw-cron or mw-cli:
user@deploy:~$ docker image ls -f label=org.wikimedia.mediawiki-cli | grep -v REPO | awk '{ print $3 }' | xargs docker rmi  # remove cached layers
user@deploy:~$ scap build-images "reason"
Troubleshooting
Dashboards
Logs
- Apache2 AccessLog dashboard (OpenSearch)
- php-fpm slowlog dashboard (OpenSearch)
- php-fpm errorlog dashboard (OpenSearch)
Graphs
- Bare-metal/k8s rps comparison
- mw-on-k8s service graphs
MediaWiki REPL
Any deployer can launch, from a deployment server, a REPL shell (either shell.php or eval.php) using the mw-debug-repl script:
you@deploy1002 $ sudo mw-debug-repl
Error: a wiki should be provided on the command line
mw-debug-repl - launch a MediaWiki REPL in the kubernetes mw-debug environment.
Usage: mw-debug-repl [-e] [-d <datacenter>] [-w <wiki>|<wiki>]
OPTIONS:
  -e              Launch eval.php, instead of the default shell.php as REPL
  -d|--datacenter Pick a specific datacenter (by default the master will be picked)
  -w|--wiki       Pick a wiki. For compatibility reasons, the flag can be omitted.
  -h|--help       Show this help message
EXAMPLES:
  Launch an eval.php shell for itwiki in eqiad:
    sudo /usr/local/bin/mw-debug-repl -e -d eqiad --wiki itwiki
  Also valid:
    sudo /usr/local/bin/mw-debug-repl -e --datacenter eqiad itwiki
  Launch shell.php for enwiki:
    sudo /usr/local/bin/mw-debug-repl enwiki
    sudo /usr/local/bin/mw-debug-repl --wiki enwiki
you@deploy1002 $ sudo mw-debug-repl enwiki
Finding a mw-debug pod in eqiad...
Now running shell.php for enwiki inside pod/mw-debug.eqiad.pinkunicorn-59b5df7ffd-h2xxq...
Psy Shell v0.11.10 (PHP 7.4.33 — cli) by Justin Hileman
> echo $wmgServerGroup
kube-mw-debug
Get a shell on a production pod
you@deploy1002:~$ kube_env mw-web-deploy eqiad
you@deploy1002:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mw-web.eqiad.canary-68998c7b48-2jt8g 8/8 Running 0 2d13h
mw-web.eqiad.canary-68998c7b48-r5hng 8/8 Running 0 2d13h
mw-web.eqiad.main-7f65fbb9c8-69sm2 8/8 Running 0 2d13h
mw-web.eqiad.main-7f65fbb9c8-7vtrl 8/8 Running 0 2d13h
mw-web.eqiad.main-7f65fbb9c8-9rr9l 8/8 Running 0 2d13h
mw-web.eqiad.main-7f65fbb9c8-gjgrk 8/8 Running 0 2d13h
mw-web.eqiad.main-7f65fbb9c8-j88nr 8/8 Running 0 2d13h
mw-web.eqiad.main-7f65fbb9c8-lpq9f 8/8 Running 0 2d13h
mw-web.eqiad.main-7f65fbb9c8-lwhwj 8/8 Running 0 2d13h
mw-web.eqiad.main-7f65fbb9c8-xr64t 8/8 Running 0 2d13h
you@deploy1002:~$ kubectl exec mw-web.eqiad.main-7f65fbb9c8-69sm2 -c mediawiki-main-app -it -- /bin/bash
www-data@mw-web:/$
strace a production PHP process
The following instructions require production root access to complete. Only Wikimedia SREs or equivalent users can follow this process.
root@deploy1002 # kubectl -n mw-web describe pod mw-web.eqiad.main-7f65fbb9c8-69sm2 | grep Node:
Node: kubernetes1023.eqiad.wmnet/10.64.32.21
root@deploy1002 # kubectl -n mw-web exec pod/mw-web.eqiad.main-7f65fbb9c8-69sm2 -c mediawiki-main-app -- ps -eo pid,pidns,args
PID PIDNS COMMAND
1 4026533595 php-fpm: master process (/etc/php/7.4/fpm/php-fpm.conf)
1345418 4026533595 php-fpm: pool www
1346141 4026533595 php-fpm: pool www
1346193 4026533595 php-fpm: pool www
1346539 4026533595 php-fpm: pool www
1346750 4026533595 php-fpm: pool www
1346818 4026533595 php-fpm: pool www
1348934 4026533595 php-fpm: pool www
1349988 4026533595 php-fpm: pool www
2367349 4026533595 ps -eo pid,pidns,args
Let us strace PID 1346750 in the container. The container does not have strace, but we can run it on the host. However, the container runs in a PID namespace, so we have to figure out what the PID is in the root namespace. On the host, /proc/$pid/status contains the PID in both the container's namespace and the root namespace (the NStgid line). We can confirm the identification by checking that the pidns is the same.
you@deploy1002 $ ssh kubernetes1023.eqiad.wmnet
you@kubernetes1023 $ grep 'NStgid.*1346750' /proc/*/status
/proc/69585/status:NStgid: 69585 1346750
you@kubernetes1023 $ sudo ps -p 69585 -o pidns
PIDNS
4026533595
you@kubernetes1023 $ sudo strace -p 69585
strace: Process 69585 attached
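The NStgid mechanics can be checked on any Linux host: for a process in the root PID namespace the line carries a single PID, while a process in a nested namespace lists one PID per level, host side first.

```shell
# Show the thread-group ID of this grep process as seen from each PID
# namespace it belongs to (a single entry outside any container).
grep NStgid /proc/self/status
```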
Stream logs directly from kafka
If you are doing troubleshooting using a mw-debug pod through X-Wikimedia-Debug or the MediaWiki REPL and do not want to wait for the logs to show up on Logstash, you can log into a stat host and use kafkacat to stream the logs directly from the kafka firehose:
cgoubert@stat1008:/srv/home/cgoubert$ kafkacat -C -b kafka-logging1001.eqiad.wmnet:9092 -t k8s-mw-eqiad | grep mw-debug | jq .