User:GreenC/WaybackMedic 2.5 - Wikimedia Commons
WaybackMedic
by GreenC
Wayback Medic 2.5 is a bot that adds and maintains links from the list of known webarchive services in use on Wikimedia sites. Edits made after 2018-12-04 are by version 2.5. The bot operator is User:GreenC. The bot account is User:GreenC bot. The bot (software) is "WaybackMedic".
Note: Some of the below functions and links are relevant to Enwiki only.
WaybackMedic Fixes
| Fix number | Function name | Example edit | Description | Notes | Date added |
|---|---|---|---|---|---|
| 1 | fixthespuriousone | Example | Remove spurious \|1= in cite templates. | | August 2016 |
| 2 | fixmissingprotocol | Example | 1. Add https if the protocol is missing from the archive.org URL. 2. Convert existing protocol http to https. 3. Add the web subdomain if missing (archive.org/web/ → web.archive.org/web/). 4. Add the /web/ path (web.archive.org/2016/ → web.archive.org/web/2016/). In some URLs adding /web/ breaks the link; test for those. | HTTPS per RFC | August 2016 |
| 3 | fixemptyarchive | Example | 1. If \|archiveurl= is empty or missing but \|archivedate= has content, attempt to find a working archive URL based on the archive date; otherwise add {{dead link}} if appropriate. 2. If \|archivedate= is empty or missing but \|archiveurl= has content, generate the date value from the timestamp in the archive URL. 3. If \|archiveurl= and \|archivedate= are both empty, remove both and leave a {{dead link}} if appropriate. | | August 2016 |
| 4 | fixbadstatus | Example | Check all Wayback Machine URLs for response code errors (anything but 200s). On an error code, try for a better URL via the Wayback API, first using the accessdate, then using the earliest date available. If none is found there, check the WebCite API, then the Memento API, which checks a few dozen other archives. Other techniques are undocumented. If still none is found, remove \|archiveurl= and \|archivedate= and add {{dead link}}. | | August 2016 |
| 5 | Retired | | | | |
| 6 | fixemptywayback | Example | The wayback template is mangled in a certain way. Action: re-assemble. It won't delete multiple instances if they exist in the same ref (as in the Example). | | August 2016 |
| 7 | fixencodedurl | Example | The URL was incorrectly encoded. Fully decode the URL and re-encode it. | | August 2016 |
| 8 | fixdatemismatch | Example | 1. Ensure \|archivedate= matches the snapshot date in the URL. 2. Ensure the date format matches dmy or mdy if set (retain ymd if in use). | | August 2016 |
| 9 | fixwebcitlong | Example, Example | Convert WebCite URLs from short form to long form. Convert Freezepage.com URLs from short form to long form. | WebCite Usage | January 2017 |
| 10 | fixstraydt | Example | Remove a stray {{dead link}} template when an archive exists for the link. | | January 2017 |
| 11 | fixwam | Example | Merge {{wayback}} and {{webcite}} --> {{webarchive}}. | Merge completed February 5, 2017. Webarchive TfM | January 2017 |
| 12 | fixiats | Example | archive url -> \|archive-url) | | January 2017 |
| 13 | fixswitchurl | Example | Move an archive.org URL from \|url= to \|archiveurl= and add \|archivedate= if missing. | | January 2017 |
| 14 | Retired | | | | |
| 15 | fixembway | Example, Example | 1. A {{wayback}} is embedded in a CS template. 2. A {{dead link}} is embedded in a CS template. | | January 2017 |
| 16 | | Example | Timestamp and/or \|archivedate= is 19700101 and/or out-of-bounds. | | January 2017 |
| 17 | fixdoubleurl | Example | archive.org URLs are doubled, tripled, etc. | | January 2017 |
| 18 | fixemptywebarchive | Example | {{webarchive}} \|date= is missing or has an empty value. | | January 2017 |
| 19 | fixdoublewebarchive | Example | Remove duplicate {{webarchive}} instances. | | January 2017 |
| 20 | fixembwebarchive | Example | {{cite web}} is embedded in a {{webarchive}}. | | January 2017 |
| 21 | fixarchiveis | Example, Example | 1. Convert Archive.is URLs from short form to long form. 2. Fix URL encoding of broken links. | Archive.is Usage | January 2017 |
| 22 | fixitems | Example | Change "/items/" URLs that are using machine IDs. | BRFA | January 2017 |
| 23 | encodemag | Example | Convert MediaWiki encoding to URL encoding in URLs (i.e. {{!}} and {{=}}). | RFC3986 | January 2017 |
| 24 | decodespace | Example | Convert %20 to +, + to %20, etc. in URLs that can be repaired this way. | See also | June 2017 |
| 25 | waytree_trailgarb | Example, Example, Example | Remove typical garbage characters found at the end of URLs: .,;:-"l(%XX)('') | | February 2018 |
| 26 | fixcommentarchive | Example | Open up commented-out archives and add \|deadurl= "yes" or "no". | | February 2018 |
| 27 | waytree_x2encoding | Example | Repair double URL-encoding (e.g. %3A that was re-encoded as %253A). | | February 2018 |
| 28 | fixencodebug | Example | Repair missed URL-encoding of square brackets. | Tracked in Phabricator Task T186417 | February 2018 |
| 29 | fixiats | Example, Example | Restore truncated Wayback URL. | | February 2018 |
| 30 | fixiats | Example | Convert \|title={title} -> \|title=Archived copy. | Tracked in Phabricator Task T203865 | September 2018 |
| 31 | urlchanger | Example | Move a broken URL to a new working URL and undo previous archives. | BOTREQ | November 2018 |
| 32 | cosmetic | Example, Example, Example, Example, Example | Edits that might be cosmetic; made only together with other edits. 1. Delete trailing # in URLs. 2. Delete empty archive fields. 3. archive.is --> archive.today. 4. Fix double fragments. 5. Convert protocol-relative URLs. | w:WP:PRURL, T214855, Archive.today | January 2019 |
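The four fixmissingprotocol steps in the table can be illustrated with a short sketch. This is not the bot's code (the bot is written in Nim and Awk); the function name `normalize_wayback_url` and the regular expressions are assumptions made for illustration. The sketch also skips the check the table mentions for URLs where adding /web/ breaks the link.

```python
import re

# Hypothetical pattern: optional scheme, then archive.org with an optional
# "web." or "www." prefix, then the rest of the URL as the path.
_ARCHIVE_ORG = re.compile(
    r"^(?:(?:https?:)?//)?"
    r"(?P<host>(?:web\.|www\.)?archive\.org)"
    r"/(?P<path>.*)$",
    re.IGNORECASE | re.DOTALL,
)
# A path that starts directly with a timestamp, i.e. the /web/ segment is missing.
_BARE_TIMESTAMP = re.compile(r"\d{4,14}/")


def normalize_wayback_url(url: str) -> str:
    """Return a Wayback URL with https, the web subdomain and the /web/ path."""
    m = _ARCHIVE_ORG.match(url.strip())
    if m is None:
        return url  # not an archive.org URL: leave untouched
    host = m.group("host").lower()
    path = m.group("path")
    if host != "web.archive.org":
        if not path.startswith("web/"):
            return url  # e.g. archive.org/details/...: not a Wayback link
        host = "web.archive.org"  # step 3: add the web subdomain
    elif _BARE_TIMESTAMP.match(path):
        path = "web/" + path  # step 4: add the /web/ path
    # Steps 1 and 2: always emit https, whether the scheme was missing or http.
    return f"https://{host}/{path}"
```

For example, `normalize_wayback_url("http://archive.org/web/2016/http://example.com/")` yields `https://web.archive.org/web/2016/http://example.com/`. A real implementation would also confirm that the rewritten URL still resolves before saving it.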
BotWikiAwk
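Several fixes in the table above (fixemptyarchive step 2, fixdatemismatch) derive the |archivedate= value from the timestamp embedded in a Wayback URL. A minimal sketch of that derivation follows; the function name `archive_date_from_url`, the `style` parameter and the regular expression are illustrative assumptions, not taken from the bot's source.

```python
import re
from datetime import date
from typing import Optional

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)

# Timestamp after /web/: at least YYYYMMDD, optionally hhmmss and a
# modifier suffix such as "id_" (hypothetical pattern for this sketch).
_TIMESTAMP = re.compile(r"/web/(\d{8})\d{0,6}[a-z_]*/")


def archive_date_from_url(archive_url: str, style: str = "dmy") -> Optional[str]:
    """Format the snapshot date as dmy, mdy or ymd; None if it cannot be read."""
    m = _TIMESTAMP.search(archive_url)
    if m is None:
        return None
    digits = m.group(1)
    try:
        d = date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))
    except ValueError:
        return None  # impossible calendar date, e.g. month 13
    if style == "dmy":
        return f"{d.day} {MONTHS[d.month - 1]} {d.year}"
    if style == "mdy":
        return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"
    return d.isoformat()  # ymd
```

Comparing the returned string with the existing |archivedate= value is one way to detect a mismatch. The sketch does not cover the out-of-bounds case from fix 16 (such as 19700101), which would need an extra range check.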
Technical details
Changes to URLs are checked against the remote site to ensure they are working
Real-time link checks, no link database. However, links are checked over a 24hr period before final upload of diff.
Supports many APIs including Internet Archive, Memento, WebCite and "Timemap" APIs at individual services
Multiple HTTP header status code checks at the application (WaybackMedic) layer
Additional time-out & retries built-in to the web transfer libraries.
Additional operating-procedure level checks against network and other errors - bot is semi-supervised in known trouble areas.
Multiple redundant checks of the APIs using multiple dates to ensure a page really is unavailable
Accepts API results but then verifies by looking at page headers and/or contents
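The idea of checking an API redundantly with several dates can be sketched against the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`). Everything else here is an assumption for illustration: the function names, the retry count and the injectable `fetch` parameter are hypothetical, and the sketch omits the later step of verifying the result against page headers or contents.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable, Optional

AVAILABILITY_API = "https://archive.org/wayback/available"


def http_get_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def find_snapshot(
    page_url: str,
    timestamps: Iterable[str],
    fetch: Callable[[str], dict] = http_get_json,
    retries: int = 2,
) -> Optional[str]:
    """Try each candidate date in turn; return the first snapshot reported as 200."""
    for ts in timestamps:
        query = urllib.parse.urlencode({"url": page_url, "timestamp": ts})
        for _attempt in range(retries + 1):
            try:
                data = fetch(f"{AVAILABILITY_API}?{query}")
            except OSError:
                continue  # network error (URLError is an OSError): retry
            closest = data.get("archived_snapshots", {}).get("closest")
            if closest and closest.get("available") and closest.get("status") == "200":
                return closest["url"]
            break  # the API answered for this date; move on to the next one
    return None
```

A caller might pass the citation's access date first and an early date second, for example `find_snapshot(url, ["20160801", "19960101"])`. Treating the returned URL as a candidate rather than a confirmed archive matches the point above about verifying API results.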
The bot is primarily written in Nim (compiles to C source) with support utilities in Awk. Libraries were custom made, including a string primitives library for regex, a wiki template parsing library, an OAuth library (in Awk), a MediaWiki API interface library, and a soft-404 detector.
Due to the nature of the task, running the bot includes a fair amount of supervisory overhead so it requires operator training, though the steps are documented in the source package.