CVE-2025-32699: Potential javascript injection attack enabled by Unicode normalization in Action API
Closed, Resolved
Public
Security
Assigned To: cscott
Authored By: zoe, 2025-02-24 15:30:38 (UTC+0)
Tags
Security-Team (In Progress)
Security
MediaWiki-Action-API (Unsorted)
Vuln-XSS
Vuln-Inject (Tracked)
SecTeam-Processed (Completed)
Content-Transform-Team (Work In Progress) (Backlog)
Essential-Work
MW-Interfaces-Team (Incoming (Needs Triage))
Patch-For-Review
Referenced Files (8 of 17 shown)
F59025694: 02-T387130-2.patch, 2025-04-09 14:30:00 (UTC+0)
F59025660: 02-T387130.patch, 2025-04-09 14:30:00 (UTC+0)
F58631089: REL1_39-0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch, 2025-03-07 03:39:03 (UTC+0)
F58626453: REL1_42-0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch, 2025-03-06 22:02:29 (UTC+0)
F58625014: REL1_43-0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch, 2025-03-06 20:18:40 (UTC+0)
F58513634: 0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch, 2025-02-27 18:32:36 (UTC+0)
F58513114: 0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch, 2025-02-27 17:09:53 (UTC+0)
F58511482: 0001-Avoid-attacks-in-API-related-to-unicode-NFC-and-deco.patch, 2025-02-27 11:07:04 (UTC+0)
Subscribers (7 of 20 shown): ABreault-WMF, acooper, Aklapper, Bawolff, cscott, dchan, DLynch
Description
TL;DR
The MediaWiki Action API converts output to Unicode Normalization Form C. Unfortunately, for HTML strings this is unsafe, because the sequence ‘>’ + U+0338 gets replaced by U+226F, breaking the tag end and potentially allowing injection attacks.
Steps to reproduce
1. Visit any wiki page, such that mw.Api() has loaded.
2. Open the console.
3. Perform any mw.Api call that generates HTML, such that you can make the first character inside a tag be U+0338 COMBINING LONG SOLIDUS OVERLAY. In other words, you want the API to send you some HTML that contains ‘>’ followed by U+0338.

For example, call { action: 'visualeditor', paction: 'parsefragment' } as follows:
  const COMBINING_LONG_SOLIDUS = '\u0338';
  new mw.Api().post( {
      action: 'visualeditor',
      paction: 'parsefragment',
      page: 'Test',
      wikitext: COMBINING_LONG_SOLIDUS + ' onmouseover="alert(42)" >content'
  } ).done( ( data ) => {
      const content = data.visualeditor.content;
      document.body.innerHTML = content;
      console.log( 'Content:', content );
  } ).fail( ( err ) => {
      console.error( err );
  } );
This can also be reproduced without JavaScript. Note the missing > after id="mwAg":

  curl -s -d action=visualeditor -d paction=parsefragment -d page=Test -d wikitext=$'\u0338 onmouseover="alert(42)" >content ' -d format=json https://en.wikipedia.org/w/api.php | jq -r .visualeditor.content | hexdump -C
  00000000 3c 70 20 69 64 3d 22 6d 77 41 67 22 e2 89 af 20 |<p id="mwAg"... |
  00000010 6f 6e 6d 6f 75 73 65 6f 76 65 72 3d 22 61 6c 65 |onmouseover="ale|
  00000020 72 74 28 34 32 29 22 20 3e 63 6f 6e 74 65 6e 74 |rt(42)" >content|
  00000030 20 3c 2f 70 3e 0a |  </p>.|
  00000036
Compare this with a normal slash in the input:

  curl -s -d action=visualeditor -d paction=parsefragment -d page=Test -d wikitext='/ onmouseover="alert(42)" >content ' -d format=json https://en.wikipedia.org/w/api.php | jq -r .visualeditor.content | hexdump -C
  00000000 3c 70 20 69 64 3d 22 6d 77 41 67 22 3e 2f 20 6f |<p id="mwAg">/ o|
  00000010 6e 6d 6f 75 73 65 6f 76 65 72 3d 22 61 6c 65 72 |nmouseover="aler|
  00000020 74 28 34 32 29 22 20 3e 63 6f 6e 74 65 6e 74 20 |t(42)" >content |
  00000030 3c 2f 70 3e 0a |</p>.|
  00000035
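
To read the two dumps side by side, it may help to see which bytes each form encodes to. The following is only a sketch, assuming a runtime that provides `TextEncoder` and `String.prototype.normalize` (current browsers and Node.js); the helper name `toHex` is made up for illustration.

```javascript
// Hypothetical helper: render a string's UTF-8 bytes the way hexdump shows them.
const toHex = (s) =>
  Array.from(new TextEncoder().encode(s), (b) => b.toString(16).padStart(2, '0')).join(' ');

// '>' followed by U+0338 COMBINING LONG SOLIDUS OVERLAY, as it should be sent.
const decomposed = '>\u0338';
// The same text after NFC, which is what the first dump contains.
const composed = decomposed.normalize('NFC');

console.log(toHex(decomposed)); // 3e cc b8
console.log(toHex(composed));   // e2 89 af
```

The byte 3e that closes the tag in the second dump is therefore absent from the first: it has been folded into the three-byte sequence e2 89 af.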
Actual behaviour
The sequence ‘>’ + U+0338 gets replaced with the combined character U+226F ≯ NOT GREATER-THAN. This is due to applying Unicode Normalization Form C. But (surprisingly!) that breaks the HTML tag, which potentially allows a JavaScript injection attack, for instance:

  '<p id="mwAg"≯ onmouseover="alert(42)" >content </p>'
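
The effect can be checked locally without calling the API. This is a minimal sketch that assumes `String.prototype.normalize( 'NFC' )` behaves like the server-side NFC step for this input; the variable names are illustrative only.

```javascript
// What the parser intended to send: '>' closes the <p> tag, then U+0338 starts the text.
const rawHtml = '<p id="mwAg">\u0338 onmouseover="alert(42)" >content </p>';

// Stand-in for the NFC pass applied to API output (an assumption for this sketch).
const normalizedHtml = rawHtml.normalize('NFC');

// Before normalization the first '>' sits right after the id attribute.
console.log(rawHtml.indexOf('>'));
// After normalization that '>' is gone, so the first '>' is the one that
// follows the attacker-supplied onmouseover attribute.
console.log(normalizedHtml.indexOf('>'));
console.log(normalizedHtml.includes('\u226f')); // true
```

An HTML parser given the normalized string treats everything up to that later '>' as part of the opening tag, which is how the attribute ends up on the element.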
Expected behaviour
The HTML arrives with the sequence ‘>’ + U+0338 intact.

  '<p id="mwAg">\u0338 onmouseover="alert(42)" >content </p>'

Note that this is a regular '>' symbol closing the HTML tag, followed by a U+0338 ◌̸ COMBINING LONG SOLIDUS OVERLAY, such that the contents of the <p> tag are "\u0338 onmouseover="alert(42)" >content".
Debugging note
Both expected and actual output may look similar or identical when rendered in the console:

  <p id="mwAg"≯ onmouseover="alert(42)" >content </p>   # actual
  <p id="mwAg"≯ onmouseover="alert(42)" >content </p>   # expected
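The best way to see for sure what’s there is to escape non-ASCII characters with a function:

  function showUnicode( text ) {
      // Replace every non-ASCII UTF-16 code unit with a \uXXXX escape.
      return text.replace( /[^\x00-\x7F]/g, ( ch ) => '\\u' + ch.charCodeAt( 0 ).toString( 16 ).padStart( 4, '0' ) );
  }
  const text = '<p id="mwAg"≯ onmouseover="alert(42)" >content </p>';
  console.log( showUnicode( text ) );
  // <p id="mwAg"\u226f onmouseover="alert(42)" >content </p>
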
Cause
The cause was identified by @dchan in the related ticket. The function MediaWiki::Api::ApiResult::validateValue is not just catching invalid UTF-8. It is also applying Unicode Normalization Form C. Unfortunately, as we have seen, it is unsafe to do this on HTML strings if they might contain ‘>’ + U+0338.

Call chain:
  MediaWiki::Api::ApiResult::addValue
  MediaWiki::Api::ApiResult::validateValue
  MediaWiki::Language::normalize
  UtfNormal::Validator::cleanUp
  normalizer_normalize( $string, Normalizer::FORM_C )
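
The comments further down this task settle on keeping a raw U+0338 out of emitted HTML by representing it as an entity, so that the NFC step at the end of this call chain has nothing to compose. The sketch below only illustrates why that works. It is not the MediaWiki patch: the function name `escapeCombiningSolidus` and the exact entity spelling `&#x338;` are assumptions made for this example, and `String.prototype.normalize` stands in for `normalizer_normalize`.

```javascript
// Hypothetical helper (not MediaWiki's code): entity-escape every U+0338 in an HTML string.
function escapeCombiningSolidus(html) {
  return html.replace(/\u0338/g, '&#x338;');
}

const html = '<p id="mwAg">\u0338 onmouseover="alert(42)" >content </p>';

// Without escaping, NFC merges '>' + U+0338 into U+226F and the tag loses its end.
const unsafe = html.normalize('NFC');

// With escaping, '>' is followed by '&', so NFC leaves the tag delimiter alone.
const safe = escapeCombiningSolidus(html).normalize('NFC');

console.log(unsafe.includes('\u226f')); // true
console.log(safe.includes('\u226f'));   // false
console.log(safe.startsWith('<p id="mwAg">&#x338;')); // true
```

A browser decodes the entity back to U+0338 when it builds the text node, so the rendered content is unchanged while the serialized HTML survives normalization.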
Details
Risk Rating: High
Author Affiliation: WMF Product
Related Changes in Gerrit:

| Subject | Repo | Branch | Lines +/- |
| --- | --- | --- | --- |
| Replace isolated combining characters | utfnormal | master | +134 -14 |
| SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization | mediawiki/core | master | +116 -9 |
| SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization | mediawiki/core | REL1_42 | +115 -10 |
| SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization | mediawiki/core | REL1_43 | +116 -9 |
| SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization | mediawiki/core | REL1_39 | +47 -10 |
| Update wikimedia/parsoid to 0.16.5 | mediawiki/vendor | REL1_39 | +81 -64 |
| Update wikimedia/parsoid to 0.19.2 | mediawiki/vendor | REL1_42 | +37 -47 |
| Update wikimedia/parsoid to 0.20.2 | mediawiki/vendor | REL1_43 | +32 -30 |
Related Objects

Task Graph

| Status | Subtype | Assigned | Task |
| --- | --- | --- | --- |
| Resolved | | None | T382316 Release MediaWiki 1.39.12/1.42.6/1.43.1 |
| | | | Restricted Task |
| Resolved | Security | cscott | T387130 CVE-2025-32699: Potential javascript injection attack enabled by Unicode normalization in Action API |
Mentioned In
T382756: Error When Editing Pages with Specific Unicode Character in Visual Editor
Mentioned Here
T382316: Release MediaWiki 1.39.12/1.42.6/1.43.1
T354361: HtmlHelper::modifyElements(…, $html5format = false) incorrectly encodes HTML entities
T363764: Refactor dependency injection (DI) in OutputTransform stages
T343994: OutputPage::setPageTitle() should not accept Message objects, introduce OutputPage::setPageTitleMsg()
T266140: HTML entity replaced by the Unicode character in an edit
T346197: Wikimedia\Assert\InvariantException: Invariant failed: Bad UTF-8 at end of string (3 byte sequence)
T17261: Trimmed multibyte characters result in invalid XML
T18262: File metadata containing invalid characters produce bad-formed XML
T324431: Parsoid: displaytitle HTML now appearing in element rather than page title

Event Timeline
There are a very large number of changes, so older changes are hidden.

ihurbain added a comment. 2025-02-27 09:29:47 (UTC+0)
@cscott I think you haven't attached the OutputTransform patches correctly, here I can only see their file names

dchan added a comment. 2025-02-27 10:26:24 (UTC+0)
In T387130#10585557, @cscott wrote:
> Here is a patch for 1b from above (T387130#10584304). This patches wikimedia/utfnormal to prefix isolated combining characters, using the unicode database [...]
> 0004-Replace-isolated-combining-characters.patch (134 KB)

  $canPrecedeCombining = [];
  [...]
  public function testU0338() {
      $text = "\u{0338}<\u{0338}>\u{0338}";
      $expect = "\u{25CC}\u{0338}\u{226E}\u{226F}";
      $this->assertEquals( bin2hex( $expect ), bin2hex( Validator::cleanUp( $text ) ) );
  }

Please can we remove '<' and '>' from $canPrecedeCombining? That would render html invulnerable, without breaking any combining sequence except ">\x{0338}" and "<\x{0338}", both of which have precomposed alternatives and are bad things to be floating around our ecosystem.
Also that way I don't have to fix anything in ApiVisualEditor.php 😁
Then the correct test would be:

  $text = "\u{0338}<\u{0338}>\u{0338}";
  $expect = "\u{25CC}\u{0338}<\u{25CC}\u{0338}>\u{25CC}\u{0338}";

Bawolff added a comment (edited). 2025-02-27 10:28:35 (UTC+0)
Yes, I agree that would be better. What I don't know is why we choose to normalize html (or wikitext) to NFC there.
Is that a step we actually need?
I looked it up. It was to fix T17261 and T18262, r45749.
That said, it does seem like it only really needs valid unicode not normalized unicode.

ihurbain added a comment (edited). 2025-02-27 10:58:05 (UTC+0)
Nag on the back of my mind: can we have a potential vector around source ranges?
We have things like T346197 (and related "bad UTF-8" issues) that hint that we occasionally try to access character ranges that were not the ones we actually wanted to; there might be a way to craft things around these?
I think we might need a patch on PHPUtils::safeSubstr in Parsoid to avoid returning a string starting with \u0338? (which we do right now):

  $ php run.php shell
  > use Wikimedia\Parsoid\Utils\PHPUtils;
  > PHPUtils::safeSubstr("\u{0338}aaa", 0, 5);
  = "̸aaa"

dchan added a comment. 2025-02-27 10:59:18 (UTC+0)
In T387130#10585603, @Bawolff wrote:
> The requirements are:
> - combining slash is sometimes valid, so we can't outright ban it
> - < followed by combining slash will almost always be malicious in output because most keyboards output the precomposed form and we also run NFC on all user input
> - we can't generically replace combining slash with entity at the normalization stage as we don't know if the output is html or not.

Yes, very clearly put. However if we have unicode data available, we could insert U+25CC '◌' inside any sequence where '>' is followed by a combining character. I think that's reasonable for a function called cleanUp.

> What if on any output normalization (so excluding normalization of user input in WebRequest), we first count the number of precomposed "not greater than" signs, normalize, and then count again.
> If the number changes we know an attack is happening since we assume at this point the decomposed not greater than is always malicious.

That's an ingenious idea. Maybe we should do it if we end up unconvinced that our coverage is good enough?

Bawolff added a comment. 2025-02-27 11:07:04 (UTC+0)
In T387130#10585603, @Bawolff wrote:
> Thinking about this
> The requirements are:
> - combining slash is sometimes valid, so we can't outright ban it
> - < followed by combining slash will almost always be malicious in output because most keyboards output the precomposed form and we also run NFC on all user input
> - we can't generically replace combining slash with entity at the normalization stage as we don't know if the output is html or not.
> What if on any output normalization (so excluding normalization of user input in WebRequest), we first count the number of precomposed "not greater than" signs, normalize, and then count again. If the number changes we know an attack is happening since we assume at this point the decomposed not greater than is always malicious. At this point we throw an exception or maybe go back to the unnormalized string, replace all combining slash with unicode replacement and try again (it's ok to be a little lossy here since we assume this code path only happens during an attack). Thoughts?
> If we do strip these characters, I think it is less confusing to the user to replace them with unicode replacement character than to just silently delete.

Attempt at implementing this idea, although maybe it belongs as a method of UTFValidator instead of in API.
I only covered > as I don't think ≮ is a security risk, but maybe it would make more sense to cover both just in case.
0001-Avoid-attacks-in-API-related-to-unicode-NFC-and-deco.patch (2 KB)

zoe added a comment. 2025-02-27 13:10:15 (UTC+0)
> What if on any output normalization (so excluding normalization of user input in WebRequest), we first count the number of precomposed "not greater than" signs, normalize, and then count again. If the number changes we know an attack is happening since we assume at this point the decomposed not greater than is always malicious.

Headline: I think this would work.
We are relying on the properties of the characters, not the NFC algorithm itself. If we can find something that composes with U+003E > and which comes earlier in the canonical ordering algorithm than U+0338 ◌̸ then we could create a string which normalises with fewer copies of > and bypass such a check.
I'm having trouble finding details of the normalization algorithm, but experimentally we can gain confidence:

  [ ...new Array( 0xffff ) ].map( ( _, i ) => ( '\u226f' + String.fromCodePoint( i ) ).normalize( "NFC" ) ).filter( ( s ) => s.codePointAt( 0 ) !== 0x226f )
  // []

dchan added a comment. 2025-02-27 14:05:35 (UTC+0)
In T387130#10586700, @zoe wrote:
> We are relying on the properties of the characters, not the NFC algorithm itself.
> If we can find something that composes with U+003E > and which comes earlier in the canonical ordering algorithm than U+0338 ◌̸ then we could create a string which normalises with fewer copies of > and bypass such a check.
> I'm having trouble finding details of the normalization algorithm, but experimentally we can gain confidence:
>   [ ...new Array( 0xffff ) ].map( ( _, i ) => ( '\u226f' + String.fromCodePoint( i ) ).normalize( "NFC" ) ).filter( ( s ) => s.codePointAt( 0 ) !== 0x226f )
>   // []

Oh yes, that's an important thing to consider. You're right that we're safe: the 4th field of UnicodeData.txt shows U+0338 has Canonical Combining Class 1, which is the lowest possible for a combining character. Therefore no combining character can be moved before U+0338 by the Canonical Ordering Algorithm.

  0338;COMBINING LONG SOLIDUS OVERLAY;Mn;1;NSM;;;;;N;NON-SPACING LONG SLASH OVERLAY;;;;

cscott added a comment. 2025-02-27 15:35:36 (UTC+0)
In T387130#10586404, @dchan wrote:
> In T387130#10585557, @cscott wrote:
>> Here is a patch for 1b from above (T387130#10584304). This patches wikimedia/utfnormal to prefix isolated combining characters, using the unicode database [...]
>> 0004-Replace-isolated-combining-characters.patch (134 KB)
>   $canPrecedeCombining = [];
>   [...]
>   public function testU0338() {
>       $text = "\u{0338}<\u{0338}>\u{0338}";
>       $expect = "\u{25CC}\u{0338}\u{226E}\u{226F}";
>       $this->assertEquals( bin2hex( $expect ), bin2hex( Validator::cleanUp( $text ) ) );
>   }
> Please can we remove '<' and '>' from $canPrecedeCombining? That would render html invulnerable, without breaking any combining sequence except ">\x{0338}" and "<\x{0338}", both of which have precomposed alternatives and are bad things to be floating around our ecosystem.

This code applies *after* NFC normalization has been done. So '<' and '>' will never appear as a preceding character. That's not the point of this code -- the point of this code is to eliminate "hanging" combining characters that are then time bombs that cause trouble when they are pasted inside an HTML tag.

cscott added a comment (edited). 2025-02-27 17:09:53 (UTC+0)
I am (currently) proposing the following two patches:
To mediawiki-core (updated to include Tidy/Html/Message/OutputTransform):
0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch (15 KB)
To Parsoid (unchanged from T387130#10584604):
0001-Entity-escape-U-0338-where-needed-to-make-HTML-outpu.patch (8 KB)
The Parsoid patch will have to be deployed to prod as a patch to the vendor directory.
All other patches are additional hardenings which are nice-to-have but shouldn't be on the critical path here. In particular, I think hardening UtfNormal\Validator::cleanUp() is worthwhile, but isn't necessary to stop the injection attack.
I have a number of other patches in my tree which shift code to using the Html::* helper classes instead of string concatenation, which is again helpful, but unnecessary given a final postprocessing pass to entity-escape all remaining U+0338, which is what the above implements.
I also considered a patch to OutputPage::output() to do a final fail-safe against U+0338, but that also doesn't seem strictly necessary as all the attack vectors go through the Action API (which is where NFC normalization is performed), not direct HTML output (which is what OutputPage::output() does).

sbassett added a subscriber: gerritbot. 2025-02-27 17:18:50 (UTC+0)
In T387130#10588179, @cscott wrote:
> The Parsoid patch will have to be deployed to prod as a patch to the vendor directory.
> All other patches are additional hardenings which are nice-to-have but shouldn't be on the critical path here.

Ok, once we have all of the relevant patches completed and code-reviewed, we should categorize them as "this patch needs a discrete security deployment to Wikimedia production to fix the active security issues there" and "this patch is code-hardening". The latter should all be able to go through gerrit. And of course Parsoid (and similar) patches can go through gerrit, where we don't have a defined, discrete Wikimedia production security deployment process.

ssastry added a comment. 2025-02-27 17:45:30 (UTC+0)
In T387130#10588179, @cscott wrote:
> I am (currently) proposing the following two patches:
> To mediawiki-core (updated to include Tidy/Html/Message/OutputTransform):
> 0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch (15 KB)

My IDE shows that there is a typo in this patch in Message .. "escapeCombiningChars" instead of "escapeCombiningChar".
Is it easy to update Message tests to cover format* functions you updated in this patch?

dchan added a comment. 2025-02-27 18:00:41 (UTC+0)
In T387130#10587431, @cscott wrote:
> In T387130#10586404, @dchan wrote:
>> Please can we remove '<' and '>' from $canPrecedeCombining? That would render html invulnerable, without breaking any combining sequence except ">\x{0338}" and "<\x{0338}", both of which have precomposed alternatives and are bad things to be floating around our ecosystem.
> This code applies *after* NFC normalization has been done. So '<' and '>' will never appear as a preceding character. That's not the point of this code -- the point of this code is to eliminate "hanging" combining characters that are then time bombs that cause trouble when they are pasted inside an HTML tag.

Oh sorry, I missed that. Then shouldn't we fix U+0338 before NFC normalization?

  $string = self::prependIsolatedCombining( $string );
  $norm = normalizer_normalize( $string, Normalizer::FORM_C );
  ...
  return self::prependIsolatedCombining( self::NFC( $string ) );
  return self::NFC( self::prependIsolatedCombining( $string ) );

I think we need it, because otherwise MediaWiki::Request::WebRequest::getValues breaks valid HTML / wikitext. For example, right now (even with the above patches), if I use VisualEditor to save a source page containing "FOO<!-- x -->\x{0338}BAR" then MediaWiki::Request::WebRequest::getValues mangles it into "FOO<!-- x --\x{226F}BAR".
Of course our VisualEditor patch could fix it on our side, but if this affects VisualEditor then it probably affects many other uses too. And the content we're sending isn't obviously invalid.
I think if MediaWiki::Request::WebRequest is going to apply NFC normalization, then it's essential that it fix '>' + U+0338 first.

cscott added a comment. 2025-02-27 18:22:41 (UTC+0)
@dchan that's the input side, and T266140. You need to use the raw API flag from I2e78e660ba1867744e34eda7d00ea527ec016b71 on input. That's not a security bug, though, because the broken HTML is rejected on input. Let's keep this task for output-side issues.

cscott added a comment. 2025-02-27 18:32:36 (UTC+0)
@ssastry Thanks for the catch! Updated core patch, plus Message tests:
0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch (17 KB)

ssastry added a comment. 2025-02-27 19:49:47 (UTC+0)
One more question about Message.php changes: you added the escaping on line 1060 in the format(..) function. Why is it not required on other code paths? And could replaceParameters(..) re-introduce it? Would it be better instead to do the sanitization after replaceParameters on line 1065? replaceParameters replaces keys with message values .. so if someone added the combining char as first char of a translation on translatewiki, it could recombine with a >.

Bawolff added a comment. 2025-02-27 20:45:25 (UTC+0)
> I'm having trouble finding details of the normalization algorithm, but experimentally we can gain confidence:

Full details in section 3.11 of unicode spec.

Dchan wrote:
> Oh yes, that's an important thing to consider. You're right that we're safe: the 4th field of UnicodeData.txt shows U+0338 has Canonical Combining Class 1, which is the lowest possible for a combining character.
> Therefore no combining character can be moved before U+0338 by the Canonical Ordering Algorithm.

Just keep in mind that the canonical composition algorithm (3.11.6) looks at characters besides the one that comes immediately after the starter character. So e.g. <\u0301\u0338 has NFC form \u226e\u0301

ssastry added a comment. 2025-02-27 21:01:19 (UTC+0)
In T387130#10586494, @ihurbain wrote:
> Nag on the back of my mind: can we have a potential vector around source ranges?
> We have things like T346197 (and related "bad UTF-8" issues) that hint that we occasionally try to access character ranges that were not the ones we actually wanted to; there might be a way to craft things around these?
> I think we might need a patch on PHPUtils::safeSubstr in Parsoid to avoid returning a string starting with \u0338? (which we do right now):
>   $ php run.php shell
>   > use Wikimedia\Parsoid\Utils\PHPUtils;
>   > PHPUtils::safeSubstr("\u{0338}aaa", 0, 5);
>   = "̸aaa"

Since all of Parsoid's serialization goes through XMLSerializer and Scott patched that to handle escaping in text nodes (plus the HardenNFC pass in core for post-cache transforms), I don't think we need to worry about this.

cscott added a comment. 2025-02-27 21:08:52 (UTC+0)
In T387130#10588949, @ssastry wrote:
> Since all of Parsoid's serialization goes through XMLSerializer and Scott patched that to handle escaping in text nodes (plus the HardenNFC pass in core for post-cache transforms), I don't think we need to worry about this.

It could be an issue if we stripped U+0338 on input, like we do with U+0000 and control characters, but in our latest versions we don't actually remove the U+0338 from the HTML at all, just insist on it being represented as an entity escape so that NFC normalization won't touch it.
The character offsets are all on the "real" string values of DOM Nodes, i.e. after entity decoding, so adding extra entities doesn't affect them at all.

cscott added a comment. 2025-02-27 21:25:37 (UTC+0)
In T387130#10588754, @ssastry wrote:
> One more question about Message.php changes: you added the escaping on line 1060 in the format(..) function. Why is it not required on other code paths? And could replaceParameters(..) re-introduce it? Would it be better instead to do the sanitization after replaceParameters on line 1065? replaceParameters replaces keys with message values .. so if someone added the combining char as first char of a translation on translatewiki, it could recombine with a >.

I added U+0338 escaping to every place in Message.php which had a call to htmlspecialchars, excluding only two places which were generating log warnings. ::replaceParameters is the wrong place to do escaping (in general) because that leads to double-escaping issues -- in particular, you can have a 'raw' parameter type (Message::rawParams()) which is explicitly supposed to bypass the escaping of the main message, which is how display title and some other things work (T343994).
There are also "before" and "after" replacement parameter types. So for a "before" parameter type (like num), the value is replaced *before* parsing (Message::format() line 1046), and then the entire thing (parameter and all) will get HTML escaped in Message::format() line 1049 and following, either by the full Parser or by the htmlspecialchars+escapeCombiningChars call there. For "after" parameter types (like raw), the value is replaced on line 1064 after the rest of the message is escaped, but the replaced value is formatted by passing $format through to extractParam.
So (e.g.) if there's a recursive Message being substituted, it is replaced "after" HTML escaping is done, but the parameter value is itself HTML escaped before being inserted.
So for RawMessage('$1', [ new RawMessage("\u{0338}") ]) it is true that the parameter is inserted after the call to htmlspecialchars, as you noted, but the fact is that the parameter message is itself formatted (to an entity-escaped U+0338) before it is inserted.

ssastry added a comment. 2025-02-27 21:41:45 (UTC+0)
In T387130#10588992, @cscott wrote:
> In T387130#10588754, @ssastry wrote:
>> One more question about Message.php changes: you added the escaping on line 1060 in the format(..) function. Why is it not required on other code paths? And could replaceParameters(..) re-introduce it? Would it be better instead to do the sanitization after replaceParameters on line 1065? replaceParameters replaces keys with message values .. so if someone added the combining char as first char of a translation on translatewiki, it could recombine with a >.
> ...
> So for RawMessage('$1', [ new RawMessage("\u{0338}") ]) it is true that the parameter is inserted after the call to htmlspecialchars, as you noted, but the fact is that the parameter message is itself formatted (to an entity-escaped U+0338) before it is inserted.

Ah, this is the part I missed .. I didn't dig one level down where extractParam handles the escaping. Ok, LGTM.

cscott added a subscriber: Ladsgroup. 2025-02-27 21:58:24 (UTC+0)

gerritbot added a comment. 2025-02-27 22:43:02 (UTC+0)
Change #1123486 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):
[mediawiki/vendor@master] Bump wikimedia/parsoid to v0.21.0-a18

gerritbot added a project: Patch-For-Review. 2025-02-27 22:43:04 (UTC+0)

gerritbot added a comment. 2025-02-27 22:45:12 (UTC+0)
Change #1123488 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):
[mediawiki/core@master] Bump wikimedia/parsoid to 0.21.0-a18

sbassett added a comment. 2025-02-27 22:47:32 (UTC+0)
Core patch from T387130#10588539 has been deployed. Seems stable and has been tested by @cscott and @ssastry.

sbassett added a parent task: Restricted Task. 2025-02-27 22:49:18 (UTC+0)

sbassett added a comment. 2025-02-27 23:53:45 (UTC+0)
Backport deployments of c1123486 and c1123488 have been successfully completed. Thanks everyone. I know there are several code-hardening patches to follow, but I believe that should get us mitigated in Wikimedia production.

DLynch awarded a token. 2025-02-27 23:55:04 (UTC+0)

acooper subscribed. 2025-02-28 00:07:04 (UTC+0)

Bawolff added a comment (edited). 2025-02-28 00:39:20 (UTC+0)
In T387130#10589389, @sbassett wrote:
> Backport deployments of c1123486 and c1123488 have been successfully completed. Thanks everyone.
I know there are several code-hardening patches to follow, but I believe that should get us mitigated in Wikimedia production.<br>I think that covers the visual editor case but not the other cases like category tree i mentioned in an earlier comment<br>T387130#10577699<br>(this ticket is a bit of a mess to follow)<br>ssastry<br>added a subscriber:<br>HCoplin-WMF<br>Feb 28 2025, 4:30 PM<br>2025-02-28 16:30:14 (UTC+0)<br>sbassett<br>added a comment.<br>Feb 28 2025, 6:47 PM<br>2025-02-28 18:47:25 (UTC+0)<br>Comment Actions<br>In<br>T387130#10589510<br>@Bawolff<br>wrote:<br>I think that covers the visual editor case but not the other cases like category tree i mentioned in an earlier comment<br>T387130#10577699<br>(this ticket is a bit of a mess to follow)<br>Ah, ok, I guess we should compile a list of the patches that still warrant discrete Wikimedia production deployments. I assume the patch from<br>T387130#10577699<br>does, since CategoryTree is both production-deployed and bundled...<br>cscott<br>added a comment.<br>Edited<br>Feb 28 2025, 6:56 PM<br>2025-02-28 18:56:21 (UTC+0)<br>Comment Actions<br>I looked at Category Tree and I believe this does actually mitigate that, since the attack Html in the Category Tree example was being routed through the Html::* helper classes, which were patched and hardened.<br>@Bawolff<br>could you double check that?<br>My analysis: CategoryTree vulernability is because client-side JS in<br>CategoryTree:modules/ext.categoryTree/ext.categoryTree.js<br>does:<br>new mw.Api().get( {<br>action: 'categorytree',<br>category: ctTitle,<br>options: ctOptions,<br>uselang: mw.config.get( 'wgUserLanguage' ),<br>formatversion: 2<br>} ).done( ( data ) => {<br>data = data.categorytree.html;</p> <p>let $data;<br>if ( data === '' ) {<br>$data = $( '<i>' ).addClass( 'CategoryTreeNotice' )<br>// eslint-disable-next-line mediawiki/msg-doc<br>.text( mw.msg( {<br>0: 'categorytree-no-subcategories',<br>10: 'categorytree-no-pages',<br>100: 
'categorytree-no-parent-categories'<br>}[ mode ] || 'categorytree-nothing-found' ) );<br>} else {<br>$data = $( $.parseHTML( data ) );<br>attachHandler( $data );<br>}</p> <p>$children.empty().append( $data );<br>} ).fail( error );<br>In particular, it parses the HTML retrieved from a call to the categorytree action API. That HTML comes from<br>CategoryTree:includes/ApiCategoryTree.php<br>in the<br>getHTML<br>function, which calls<br>CategoryTree::renderChildren<br>. This is where it gets a little hairy, but<br>CategoryTree:includes/CategoryTree.php<br>renderChildren<br>calls<br>renderNodeInfo<br>, and that method appears to use<br>Html::rawElement<br>and<br>Html::element<br>to create its HTML, which should be hardened. There are a few instances of:<br>$s = Html::openElement(....);<br>$s .= ...;<br>$s .= Html::closeElement(...);<br>however, which *could* cause a bad character to sneak in. But the key places where links are rendered appear to use the safe forms of<br>Html::*<br>and/or<br>LinkRenderer::makeLink<br>which goes through<br>LinkRenderer::buildAElement<br>which uses<br>Html::rawElement('a'...)<br>and thus should be safe.<br>Of course I could have missed something, so double-checking requested please!<br>Bawolff<br>added a comment.<br>Feb 28 2025, 7:33 PM<br>2025-02-28 19:33:51 (UTC+0)<br>Comment Actions<br>Sorry, my bad. I got confused reading the backscroll and thought it was only the Parsoid patches that got deployed.<br>I agree that the patch to Html:: should be sufficient for the categorytree issue.<br>cscott<br>added a comment.<br>Feb 28 2025, 7:41 PM<br>2025-02-28 19:41:01 (UTC+0)<br>Comment Actions<br>My proposal is to keep this task quiet for another week, which will give<br>@ABreault-WMF<br>time to apply these same two fixes to the stable releases (1.43, LTS) and make security releases for them. After which we should be ok to push patches for this issue in public and probably fork off a bunch of specific "hardening" tasks. 
For example, although as mentioned above the fact that LinkRenderer::makeLink() has been hardened ought to mitigate any issue with titles containing U+0338, it still is somewhat concerning to me that<br>exists (that U+0338 as a title!) and I feel like we ought to eventually deploy the utfnormal fix to prevent that (it would rename the page to U+25CC U+0338 which would display better as well). But we can handle that as a separate related task, since it would involve running the cleanupTitles script etc. (But luckily this is the only title containing U+0338 which<br>turns up.)<br>sbassett<br>added a comment.<br>Mar 4 2025, 3:26 PM<br>2025-03-04 15:26:31 (UTC+0)<br>Comment Actions<br>^ Sounds good,<br>@cscott<br>. Just to clarify: you did clear/purge the necessary page caches for U+0338 (I believe we confirmed that in Slack). It sounds like Wikimedia production should be well-patched at this point. Normally we'd keep something like the core patch protected until the core security release later this quarter, but I'd imagine that touches fairly volatile code and that we don't want to deal with merge conflicts every week for the deployed patch.<br>gerritbot<br>added a comment.<br>Mar 6 2025, 1:50 AM<br>2025-03-06 01:50:34 (UTC+0)<br>Comment Actions<br>Change #1124909 had a related patch set uploaded (by Arlolra; author: Arlolra):<br>[mediawiki/vendor@REL1_39] Update wikimedia/parsoid to 0.16.5<br>gerritbot<br>added a comment.<br>Mar 6 2025, 2:29 AM<br>2025-03-06 02:29:43 (UTC+0)<br>Comment Actions<br>Change #1124915 had a related patch set uploaded (by Arlolra; author: Arlolra):<br>[mediawiki/vendor@REL1_42] Update wikimedia/parsoid to 0.19.2<br>gerritbot<br>added a comment.<br>Mar 6 2025, 2:55 AM<br>2025-03-06 02:55:54 (UTC+0)<br>Comment Actions<br>Change #1124919 had a related patch set uploaded (by Arlolra; author: Arlolra):<br>[mediawiki/vendor@REL1_43] Update wikimedia/parsoid to 0.20.2<br>gerritbot<br>added a comment.<br>Mar 6 2025, 5:26 PM<br>2025-03-06 17:26:21 
(UTC+0)<br>Comment Actions<br>Change #1124919<br>merged<br>by Subramanya Sastry:<br>[mediawiki/vendor@REL1_43] Update wikimedia/parsoid to 0.20.2<br>gerritbot<br>added a comment.<br>Mar 6 2025, 5:26 PM<br>2025-03-06 17:26:32 (UTC+0)<br>Comment Actions<br>Change #1124915<br>merged<br>by Subramanya Sastry:<br>[mediawiki/vendor@REL1_42] Update wikimedia/parsoid to 0.19.2<br>gerritbot<br>added a comment.<br>Mar 6 2025, 5:26 PM<br>2025-03-06 17:26:39 (UTC+0)<br>Comment Actions<br>Change #1124909<br>merged<br>by Subramanya Sastry:<br>[mediawiki/vendor@REL1_39] Update wikimedia/parsoid to 0.16.5<br>ABreault-WMF<br>added a comment.<br>Mar 6 2025, 8:18 PM<br>2025-03-06 20:18:40 (UTC+0)<br>Comment Actions<br>Here's a backport for the core patch in<br>T387130#10588539<br>for REL1_43<br>REL1_43-0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch<br>17 KB<br>It applied pretty cleanly, other than<br>includes/Message/Message.php<br>having been moved to<br>includes/language/Message/Message.php<br>in 1.44<br>ABreault-WMF<br>added a comment.<br>Mar 6 2025, 10:02 PM<br>2025-03-06 22:02:29 (UTC+0)<br>Comment Actions<br>And a backport for the core patch in<br>T387130#10588539<br>for REL1_42<br>REL1_42-0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch<br>17 KB<br>It applied cleanly except for these files,<br>autoload.php<br>includes/Html/Html.php<br>includes/Message/Message.php<br>includes/OutputTransform/DefaultOutputPipelineFactory.php<br>They were all trivial differences except for<br>DefaultOutputPipelineFactory<br>which needed accounting for the changes in<br>T363764<br>Running the test<br>HardenNFCTest.php<br>also required a namespace change to<br>ParserOptions<br>in<br>HardenNFC.php<br>ABreault-WMF<br>added a comment.<br>Mar 7 2025, 3:39 AM<br>2025-03-07 03:39:03 (UTC+0)<br>Comment Actions<br>Finally, a backport for the core patch in<br>T387130#10588539<br>for REL1_39<br>REL1_39-0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch<br>11 
KB<br>The patch mostly applied cleanly, a few files like<br>Message.php<br>and<br>Html.php<br>were found in different directories.<br>No changes to<br>autoload.php<br>were needed since the new files weren't added.<br>HtmlHelperTrait.php<br>didn't exist in 1.39 so those changes were dropped.<br>The OutputTransform pipeline didn't exist in 1.39 so the changes to<br>DefaultOutputPipelineFactory.php<br>and<br>HardenNFC.php<br>were dropped. However, as the commit message says, the text transform of<br>HardenNFC<br>is moved into<br>ParserOutput::getText()<br>The<br>RemexCompatFormatter.php<br>constructor was lightly commented in a patch from<br>T354361<br>so those changes were retained when applying the patch to that file.<br>The<br>html/php<br>parsertests section in<br>badCharacters.txt<br>was updated to reflect that in 1.39 the<br>wgParserEnableLegacyHeadingDOM<br>config has no effect, the legacy output is always generated.<br>sbassett<br>added a subscriber:<br>Reedy<br>Mar 7 2025, 3:28 PM<br>2025-03-07 15:28:09 (UTC+0)<br>Comment Actions<br>@ABreault-WMF<br>- Thanks for the patches. 
I'll leave it to<br>@Reedy<br>whether he wants to push those up now or wait for the proper MW release (<br>T382316<br>).<br>cscott<br>added a subscriber:<br>MSantos<br>Mar 10 2025, 3:46 PM<br>2025-03-10 15:46:23 (UTC+0)<br>cscott<br>added a subscriber:<br>Jgiannelos<br>Mar 10 2025, 3:51 PM<br>2025-03-10 15:51:03 (UTC+0)<br>cscott<br>added a project:<br>Content-Transform-Team (Work In Progress)<br>Mar 10 2025, 3:59 PM<br>2025-03-10 15:59:08 (UTC+0)<br>Reedy<br>added a comment.<br>Mar 24 2025, 12:59 PM<br>2025-03-24 12:59:18 (UTC+0)<br>Comment Actions<br>Thanks for the backports!<br>Do we know offhand how long this has been around for?<br>MSantos<br>added a project:<br>Essential-Work<br>Mar 24 2025, 2:56 PM<br>2025-03-24 14:56:26 (UTC+0)<br>Reedy<br>closed this task as<br>Resolved<br>Mar 24 2025, 3:04 PM<br>2025-03-24 15:04:21 (UTC+0)<br>Comment Actions<br>Marking as resolved to make it more obvious for me tracking.<br>Bawolff<br>added a comment.<br>Mar 24 2025, 9:30 PM<br>2025-03-24 21:30:39 (UTC+0)<br>Comment Actions<br>In<br>T387130#10667775<br>@Reedy<br>wrote:<br>Thanks for the backports!<br>Do we know offhand how long this has been around for?<br>Since<br>jan 14 2009<br>The behaviour in unicode is from 1993 afaict<br>dchan<br>added a comment.<br>Mar 25 2025, 5:32 PM<br>2025-03-25 17:32:24 (UTC+0)<br>Comment Actions<br>Are we good to make this task publicly visible? Any objections?<br>Reedy<br>added a comment.<br>Mar 25 2025, 5:36 PM<br>2025-03-25 17:36:35 (UTC+0)<br>Comment Actions<br>Please wait for the security release to go out... 
It should be this week, but with other stuff going on, it may have to wait till next week...<br>There's still patches that only exist on this task.<br>Reedy<br>renamed this task from<br>Potential javascript injection attack enabled by Unicode normalization in Action API<br>to<br>CVE-2025-32699: Potential javascript injection attack enabled by Unicode normalization in Action API<br>Apr 9 2025, 12:58 PM<br>2025-04-09 12:58:14 (UTC+0)<br>Reedy<br>removed a project:<br>Patch-For-Review<br>Reedy<br>added a comment.<br>Apr 9 2025, 2:30 PM<br>2025-04-09 14:30:00 (UTC+0)<br>Comment Actions<br>In<br>T387130#10588539<br>@cscott<br>wrote:<br>@ssastry<br>Thanks for the catch! Updated core patch, plus Message tests:<br>0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch<br>17 KB<br>So this patch didn't apply on master, so I looked what was in deployment...<br>reedy@Sams-Mac-mini Downloads % md5sum 02-T387130.patch<br>fb81b166a6556deca5f06906add9c309 02-T387130.patch<br>reedy@Sams-Mac-mini Downloads % md5sum master-0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch<br>9b140d5ee71aa62fdd9b4483230a4a07 master-0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch<br>That current version is<br>02-T387130.patch<br>17 KB<br>However, this also doesn't apply on master..<br>$ git am ~/02-T387130.patch<br>Applying: SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization<br>error: patch failed: includes/language/Message/Message.php:1019<br>error: includes/language/Message/Message.php: patch does not apply<br>Patch failed at<br>0001<br>SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization<br>hint: Use<br>'git am --show-current-patch=diff'<br>to see the failed patch<br>When you have resolved this problem, run<br>"git am --continue"<br>If you prefer to skip this patch, run<br>"git am --skip"<br>instead.<br>But does with a<br>-3<br>, so the updated version is going to be<br>02-T387130-2.patch<br>17 KB<br>gerritbot<br>added a 
comment.<br>Apr 10 2025, 4:27 PM<br>2025-04-10 16:27:19 (UTC+0)<br>Comment Actions<br>Change #1135770 had a related patch set uploaded (by Reedy; author: C. Scott Ananian):<br>[mediawiki/core@REL1_39] SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization<br>gerritbot<br>added a project:<br>Patch-For-Review<br>Apr 10 2025, 4:27 PM<br>2025-04-10 16:27:22 (UTC+0)<br>Comment Actions<br>Change #1135775 had a related patch set uploaded (by Reedy; author: C. Scott Ananian):<br>[mediawiki/core@REL1_42] SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization<br>gerritbot<br>added a comment.<br>Apr 10 2025, 4:36 PM<br>2025-04-10 16:36:45 (UTC+0)<br>Comment Actions<br>Change #1135783 had a related patch set uploaded (by Reedy; author: C. Scott Ananian):<br>[mediawiki/core@REL1_43] SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization<br>gerritbot<br>added a comment.<br>Apr 10 2025, 4:52 PM<br>2025-04-10 16:52:15 (UTC+0)<br>Comment Actions<br>Change #1135770<br>merged<br>by jenkins-bot:<br>[mediawiki/core@REL1_39] SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization<br>gerritbot<br>added a comment.<br>Apr 10 2025, 5:00 PM<br>2025-04-10 17:00:17 (UTC+0)<br>Comment Actions<br>Change #1135794 had a related patch set uploaded (by Reedy; author: C. 
Scott Ananian):<br>[mediawiki/core@master] SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization<br>gerritbot<br>added a comment.<br>Apr 10 2025, 5:10 PM<br>2025-04-10 17:10:03 (UTC+0)<br>Comment Actions<br>Change #1135783<br>merged<br>by jenkins-bot:<br>[mediawiki/core@REL1_43] SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization<br>gerritbot<br>added a comment.<br>Apr 10 2025, 5:24 PM<br>2025-04-10 17:24:28 (UTC+0)<br>Comment Actions<br>Change #1135794<br>merged<br>by jenkins-bot:<br>[mediawiki/core@master] SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization<br>gerritbot<br>added a comment.<br>Apr 10 2025, 5:50 PM<br>2025-04-10 17:50:50 (UTC+0)<br>Comment Actions<br>Change #1135775<br>merged<br>by jenkins-bot:<br>[mediawiki/core@REL1_42] SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization<br>dchan<br>added a comment.<br>May 7 2025, 4:42 PM<br>2025-05-07 16:42:16 (UTC+0)<br>Comment Actions<br>Looks like now we're good to make this task publicly visible? 
Any objections remaining?<br>sbassett<br>assigned this task to<br>cscott<br>May 7 2025, 5:27 PM<br>2025-05-07 17:27:08 (UTC+0)<br>sbassett<br>changed the visibility from "<br>Custom Policy<br>" to "Public (No Login Required)".<br>sbassett<br>changed the edit policy from "<br>Custom Policy<br>" to "All Users".<br>sbassett<br>removed a project:<br>Patch-For-Review<br>Maintenance_bot<br>added a project:<br>MW-Interfaces-Team<br>May 7 2025, 5:30 PM<br>2025-05-07 17:30:20 (UTC+0)<br>zoe<br>mentioned this in<br>T382756: Error When Editing Pages with Specific Unicode Character in Visual Editor<br>May 7 2025, 6:18 PM<br>2025-05-07 18:18:24 (UTC+0)<br>andrea.denisse<br>awarded a token.<br>May 16 2025, 7:00 PM<br>2025-05-16 19:00:51 (UTC+0)<br>Raine<br>awarded a token.<br>May 19 2025, 11:31 AM<br>2025-05-19 11:31:09 (UTC+0)<br>CDanis<br>awarded a token.<br>May 19 2025, 2:35 PM<br>2025-05-19 14:35:37 (UTC+0)<br>Daimona<br>awarded a token.<br>May 23 2025, 5:31 PM<br>2025-05-23 17:31:07 (UTC+0)<br>gerritbot<br>added a comment.<br>Jun 24 2025, 9:32 PM<br>2025-06-24 21:32:24 (UTC+0)<br>Comment Actions<br>Change #1163476 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):<br>[utfnormal@master] Replace isolated combining characters<br>gerritbot<br>added a project:<br>Patch-For-Review<br>Jun 24 2025, 9:32 PM<br>2025-06-24 21:32:25 (UTC+0)<br>gerritbot<br>added a comment.<br>Jul 28 2025, 4:49 PM<br>2025-07-28 16:49:31 (UTC+0)<br>Comment Actions<br>Change #1163476<br>merged<br>by jenkins-bot:<br>[utfnormal@master] Replace isolated combining characters<br>Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. 
</p></div> </div> <div class="detail-actions"> <a href="/search?q=phabricator.wikimedia.org" class="btn">Same domain →</a> <a href="/search?q=%E2%9A%93%20T387130%20CVE-2025-32699%3A%20Pote" class="btn btn-secondary">Similar titles →</a> </div> </article> </main> <footer class="site-footer"> <div class="container"> <p>C U Cyber History — Public Interest Web Archive</p> <p class="footer-small">Preserving fading web memories. Discover history that once existed.</p> </div> </footer> <script id="chat-i18n-en" type="application/json">{"button_label":"Need Help?","placeholder":"Ask us anything...","title":"CUCH Assistant","subtitle":"How can we help you?","send":"Send","close":"Close","folder":"/var/www/cu","greeting":"Hi! Welcome to CUCH.org. How can I help you today? Feel free to ask about our archive, search, or anything else!","error":"Sorry, our service is temporarily unavailable. Please try again later.","banner_text":"Need help? Ask our AI assistant!"}</script> <script src="/static/js/chat-widget.js"></script> </body> </html>