⚓ T387130 CVE-2025-32699: Potential javascript injection attack enabled by Unicode normalization in Action API
Status: Closed, Resolved · Public · Security
Assigned To: cscott
Authored By: zoe · 2025-02-24 15:30 (UTC)
Tags
Security-Team (In Progress)
Security
MediaWiki-Action-API (Unsorted)
Vuln-XSS
Vuln-Inject (Tracked)
SecTeam-Processed (Completed)
Content-Transform-Team (Work In Progress) (Backlog)
Essential-Work
MW-Interfaces-Team (Incoming (Needs Triage))
Patch-For-Review
Referenced Files
F59025694: 02-T387130-2.patch · 2025-04-09 14:30 (UTC)
F59025660: 02-T387130.patch · 2025-04-09 14:30 (UTC)
F58631089: REL1_39-0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch · 2025-03-07 03:39 (UTC)
F58626453: REL1_42-0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch · 2025-03-06 22:02 (UTC)
F58625014: REL1_43-0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch · 2025-03-06 20:18 (UTC)
F58513634: 0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch · 2025-02-27 18:32 (UTC)
F58513114: 0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch · 2025-02-27 17:09 (UTC)
F58511482: 0001-Avoid-attacks-in-API-related-to-unicode-NFC-and-deco.patch · 2025-02-27 11:07 (UTC)
(17 files total)
Subscribers: ABreault-WMF, acooper, Aklapper, Bawolff, cscott, dchan, DLynch (20 total)
Description
TL;DR
The MediaWiki Action API converts output to Unicode Normalization Form C. Unfortunately, for HTML strings this is unsafe, because the sequence '>' + U+0338 gets replaced by U+226F, breaking the tag end and potentially allowing injection attacks.
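The core behaviour can be checked directly in any JavaScript console. A minimal sketch (the string literal here is illustrative, not actual API output):

```javascript
// '>' (U+003E) followed by U+0338 composes to '≯' (U+226F) under NFC,
// destroying the bracket that closes the tag.
const raw = '<p id="mwAg">\u0338 onmouseover="alert(42)" >content</p>';
const nfc = raw.normalize( 'NFC' );
console.log( nfc.charCodeAt( 12 ).toString( 16 ) ); // '226f' -- the tag no longer closes here
console.log( nfc.length === raw.length - 1 );       // true -- two code units became one
```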
Steps to reproduce

1. Visit any wiki page, such that mw.Api() has loaded
2. Open the console
3. Perform any mw.Api call that generates HTML, such that you can make the first character inside a tag be U+0338 COMBINING LONG SOLIDUS OVERLAY. In other words, you want the API to send you some HTML that contains '>' followed by U+0338.

For example, call { action: 'visualeditor', paction: 'parsefragment' } as follows:
    const COMBINING_LONG_SOLIDUS = '\u0338';
    new mw.Api().post( {
        action: 'visualeditor',
        paction: 'parsefragment',
        page: 'Test',
        wikitext: COMBINING_LONG_SOLIDUS + ' onmouseover="alert(42)" >content'
    } ).done( ( data ) => {
        const content = data.visualeditor.content;
        document.body.innerHTML = content;
        console.log( 'Content:', content );
    } ).fail( ( err ) => console.error( err ) );
This can also be reproduced without JavaScript – note the missing '>' after id="mwAg":

    curl -s -d action=visualeditor -d paction=parsefragment -d page=Test \
      -d wikitext=$'\u0338 onmouseover="alert(42)" >content ' \
      -d format=json https://en.wikipedia.org/w/api.php \
      | jq -r .visualeditor.content | hexdump -C

    00000000  3c 70 20 69 64 3d 22 6d 77 41 67 22 e2 89 af 20  |<p id="mwAg"... |
    00000010  6f 6e 6d 6f 75 73 65 6f 76 65 72 3d 22 61 6c 65  |onmouseover="ale|
    00000020  72 74 28 34 32 29 22 20 3e 63 6f 6e 74 65 6e 74  |rt(42)" >content|
    00000030  20 3c 2f 70 3e 0a                                | </p>.|
    00000036
Compare this with a normal slash in the input:

    curl -s -d action=visualeditor -d paction=parsefragment -d page=Test \
      -d wikitext='/ onmouseover="alert(42)" >content ' \
      -d format=json https://en.wikipedia.org/w/api.php \
      | jq -r .visualeditor.content | hexdump -C

    00000000  3c 70 20 69 64 3d 22 6d 77 41 67 22 3e 2f 20 6f  |<p id="mwAg">/ o|
    00000010  6e 6d 6f 75 73 65 6f 76 65 72 3d 22 61 6c 65 72  |nmouseover="aler|
    00000020  74 28 34 32 29 22 20 3e 63 6f 6e 74 65 6e 74 20  |t(42)" >content |
    00000030  3c 2f 70 3e 0a                                   |</p>.|
    00000035
Actual behaviour

The sequence '>' + U+0338 gets replaced with the combined character U+226F ≯ NOT GREATER THAN. This is due to applying Unicode Normalization Form C. But (surprisingly!) that breaks the HTML tag, which potentially allows a JavaScript injection attack, for instance:

    content === '<p id="mwAg"\u226f onmouseover="alert(42)" >content </p>'

Expected behaviour

The HTML arrives with the sequence '>' + U+0338 intact:

    content === '<p id="mwAg">\u0338 onmouseover="alert(42)" >content </p>'

Note that this is a regular '>' symbol closing the HTML tag, followed by a U+0338 ◌̸ COMBINING LONG SOLIDUS OVERLAY, such that the contents of the <p> tag are "\u0338 onmouseover="alert(42)" >content".
Debugging note

Both expected and actual output may look similar or identical when rendered in the console:

    content   # actual
    content   # expected

The best way to see for sure what's there is to escape non-ASCII characters with a function:

    function showUnicode( text ) {
        return text.replace(
            /[^\x00-\x7F]/g,
            ( ch ) => '\\u' + ch.charCodeAt( 0 ).toString( 16 ).padStart( 4, '0' )
        );
    }
    console.log( showUnicode( content ) );
    // <p id="mwAg"\u226f onmouseover="alert(42)" >content </p>   (actual)
    // <p id="mwAg">\u0338 onmouseover="alert(42)" >content </p>  (expected)
Cause

The cause was identified by @dchan in the related ticket. The function MediaWiki::Api::ApiResult::validateValue is not just catching invalid UTF-8. It is also applying Unicode Normalization Form C. Unfortunately, as we have seen, it is unsafe to do this on HTML strings if they might contain '>' + U+0338. The call chain is:

    MediaWiki::Api::ApiResult::addValue
    → MediaWiki::Api::ApiResult::validateValue
    → MediaWiki::Language::normalize
    → UtfNormal::Validator::cleanUp
    → normalizer_normalize( $string, Normalizer::FORM_C )
Details

Risk Rating: High
Author Affiliation: WMF Product

Related Changes in Gerrit (Subject · Repo · Branch · Lines +/-):

Replace isolated combining characters · utfnormal · master · +134 -14
SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization · mediawiki/core · master · +116 -9
SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization · mediawiki/core · REL1_42 · +115 -10
SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization · mediawiki/core · REL1_43 · +116 -9
SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization · mediawiki/core · REL1_39 · +47 -10
Update wikimedia/parsoid to 0.16.5 · mediawiki/vendor · REL1_39 · +81 -64
Update wikimedia/parsoid to 0.19.2 · mediawiki/vendor · REL1_42 · +37 -47
Update wikimedia/parsoid to 0.20.2 · mediawiki/vendor · REL1_43 · +32 -30
Related Objects

T382316 (Resolved): Release MediaWiki 1.39.12/1.42.6/1.43.1
Restricted Task (Resolved)
T387130 (Resolved, Security, assigned to cscott): CVE-2025-32699: Potential javascript injection attack enabled by Unicode normalization in Action API
Mentioned In
T382756: Error When Editing Pages with Specific Unicode Character in Visual Editor
Mentioned Here
T382316: Release MediaWiki 1.39.12/1.42.6/1.43.1
T354361: HtmlHelper::modifyElements(…, $html5format = false) incorrectly encodes HTML entities
T363764: Refactor dependency injection (DI) in OutputTransform stages
T343994: OutputPage::setPageTitle() should not accept Message objects, introduce OutputPage::setPageTitleMsg()
T266140: HTML entity replaced by the Unicode character in an edit
T346197: Wikimedia\Assert\InvariantException: Invariant failed: Bad UTF-8 at end of string (3 byte sequence)
T17261: Trimmed multibyte characters result in invalid XML
T18262: File metadata containing invalid characters produce bad-formed XML
T324431: Parsoid: displaytitle HTML now appearing in
Event Timeline
There are a very large number of changes, so older changes are hidden.
ihurbain added a comment · 2025-02-27 09:29 (UTC)
@cscott I think you haven't attached the OutputTransform patches correctly; here I can only see their file names.
dchan added a comment · 2025-02-27 10:26 (UTC)
In T387130#10585557, @cscott wrote:

> Here is a patch for 1b from above (T387130#10584304). This patches wikimedia/utfnormal to prefix isolated combining characters, using the unicode database [...]
> 0004-Replace-isolated-combining-characters.patch (134 KB)
>
>     $canPrecedeCombining = [ ... ];
>     [...]
>     public function testU0338() {
>         $text = "\u{0338}<\u{0338}>\u{0338}";
>         $expect = "\u{25CC}\u{0338}\u{226E}\u{226F}";
>         $this->assertEquals( bin2hex( $expect ), bin2hex( Validator::cleanUp( $text ) ) );
>     }
Please can we remove '<' and '>' from $canPrecedeCombining? That would render HTML invulnerable, without breaking any combining sequence except ">\x{0338}" and "<\x{0338}" — both of which have precomposed alternatives and are bad things to be floating around our ecosystem.

Also that way I don't have to fix anything in ApiVisualEditor.php 😁
Then the correct test would be:

    $text = "\u{0338}<\u{0338}>\u{0338}";
    $expect = "\u{25CC}\u{0338}<\u{25CC}\u{0338}>\u{25CC}\u{0338}";
Bawolff added a comment (edited) · 2025-02-27 10:28 (UTC)
Yes, I agree that would be better. What I don't know is why we chose to normalize HTML (or wikitext) to NFC there. Is that a step we actually need?
I looked it up. It was to fix T17261 and T18262 (r45749). That said, it does seem like it only really needs valid Unicode, not normalized Unicode.
ihurbain added a comment (edited) · 2025-02-27 10:58 (UTC)
Nagging at the back of my mind: could there be a potential attack vector around source ranges? We have things like T346197 (and related "bad UTF-8" issues) that hint that we occasionally try to access character ranges that were not the ones we actually wanted; there might be a way to craft things around these?
I think we might need a patch on PHPUtils::safeSubstr in Parsoid to avoid returning a string starting with \u0338? (which we do right now):

    $ php run.php shell
    > use Wikimedia\Parsoid\Utils\PHPUtils;
    > PHPUtils::safeSubstr("\u{0338}aaa", 0, 5);
    = "̸aaa"
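The concern can be sketched in JavaScript (a hypothetical guard, not the actual Parsoid code; `\p{M}` matches all combining marks):

```javascript
// A substring that starts at a bad offset can begin with a combining mark,
// e.g. "\u0338aaa" -- exactly the string safeSubstr returns above.
// A defensive fixup would strip (or dotted-circle-prefix) leading marks:
const stripLeadingCombining = ( s ) => s.replace( /^\p{M}+/u, '' );
console.log( stripLeadingCombining( '\u0338aaa' ) ); // 'aaa'
```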
dchan added a comment · 2025-02-27 10:59 (UTC)
In T387130#10585603, @Bawolff wrote:

> The requirements are:
> - combining slash is sometimes valid, so we cant outright ban it
> - < followed by combining slash will almost always be malicious in output because most keyboards output the precomposed form and we also run NFC on all user input
> - we cant generically replace combining slash with entity at the normalization stage as we dont know if the output is html or not.

Yes, very clearly put. However, if we have Unicode data available, we could insert U+25CC '◌' inside any sequence where '>' is followed by a combining character. I think that's reasonable for a function called cleanUp.

> What if on any output normalization (so excluding normalization of user input in WebRequest), we first count the number of precomposed "not greater than" signs, normalize, and then count again. If the number changes we know an attack is happening since we assume at this point the decomposed not greater than is always malicious.

That's an ingenious idea. Maybe we should do it if we end up unconvinced that our coverage is good enough?
Bawolff added a comment · 2025-02-27 11:07 (UTC)
In T387130#10585603, @Bawolff wrote:

> Thinking about this, the requirements are:
> - combining slash is sometimes valid, so we cant outright ban it
> - < followed by combining slash will almost always be malicious in output because most keyboards output the precomposed form and we also run NFC on all user input
> - we cant generically replace combining slash with entity at the normalization stage as we dont know if the output is html or not.
>
> What if on any output normalization (so excluding normalization of user input in WebRequest), we first count the number of precomposed "not greater than" signs, normalize, and then count again. If the number changes we know an attack is happening since we assume at this point the decomposed not greater than is always malicious. At this point we throw an exception or maybe go back to the unnormalized string, replace all combining slash with unicode replacement and try again (its ok to be a little lossy here since we assume this code path only happens during an attack). Thoughts?
>
> If we do strip these characters, i think it is less confusing to the user to replace them with unicode replacement character than to just silently delete.

Attempt at implementing this idea, although maybe it belongs as a method of UTFValidator instead of in API. I only covered ≯ as I don't think ≮ is a security risk, but maybe it would make more sense to cover both just in case.

0001-Avoid-attacks-in-API-related-to-unicode-NFC-and-deco.patch (2 KB)
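The count-compare-and-retry idea can be sketched as follows (illustrative JavaScript; the attached patch is PHP and its details may differ):

```javascript
function safeNormalize( s ) {
	const count = ( str ) => ( str.match( /\u226f/g ) || [] ).length;
	const norm = s.normalize( 'NFC' );
	// If NFC manufactured new precomposed '≯' signs, assume an attack:
	// replace every combining solidus with U+FFFD and normalize again.
	if ( count( norm ) !== count( s ) ) {
		return s.replace( /\u0338/g, '\uFFFD' ).normalize( 'NFC' );
	}
	return norm;
}
console.log( safeNormalize( '>\u0338 x' ).startsWith( '>\uFFFD' ) ); // true
console.log( safeNormalize( 'cafe\u0301' ) ); // 'café' -- benign input unaffected
```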
zoe added a comment · 2025-02-27 13:10 (UTC)
> What if on any output normalization (so excluding normalization of user input in WebRequest), we first count the number of precomposed "not greater than" signs, normalize, and then count again. If the number changes we know an attack is happening since we assume at this point the decomposed not greater than is always malicious.

Headline: I think this would work.

We are relying on the properties of the characters, not the NFC algorithm itself. If we can find something that composes with U+003E '>' and which comes earlier in the canonical ordering algorithm than U+0338 ◌̸, then we could create a string which normalises with fewer copies of ≯ and bypass such a check.

I'm having trouble finding details of the normalization algorithm, but experimentally we can gain confidence:

    [ ...new Array( 0xffff ) ]
        .map( ( _, i ) => ( '\u226f' + String.fromCodePoint( i ) ).normalize( 'NFC' ) )
        .filter( ( s ) => s.codePointAt( 0 ) !== 0x226f )
    // []
dchan added a comment · 2025-02-27 14:05 (UTC)
In T387130#10586700, @zoe wrote:

> We are relying on the properties of the characters, not the NFC algorithm itself. If we can find something that composes with U+003E '>' and which comes earlier in the canonical ordering algorithm than U+0338 ◌̸, then we could create a string which normalises with fewer copies of ≯ and bypass such a check.
>
> I'm having trouble finding details of the normalization algorithm, but experimentally we can gain confidence:
>
>     [ ...new Array( 0xffff ) ]
>         .map( ( _, i ) => ( '\u226f' + String.fromCodePoint( i ) ).normalize( 'NFC' ) )
>         .filter( ( s ) => s.codePointAt( 0 ) !== 0x226f )
>     // []

Oh yes, that's an important thing to consider. You're right that we're safe: the 4th field of UnicodeData.txt shows U+0338 has Canonical Combining Class 1, which is the lowest possible for a combining character. Therefore no combining character can be moved before U+0338 by the Canonical Ordering Algorithm.

    0338;COMBINING LONG SOLIDUS OVERLAY;Mn;1;NSM;;;;;N;NON-SPACING LONG SLASH OVERLAY;;;;
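The combining-class argument can also be probed from a console: because U+0338 has the minimum combining class (1), canonical ordering moves it in front of other marks rather than the reverse. A quick check, assuming an ICU-backed `String.prototype.normalize`:

```javascript
// U+0301 COMBINING ACUTE ACCENT has ccc=230, U+0338 has ccc=1, so canonical
// ordering sorts U+0338 first and it still reaches the '>' and composes:
const out = '>\u0301\u0338'.normalize( 'NFC' );
console.log( out.codePointAt( 0 ).toString( 16 ) ); // '226f'
```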
cscott added a comment · 2025-02-27 15:35 (UTC)
In T387130#10586404, @dchan wrote:

> In T387130#10585557, @cscott wrote:
>
>> Here is a patch for 1b from above (T387130#10584304). This patches wikimedia/utfnormal to prefix isolated combining characters, using the unicode database [...]
>> 0004-Replace-isolated-combining-characters.patch (134 KB)
>>
>>     $canPrecedeCombining = [ ... ];
>>     [...]
>>     public function testU0338() {
>>         $text = "\u{0338}<\u{0338}>\u{0338}";
>>         $expect = "\u{25CC}\u{0338}\u{226E}\u{226F}";
>>         $this->assertEquals( bin2hex( $expect ), bin2hex( Validator::cleanUp( $text ) ) );
>>     }
>
> Please can we remove '<' and '>' from $canPrecedeCombining? That would render HTML invulnerable, without breaking any combining sequence except ">\x{0338}" and "<\x{0338}" — both of which have precomposed alternatives and are bad things to be floating around our ecosystem.

This code applies *after* NFC normalization has been done. So '<' and '>' will never appear as a preceding character. That's not the point of this code -- the point of this code is to eliminate "hanging" combining characters that are then time bombs that cause trouble when they are pasted inside an HTML tag.
cscott added a comment (edited) · 2025-02-27 17:09 (UTC)
I am (currently) proposing the following two patches:

To mediawiki-core (updated to include Tidy/Html/Message/OutputTransform):
0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch (15 KB)

To Parsoid (unchanged from T387130#10584604):
0001-Entity-escape-U-0338-where-needed-to-make-HTML-outpu.patch (8 KB)

The Parsoid patch will have to be deployed to prod as a patch to the vendor directory.

All other patches are additional hardenings which are nice-to-have but shouldn't be on the critical path here. In particular, I think hardening UtfNormal\Validator::cleanUp() is worthwhile, but it isn't necessary to stop the injection attack. I have a number of other patches in my tree which shift code to using the Html::* helper classes instead of string concatenation, which is again helpful but unnecessary given a final postprocessing pass to entity-escape all remaining U+0338, which is what the above implements.

I also considered a patch to OutputPage::output() to do a final fail-safe against U+0338, but that also doesn't seem strictly necessary, as all the attack vectors go through the Action API (which is where NFC normalization is performed), not direct HTML output (which is what OutputPage::output() does).
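The final postprocessing pass described above, entity-escaping any remaining U+0338 in emitted HTML, can be sketched like this (illustrative JavaScript; the actual patches are PHP):

```javascript
// Once U+0338 is an entity, NFC has nothing to compose with a preceding '>':
const escapeSolidusOverlay = ( html ) => html.replace( /\u0338/g, '&#x338;' );
const safe = escapeSolidusOverlay( '<p id="mwAg">\u0338 onmouseover="alert(42)" >content</p>' );
console.log( safe.normalize( 'NFC' ) === safe ); // true -- the tag still closes with a real '>'
```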
sbassett added a subscriber: gerritbot · 2025-02-27 17:18 (UTC)
In T387130#10588179, @cscott wrote:

> The Parsoid patch will have to be deployed to prod as a patch to the vendor directory.
> All other patches are additional hardenings which are nice-to-have but shouldn't be on the critical path here.

Ok, once we have all of the relevant patches completed and code-reviewed, we should categorize them as "this patch needs a discrete security deployment to Wikimedia production to fix the active security issues there" and "this patch is code-hardening". The latter should all be able to go through gerrit. And of course Parsoid (and similar) patches can go through gerrit, where we don't have a defined, discrete Wikimedia production security deployment process.
ssastry added a comment · 2025-02-27 17:45 (UTC)
In T387130#10588179, @cscott wrote:

> I am (currently) proposing the following two patches:
> To mediawiki-core (updated to include Tidy/Html/Message/OutputTransform):
> 0001-Ensure-emitted-HTML-is-safe-against-Unicode-NFC-norm.patch (15 KB)

My IDE shows that there is a typo in this patch in Message: "escapeCombiningChars" instead of "escapeCombiningChar". Is it easy to update the Message tests to cover the format* functions you updated in this patch?
dchan added a comment · 2025-02-27 18:00 (UTC)
In T387130#10587431, @cscott wrote:

> In T387130#10586404, @dchan wrote:
>
>> Please can we remove '<' and '>' from $canPrecedeCombining? That would render HTML invulnerable, without breaking any combining sequence except ">\x{0338}" and "<\x{0338}" — both of which have precomposed alternatives and are bad things to be floating around our ecosystem.
>
> This code applies *after* NFC normalization has been done. So '<' and '>' will never appear as a preceding character. That's not the point of this code -- the point of this code is to eliminate "hanging" combining characters that are then time bombs that cause trouble when they are pasted inside an HTML tag.

Oh sorry, I missed that. Then shouldn't we fix U+0338 *before* NFC normalization?

    $string = self::prependIsolatedCombining( $string );
    $norm = normalizer_normalize( $string, Normalizer::FORM_C );
    ...
    // i.e. instead of:
    return self::prependIsolatedCombining( self::NFC( $string ) );
    // do:
    return self::NFC( self::prependIsolatedCombining( $string ) );
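The ordering point can be sketched with a toy `prependIsolatedCombining` (a hypothetical simplification: the real patch consults the full Unicode database, not just `<`/`>` followed by U+0338):

```javascript
// Insert U+25CC between '<' or '>' and a following combining solidus.
const prependIsolated = ( s ) => s.replace( /([<>])\u0338/g, '$1\u25CC\u0338' );

// After NFC it is too late: '>'+U+0338 has already become '≯'.
console.log( prependIsolated( '>\u0338'.normalize( 'NFC' ) ) ); // '≯'
// Before NFC the dotted circle blocks composition:
console.log( prependIsolated( '>\u0338' ).normalize( 'NFC' ) ); // '>◌̸'
```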
I think we need it, because otherwise MediaWiki::Request::WebRequest::getValues breaks valid HTML / wikitext. For example, right now (even with the above patches), if I use VisualEditor to save a source page containing "FOO\x{0338}BAR" then MediaWiki::Request::WebRequest::getValues mangles it into "FOO