T284693: Wikimedia\Assert\InvariantException: Invariant failed: Bad UTF-8 at end of string (3 byte sequence)
Closed, Resolved
Public
PRODUCTION ERROR
Assigned To
ssastry
Authored By
jeena
Jun 9 2021, 7:55 PM
2021-06-09 19:55:29 (UTC+0)
Tags
Wikimedia-production-error
(June 2021)
Parsoid
(Bugs & Crashers)
Parsoid-Read-Views (Phase 2 - testwiki Main namespace support)
(Code Review)
Content-Transform-Team-WIP
(To Verify)
Referenced Files
None
Subscribers
Aklapper
Arlolra
jeena
Kelson
Krinkle
ssastry
Description
Error
mwversion:
1.37.0-wmf.7
reqId:
4ea208db-028c-458e-b55b-f26c904eb06d
normalized_message
[{reqId}] {exception_url} Wikimedia\Assert\InvariantException: Invariant failed: Bad UTF-8 at end of string (3 byte sequence)
exception.trace
from /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/assert/src/Assert.php(224)
#0 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Utils/PHPUtils.php(218): Wikimedia\Assert\Assert::invariant(boolean, string)
#1 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/PP/Processors/WrapTemplates.php(963): Wikimedia\Parsoid\Utils\PHPUtils::safeSubstr(string, integer, integer)
#2 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/PP/Processors/WrapTemplates.php(1226): Wikimedia\Parsoid\Wt2Html\PP\Processors\WrapTemplates::encapsulateTemplates(DOMDocument, Wikimedia\Parsoid\Wt2Html\PageConfigFrame, array, array)
#3 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/PP/Processors/WrapTemplates.php(1239): Wikimedia\Parsoid\Wt2Html\PP\Processors\WrapTemplates::wrapTemplatesInTree(DOMDocument, Wikimedia\Parsoid\Wt2Html\PageConfigFrame, DOMElement)
#4 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(158): Wikimedia\Parsoid\Wt2Html\PP\Processors\WrapTemplates->run(Wikimedia\Parsoid\Config\Env, DOMElement, array, boolean)
#5 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(853): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->Wikimedia\Parsoid\Wt2Html\{closure}(DOMElement, array, boolean)
#6 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(903): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->doPostProcess(DOMElement)
#7 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(920): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->process(DOMElement)
#8 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipeline.php(178): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->processChunkily(string, array)
#9 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipelineFactory.php(307): Wikimedia\Parsoid\Wt2Html\ParserPipeline->parseChunkily(string, array)
#10 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Core/WikitextContentModelHandler.php(106): Wikimedia\Parsoid\Wt2Html\ParserPipelineFactory->parse(string)
#11 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Parsoid.php(162): Wikimedia\Parsoid\Core\WikitextContentModelHandler->toDOM(Wikimedia\Parsoid\Config\Env)
#12 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/src/Parsoid.php(194): Wikimedia\Parsoid\Parsoid->parseWikitext(MWParsoid\Config\PageConfig, array)
#13 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/extension/src/Rest/Handler/ParsoidHandler.php(589): Wikimedia\Parsoid\Parsoid->wikitext2html(MWParsoid\Config\PageConfig, array, NULL)
#14 /srv/mediawiki/php-1.37.0-wmf.7/vendor/wikimedia/parsoid/extension/src/Rest/Handler/PageHandler.php(88): MWParsoid\Rest\Handler\ParsoidHandler->wt2html(MWParsoid\Config\PageConfig, array)
#15 /srv/mediawiki/php-1.37.0-wmf.7/includes/Rest/Router.php(395): MWParsoid\Rest\Handler\PageHandler->execute()
#16 /srv/mediawiki/php-1.37.0-wmf.7/includes/Rest/Router.php(322): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#17 /srv/mediawiki/php-1.37.0-wmf.7/includes/Rest/EntryPoint.php(165): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#18 /srv/mediawiki/php-1.37.0-wmf.7/includes/Rest/EntryPoint.php(130): MediaWiki\Rest\EntryPoint->execute()
#19 /srv/mediawiki/php-1.37.0-wmf.7/rest.php(31): MediaWiki\Rest\EntryPoint::main()
#20 /srv/mediawiki/w/rest.php(3): require(string)
#21 {main}
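The trace shows the invariant firing inside PHPUtils::safeSubstr, called from WrapTemplates::encapsulateTemplates. The message suggests that a substring taken by byte offsets ended partway through a three-byte UTF-8 character. The following is a minimal Python sketch of that kind of boundary check, not Parsoid's actual PHP code; the function names and the exact checks are assumptions made for illustration.

```python
def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def expected_length(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte (0 if not a lead byte)."""
    if lead < 0x80:
        return 1
    if (lead & 0xE0) == 0xC0:
        return 2
    if (lead & 0xF0) == 0xE0:
        return 3
    if (lead & 0xF8) == 0xF0:
        return 4
    return 0


def safe_substr(data: bytes, start: int, length: int) -> bytes:
    """Slice by byte offsets, refusing to cut a character in half.

    Illustrative only: this checks the two edges of the slice, not the
    validity of every byte in between.
    """
    chunk = data[start:start + length]
    if chunk and is_continuation(chunk[0]):
        raise ValueError("Bad UTF-8 at start of string")
    # Walk back from the end to the last lead byte and check it is complete.
    i = len(chunk) - 1
    while i >= 0 and is_continuation(chunk[i]):
        i -= 1
    if i >= 0:
        need = expected_length(chunk[i])
        have = len(chunk) - i
        if have < need:
            raise ValueError(f"Bad UTF-8 at end of string ({need} byte sequence)")
    return chunk


source = "a€b".encode("utf-8")  # "€" is the three bytes E2 82 AC
print(safe_substr(source, 0, 4))  # the slice ends on a character boundary
try:
    safe_substr(source, 0, 3)  # the slice stops after E2 82, mid-character
except ValueError as err:
    print(err)
```

In this sketch the second call fails because the end offset is one byte short, which is the kind of symptom an inaccurate source offset would produce.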
Impact
~50 times in the last 30 min
Notes
perhaps related to
T236866
Details
Request URL
Related Changes in Gerrit:
Subject | Repo | Branch | Lines +/-
Bump parsoid to 0.17.0-a5 | mediawiki/vendor | master | +285 -82
Ensure DSR computation is accurate if an unclosed comment is present | mediawiki/services/parsoid | master | +73 -8
Related Objects
Mentioned Here
T236866: InvariantException: Invariant failed: Bad UTF-8 at start of string
Duplicates Merged Here
T284738: Parsoid breaks at rendering br.wikisource.org page
Event Timeline
jeena
created this task.
Jun 9 2021, 7:55 PM
2021-06-09 19:55:29 (UTC+0)
Restricted Application
added a subscriber:
Aklapper
Jun 9 2021, 7:55 PM
2021-06-09 19:55:30 (UTC+0)
ssastry
subscribed.
Jun 9 2021, 8:00 PM
2021-06-09 20:00:25 (UTC+0)
For the benefit of train engineers, I wanted to note that these UTF-8 errors seen in Parsoid are expected to come up in spurts (lots of errors on the same page OR repeated errors because of retries). So, unless there is a significant spike, this shouldn't be a concern for deployments.
Arlolra
merged a task:
T284738: Parsoid breaks at rendering br.wikisource.org page
Jun 10 2021, 3:36 PM
2021-06-10 15:36:48 (UTC+0)
Arlolra
added subscribers:
Kelson
Arlolra
Arlolra
triaged this task as
Medium
priority.
Jun 10 2021, 6:51 PM
2021-06-10 18:51:41 (UTC+0)
Arlolra
moved this task from
Needs Triage
to
Bugs & Crashers
on the
Parsoid
board.
Krinkle
subscribed.
Jun 11 2021, 5:08 PM
2021-06-11 17:08:26 (UTC+0)
@ssastry
I don't know much about these errors, but it sounds like they have to do with bad user input. Are these generally solved by improving the parser to support more input, or also by categorically letting these fail in a different way? It seems like a categorical change may be appropriate here, such that these kinds of issues more generally won't result in an HTTP 5xx and an exception log entry, but instead go to e.g. a Parsoid-specific log channel with an INFO or WARNING severity for your team to monitor. Possibly also a 4xx status response, but that's a different question.
Deployers and people outside the team have not tried to, and imho should not try to, memorise internals of extensions and what kinds of fatals are "real" fatals. These should be fixed on the producer side instead. Usually that means fixing the root cause if it's easy, but if that's non-trivial or if it's a sustained category of errors like this one, then maybe a different mitigation could take place first to ensure they get recategorised. That might change nothing for end-users, but would mean Parsoid plays along better with how other MediaWiki components express themselves in Logstash.
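The recategorisation suggested above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not MediaWiki or Parsoid code: `BadInputError`, `render`, the channel name, and the choice of 422 as the 4xx status are all hypothetical.

```python
import logging


class BadInputError(Exception):
    """Hypothetical marker for failures traced to malformed input."""


def render(parse, wikitext: str, log: logging.Logger):
    """Run a parser, downgrading known bad-input failures to a warning."""
    try:
        return 200, parse(wikitext)
    except BadInputError as err:
        # Known category: record on a dedicated channel and answer with a 4xx
        # (the specific status code is an assumption of this sketch).
        log.warning("unparseable input: %s", err)
        return 422, None
    # Any other exception propagates and is still reported as a real fatal.


def broken_parser(_text: str) -> str:
    raise BadInputError("Bad UTF-8 at end of string (3 byte sequence)")


channel = logging.getLogger("parsoid.bad-input")  # hypothetical channel name
status, html = render(broken_parser, "...", channel)
print(status, html)
```

The point of the design is that only the known category is downgraded; unexpected exceptions still surface as exceptions, so deployers do not need to know which fatals are "real".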
ssastry
added a comment.
Jun 11 2021, 6:14 PM
2021-06-11 18:14:30 (UTC+0)
These UTF-8 errors are caused by a combination of things: bad encoding stored in the db, "bad markup", or real bugs in Parsoid. These errors helped us find and fix most of them. The remaining ones are lingering cases we haven't investigated because there are far fewer of them now. But yes, I agree that it is probably time to migrate these remaining errors to a different log channel. We'll chat about this in the coming week.
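One rough way to separate the first cause from the others is to check whether the stored text is itself valid UTF-8 before blaming the offsets. The Python sketch below illustrates that triage idea only; `first_invalid_offset` and `classify` are hypothetical helpers, and the sketch cannot tell "bad markup" apart from a parser bug, since both would show up as offsets that split a character.

```python
def first_invalid_offset(raw: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def classify(raw: bytes, start: int, end: int) -> str:
    """Rough triage of a failed slice raw[start:end]."""
    bad = first_invalid_offset(raw)
    if bad is not None:
        return f"stored text is not valid UTF-8 (first bad byte at {bad})"
    if first_invalid_offset(raw[start:end]) is not None:
        return "text is valid, so the offsets split a character"
    return "slice is valid UTF-8"


page = "a€b".encode("utf-8")
print(classify(page, 0, 3))       # offsets end inside the 3-byte "€"
print(classify(b"a\xffb", 0, 2))  # the stored bytes themselves are invalid
```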
Krinkle
moved this task from
Untriaged
to
June 2021
on the
Wikimedia-production-error
board.
Jun 12 2021, 12:16 AM
2021-06-12 00:16:49 (UTC+0)
Kelson
added a comment.
Aug 19 2021, 12:57 PM
2021-08-19 12:57:13 (UTC+0)
@Krinkle
Thx
ssastry
added a project:
Parsoid-Read-Views
Feb 8 2022, 10:14 PM
2022-02-08 22:14:02 (UTC+0)
JMcLeod_WMF
moved this task from
Uncategorized
to
Phase 2 - testwiki Main namespace support
on the
Parsoid-Read-Views
board.
Jul 28 2022, 2:18 PM
2022-07-28 14:18:52 (UTC+0)
JMcLeod_WMF
edited projects, added
Parsoid-Read-Views (Phase 2 - testwiki Main namespace support)
; removed
Parsoid-Read-Views
ssastry
claimed this task.
Oct 12 2022, 12:50 AM
2022-10-12 00:50:01 (UTC+0)
gerritbot
added a comment.
Oct 12 2022, 12:57 AM
2022-10-12 00:57:24 (UTC+0)
Change 841598 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):
[mediawiki/services/parsoid@master] Ensure DSR computation is accurate if an unclosed comment is present
gerritbot
added a project:
Patch-For-Review
Oct 12 2022, 12:57 AM
2022-10-12 00:57:25 (UTC+0)
ssastry
moved this task from
Backlog
to
In Progress
on the
Parsoid-Read-Views (Phase 2 - testwiki Main namespace support)
board.
Oct 12 2022, 2:12 AM
2022-10-12 02:12:15 (UTC+0)
ssastry
moved this task from
In Progress
to
Code Review
on the
Parsoid-Read-Views (Phase 2 - testwiki Main namespace support)
board.
gerritbot
added a comment.
Oct 24 2022, 11:47 PM
2022-10-24 23:47:19 (UTC+0)
Change 841598
merged
by jenkins-bot:
[mediawiki/services/parsoid@master] Ensure DSR computation is accurate if an unclosed comment is present
Maintenance_bot
removed a project:
Patch-For-Review
Oct 25 2022, 12:30 AM
2022-10-25 00:30:23 (UTC+0)
Kelson
added a comment.
Oct 29 2022, 1:06 PM
2022-10-29 13:06:18 (UTC+0)
See also
ssastry
added a comment.
Oct 30 2022, 3:46 AM
2022-10-30 03:46:48 (UTC+0)
might take care of that one.
gerritbot
added a comment.
Oct 31 2022, 10:16 PM
2022-10-31 22:16:29 (UTC+0)
Change 851140 had a related patch set uploaded (by Arlolra; author: Arlolra):
[mediawiki/vendor@master] Bump parsoid to 0.17.0-a5
gerritbot
added a project:
Patch-For-Review
Oct 31 2022, 10:16 PM
2022-10-31 22:16:30 (UTC+0)
gerritbot
added a comment.
Oct 31 2022, 10:35 PM
2022-10-31 22:35:50 (UTC+0)
Change 851140
merged
by jenkins-bot:
[mediawiki/vendor@master] Bump parsoid to 0.17.0-a5
Maintenance_bot
removed a project:
Patch-For-Review
Oct 31 2022, 11:30 PM
2022-10-31 23:30:37 (UTC+0)
ssastry
added a subscriber:
Content-Transform-Team-WIP
Nov 7 2022, 4:24 PM
2022-11-07 16:24:15 (UTC+0)
Aklapper
added a project:
Content-Transform-Team-WIP
Nov 7 2022, 4:38 PM
2022-11-07 16:38:16 (UTC+0)
Aklapper
removed a subscriber:
Content-Transform-Team-WIP
ssastry
moved this task from
Backlog
to
To Verify
on the
Content-Transform-Team-WIP
board.
Nov 7 2022, 4:44 PM
2022-11-07 16:44:53 (UTC+0)
ssastry
closed this task as
Resolved
Nov 10 2022, 10:43 PM
2022-11-10 22:43:20 (UTC+0)
I am going to resolve this instance of the phab task. There are now instances that purport to be on main pages of ruwiki, bewiki, and ukwiki, which I'll look at separately, but they may be parses of posted wikitext rather than page wikitext. Those are a bit hard to track without logging the posted wikitext.
Kelson
added a comment.
Dec 1 2022, 11:19 AM
2022-12-01 11:19:53 (UTC+0)
On my end, this looks better now. Thx!