⚓ T321006 Special:Search: "429 Too Many Requests" when number of returned search results set to more than 50
Page Menu
Phabricator
Create Task
Maniphest
T321006
Special:Search: "429 Too Many Requests" when number of returned search results set to more than 50
Closed, Resolved
Public
Actions
Edit Task
Edit Related Tasks...
Create Subtask
Edit Parent Tasks
Edit Subtasks
Merge Duplicates In
Close As Duplicate
Edit Related Objects...
Edit Commits
Edit Mocks
Mute Notifications
Protect as security issue
Assigned To
matthiasmullie
Authored By
Etonkovidova
Oct 17 2022, 8:41 PM
2022-10-17 20:41:56 (UTC+0)
Tags
SDAW-Search-Improvements (Milestone 1: CSS changes to all wikis)
Structured-Data-Backlog (Current Work)
(Verify on Production)
MW-1.40-notes (1.40.0-wmf.8; 2022-10-31)
Discovery-Search (Current work)
(Needs Reporting)
Referenced Files
F35622618: Screen Shot 2022-10-24 at 1.51.40 PM.png
Oct 24 2022, 9:01 PM
2022-10-24 21:01:36 (UTC+0)
F35592609: Screen Shot 2022-10-17 at 1.28.46 PM.png
Oct 17 2022, 8:41 PM
2022-10-17 20:41:56 (UTC+0)
Subscribers
Aklapper
CBogen
Cparle
EBernhardson
Etonkovidova
Krinkle
matthiasmullie
Sannita
Description
Three cases of thumbnails not being displayed were reported in the following comment - all three reported cases are on
commons wiki
In
T320406#8314174
@Sannita
wrote:
I'm reporting a comment from User:Pelikana (
original diff on Commons
):
To see if special search works well enough, I will search for "SVG" in some combinations like svg+paris and notice that a page with a file list of 500 results may take 30 seconds to load.
list 500 svg results
Svg+Paris
. That's too long. A more realistic search would be
500 jpg list
Here the page loads fast enough but it fails to show all thumbs in one go.
(1) Search terms -
SVG*
list 500 svg results
(2) Search terms -
SVG+Paris
Svg+Paris
(3) Search terms -
Breitner+"RP-T"
500 jpg list
Steps to reproduce:
follow the links above and have the number of search results on a page to be set to 500 (or, if all results get displayed properly, click on the
Next 500
).
while a page with search results is loading, there are no indication to a user that the request was submitted
when a result page loads, some thumbnails image won't be displayed
e.g. for the above
example (1)
the Console will show
GET
scheme https
host upload.wikimedia.org
filename /wikipedia/commons/thumb/7/71/Handbook_of_ornament%3B_a_grammar_of_art%2C_industrial_and_architectural_designing_in_all_its_branches%2C_for_practical_as_well_as_theoretical_use_%28IA_handbookoforname1900meye%29.pdf/page1-120px-thumbnail.pdf.jpg
Status 429 Too Many Requests
Version HTTP/2
Transferred 2.88 KB (1.78 KB size)
Referrer Policyorigin-when-cross-origin
Request Priority Low
NOTE:
The issue is not specific to
commons wiki
. The steps above (with the same search terms) on
enwiki
, for example, would result in the same
429 Too Many Requests
error.
Details
Related Changes in Gerrit:
Subject
Repo
Branch
Lines +/-
Don't calculate size of thumb until requested
mediawiki/core
master
+109
-7
Customize query in gerrit
Related Objects
Mentions
Mentioned In
T323126: Don't expose thumbnail size from Rest/Handler/SearchHandler.php
T323125: Don't calculate size of a thumbnail
T320406: Thumbnails on SpecialSearch may fail to load
Mentioned Here
T323125: Don't calculate size of a thumbnail
T323126: Don't expose thumbnail size from Rest/Handler/SearchHandler.php
P36132 Test thumbnail timing
T320406: Thumbnails on SpecialSearch may fail to load
Event Timeline
Etonkovidova
created this task.
Oct 17 2022, 8:41 PM
2022-10-17 20:41:56 (UTC+0)
Restricted Application
added a subscriber:
Aklapper
View Herald Transcript
Oct 17 2022, 8:41 PM
2022-10-17 20:41:57 (UTC+0)
CBogen
added a project:
SDAW-Search-Improvements (Milestone 1: CSS changes to all wikis)
Oct 17 2022, 8:43 PM
2022-10-17 20:43:42 (UTC+0)
Etonkovidova
mentioned this in
T320406: Thumbnails on SpecialSearch may fail to load
Oct 17 2022, 9:00 PM
2022-10-17 21:00:53 (UTC+0)
MPhamWMF
added a project:
Discovery-Search
Oct 24 2022, 5:10 PM
2022-10-24 17:10:41 (UTC+0)
Etonkovidova
added a comment.
Oct 24 2022, 9:01 PM
2022-10-24 21:01:36 (UTC+0)
Comment Actions
I re-run the steps in the task description for some of the search queries reported in the task.
The tests were on
commons wmf.6
Special:Search
(no throttling) for search terms
SVG*
number of search results to be returned (selected on Special:Search page
Time for search results (page) to load
20 (default)
5.48s
100
14.9s
250
22.6s
When the number of results is 500, then the search times out
I got the timeout error twice - when I set the number of returned search results in user Preferences and when I set it on Special:Search page.
[202bf565-85cf-48cd-92d7-0581310d4ad2] 2022-10-24 20:32:37: Fatal exception of type "Wikimedia\RequestTimeout\RequestTimeoutException"
[9406839b-812c-4fad-af05-ab07f9d4d8b0] 2022-10-24 20:49:32: Fatal exception of type "Wikimedia\RequestTimeout\RequestTimeoutException"
EBernhardson
subscribed.
Edited
Oct 24 2022, 10:09 PM
2022-10-24 22:09:32 (UTC+0)
Comment Actions
For the search side of things we can attach
&cirrusDumpResult
to any Special:Search request, this will short-circuit the request once elasticsearch responds and tell us how long the search engine took. In my quick testing:
SVG* 500 results
37s
SVG* 500 results search engine only
436ms
It does seem like something after the search engine is taking significant time. Editors in particular have use cases for long result lists. I might suggest having a cutoff for thumbnailing, only show thumbnails for the first 20 or so results?
I put together a sample bit of code at
P36132
to test the timing of different bits of the query process. This measures the parts independantly, no overlap between the parts. The search portion is longer than the earlier example with &cirrusDumpQuery as it includes various post-processing cirrus does that was skipped before.
500 results, SVG*
1.17s
thumbs
32s
less thumbs (20)
0.95s
500 results, SVG Paris
1.33s
thumbs
28s
less thumbs (20)
1.24s
EBernhardson
added a comment.
Oct 24 2022, 11:01 PM
2022-10-24 23:01:31 (UTC+0)
Comment Actions
Digging a bit more into this...
If i replace the line in SearchResultThumbnailProvider::buildSearchResultThumbnailFromFile
$localPath = $thumb->getLocalCopyPath();
with
$localPath = false
Then the thumbnailing runs in ~300ms instead of 30s. Is that part necessary, i suspect it's not worth all the time involved?
gerritbot
added a comment.
Oct 25 2022, 8:48 AM
2022-10-25 08:48:14 (UTC+0)
Comment Actions
Change 849022 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):
[mediawiki/core@master] Don't calculate size of thumb until requested
gerritbot
added a project:
Patch-For-Review
Oct 25 2022, 8:48 AM
2022-10-25 08:48:15 (UTC+0)
CBogen
edited projects, added
Structured-Data-Backlog (Current Work)
; removed
Structured-Data-Backlog
Oct 25 2022, 2:35 PM
2022-10-25 14:35:56 (UTC+0)
CBogen
moved this task from
Incoming
to
Code Review
on the
Structured-Data-Backlog (Current Work)
board.
gerritbot
added a comment.
Oct 25 2022, 3:44 PM
2022-10-25 15:44:17 (UTC+0)
Comment Actions
Change 849022
merged
by jenkins-bot:
[mediawiki/core@master] Don't calculate size of thumb until requested
ReleaseTaggerBot
added a project:
MW-1.40-notes (1.40.0-wmf.8; 2022-10-31)
Oct 25 2022, 4:00 PM
2022-10-25 16:00:41 (UTC+0)
Maintenance_bot
removed a project:
Patch-For-Review
Oct 25 2022, 4:30 PM
2022-10-25 16:30:07 (UTC+0)
Krinkle
subscribed.
Oct 25 2022, 5:53 PM
2022-10-25 17:53:08 (UTC+0)
Comment Actions
In
T321006#8342297
@gerritbot
wrote:
Change 849022
merged
by jenkins-bot:
[mediawiki/core@master] Don't calculate size of thumb until requested
It seems the only caller to SearchResultThumbnail::getSize is the Rest/SearchHandler exposing it as
thumbnail.size
. If I understand correctly, the way it works is by server-side downloading every thumbnail out of Swift into a tmp directory on the appserver (or if missing, generating a thumbnail on-demand in PHP, possibly bypassing Thumbor/wgThumbProxyUrl?) and then computing the file size of that thumbnail.
I assume with the above this lazy-getter is not invoked within Special:Seach, but I'm still curious what it is or was for. What use case do REST ALI clients have for the bytesize of the thumbnail? This notion is somewhat incompatible with the way thumbnails work (as evidenced by the cost it took to compute) as we generally perceive thumbnails as emphemeral and referenced by 404-rendered URL only, not as something we can query and measure.
If we wanted to offer this officially we'd have to rethink some of that, and e.g. offer a FileBackend primitive for fetching/estimating (without in-process downloading) the thumbnail size. In the case of Swift this could be implemented as an HTTP call that asks the Swift cluster for the size of that file, similar to what we do already when we fetch metadata for the original file size. Ideally, this would then also be batched as our (currently imagined) latency budgets and strategy don't really account for multiple Swift downloads in the same web request. Similar to Parsoid invocations, we generally expect one of those in a given web request in terms of resource cost and latency. We'd want this lookup to be batched (which Swift supports) and perhaps also made optional so that the majority of API clients that don't need it don't incur the cost.
As example, when we expect Parser-derived metadata to be needed in bulk, we don't use the ParserCache, not even in a bulk lookup, as ParserCache may miss and is fairly costly to fetch. Instead, that data would be stored in page_props or other derived tables for high volume access.
I recognise this is all still new and plans may already be in place. I'm only asking as a curiosity after the fact, to maintain a mental model of what we have and why it exists. An answer like "it's experimental, don't worry yet" is totally fine :)
matthiasmullie
added a comment.
Oct 26 2022, 7:00 AM
2022-10-26 07:00:02 (UTC+0)
Comment Actions
This particular part is not "new" or "experimental" - far from it.
Some brief background:
class SearchResultThumbnail
(which includes the getSize method) has existed since 2020, and was introduced along with the thumbnails hook (which was then in the Rest namespace) in
At the same time, PageImages started subscribing to that hook and immediately doing the filesize check; in
Since 2020, thumbnails (via the hook + PageImages) have been available via the Rest API. We now wanted to start showing these same thumbnails on Special:Search, so a bunch of that (up until that point) Rest-specific code got moved around recently (& also started affecting Special:Search)
I'm not sure why that filesize check exists - I've only briefly looked at the commits & related phab tickets and couldn't immediately find anything about it.
Perhaps it's unused and we could remove it (it's been around and part of the public API since 2020, though, so it's going to be a little painful), but one would have to dig into that in a bit more detail than I have to figure out whether there are (known) clients relying on this data.
matthiasmullie
moved this task from
Code Review
to
Needs QA
on the
Structured-Data-Backlog (Current Work)
board.
Oct 27 2022, 12:45 PM
2022-10-27 12:45:55 (UTC+0)
matthiasmullie
claimed this task.
Oct 28 2022, 10:30 AM
2022-10-28 10:30:52 (UTC+0)
MPhamWMF
moved this task from
needs triage
to
watching / waiting
on the
Discovery-Search
board.
Oct 31 2022, 2:55 PM
2022-10-31 14:55:10 (UTC+0)
Etonkovidova
moved this task from
Needs QA
to
Verify on Production
on the
Structured-Data-Backlog (Current Work)
board.
Edited
Nov 3 2022, 5:37 PM
2022-11-03 17:37:56 (UTC+0)
Comment Actions
Checked on
wmf.8
deployment - the issue is fixed.
Etonkovidova
closed this task as
Resolved
Nov 3 2022, 9:22 PM
2022-11-03 21:22:04 (UTC+0)
Krinkle
added a comment.
Edited
Nov 7 2022, 6:30 PM
2022-11-07 18:30:24 (UTC+0)
Comment Actions
In
T321006#8344487
@matthiasmullie
wrote:
This particular part is not "new" or "experimental" - far from it. […]
Perhaps it's unused and we could remove it (it's been around and part of the public API since 2020, though, so it's going to be a little painful), but one would have to dig into that in a bit more detail than I have to figure out whether there are (known) clients relying on this data.
2020 is relatively new, but point taken. I thought of it as part of the Vector 22 skin which is opt-in on most wikis. I don't know how much we promote this API outside WMF yet, I assumed it was effectivley internal at this point. But yeah it is in a stable
rest.php/v1
endpoint. Thanks for clarifying, I had forgotten it was v1.
Swift thumbnail size information is imho fundamentally information we cannot serve in a batch format like search. And more generally should not expose anywhere even for a single file as it isn't information that logically exists imho.
Thumbnails are architected as an ephemeral concept, and for scaling and security reasons we allow its rendering and serving to be decoupled from MediaWiki at scale such that e.g. Thumbor and Swift can serve, and (re)generate these as needed based on demand and 404 status. Akin to the Parser, this is information that can be accessed in a web request if it is the subject of that page, but an API endpoint that would require accessing N thumbnails or N wiki pages to parse, goes contrary to the current architecture and is thus likely an attack vector in the worst case, or in the best case something with latency behaviour we can't commit to very well. For the Parser, this is why we store derived information in page_props and link tables. We don't iterate and query ParserCache entries, since those may be missing.
The size of original files is primary data and indexed in the database. The size of thumbnails is not deterministic and can vary over time, by client, and by configuration. For example, do we measure JPG or WebP? With or without gzip? If the thumbnail was recently purged, do we now render it on-demand in the API call? If so, how? We no longer provision the full imagescaler packages on appservers, and for security reasons aren't meant to generate images and write to Swift during high-traffic API calls.
Going back to it being v1-stable, we'd probably have an easier time pulling off a quick deprecation now than 2-3 years in the future.
I'll leave it at this and consider your team sufficiently informed. Happy to help further :)
matthiasmullie
mentioned this in
T323125: Don't calculate size of a thumbnail
Nov 15 2022, 2:51 PM
2022-11-15 14:51:49 (UTC+0)
matthiasmullie
mentioned this in
T323126: Don't expose thumbnail size from Rest/Handler/SearchHandler.php
Nov 15 2022, 2:55 PM
2022-11-15 14:55:14 (UTC+0)
matthiasmullie
added a comment.
Nov 15 2022, 3:00 PM
2022-11-15 15:00:57 (UTC+0)
Comment Actions
I had a quick look and AFAICT, no consumer of this API even uses
size
It's also already documented to be nullable (and for most thumbnails, this ends up being the value)
I think we can safely stop calculating size and always return null since, afaict, there is no consumer, and in the unlikely case that there is one, it should already expect null.
I created
T323125
(to remove the computation & always return null - should be quick and painless) and
T323126
(complete removal)
I'll get started on the former since that one's quick and has the biggest impact.
I may come back to the latter at some point, but this is not actually in my/our wheelhouse and we have other fish to fry ATM - but at least there is a ticket for follow up.
Gehel
edited projects, added
Discovery-Search (Current work)
; removed
Discovery-Search
Feb 12 2025, 8:33 AM
2025-02-12 08:33:24 (UTC+0)
Gehel
moved this task from
Incoming
to
Needs Reporting
on the
Discovery-Search (Current work)
board.
Log In to Comment
Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct.
Wikimedia Foundation
Code of Conduct
Disclaimer
CC-BY-SA
GPL
Credits
US