T359761 Create a parser function to get the direction of a language or script
Closed, Resolved
Public
Feature
Assigned To
None
Authored By
Nikki
Mar 10 2024, 7:47 AM
2024-03-10 07:47:11 (UTC+0)
Tags
I18n
(Untriaged)
MediaWiki-Internationalization
(Backlog)
RTL
(Backlog)
MW-1.43-notes (1.43.0-wmf.17; 2024-08-06)
User-notice-archive
(Backlog)
Referenced Files
None
Subscribers
Aklapper
aliu
Bugreporter
cscott
Ebrahim
Izno
Jarekt
View All 15 Subscribers
Description
Feature summary
(what you would like to be able to do and where):
There should be a way for pages (typically templates) to easily get the direction for a language code or script code.
For example, there could be a parser function such as
{{#dir:...}}
{{#dir:en}}
would produce "ltr".
{{#dir:ar}}
would produce "rtl".
{{#dir:Arab}}
would produce "rtl".
{{#dir:und-arab}}
would produce "rtl".
If the input is a language code without a script code, it would return the direction MediaWiki has for that language.
If the input is a language code with a script code, or just a script code, it would return the direction for that script code.
MediaWiki does not currently have data about scripts, but it could get it from CLDR, which provides data about scripts generated from Unicode data (main repository, JSON repository).
They currently list 35 scripts as rtl: Adlm Arab Armi Avst Chrs Cprt Elym Hatr Hebr Hung Khar Lydi Mand Mani Mend Merc Mero Narb Nbat Nkoo Orkh Ougr Palm Phli Phlp Phnx Prti Rohg Samr Sarb Sogd Sogo Syrc Thaa Yezi
There are also a few variants of those scripts which don't get included in CLDR's data: Aran, Syre, Syrj, Syrn
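The lookup rules proposed above can be sketched as follows. This is a minimal illustration in Python, not MediaWiki's implementation: the function name `direction` and the small `RTL_LANGUAGES` table are hypothetical stand-ins, and only the script list is taken from this task description.

```python
# Hypothetical sketch of the proposed lookup; not MediaWiki code.

# RTL script codes as listed in the task description: the 35 codes from CLDR
# plus the four variants (Aran, Syre, Syrj, Syrn) that CLDR does not include.
RTL_SCRIPTS = frozenset(
    "Adlm Arab Armi Avst Chrs Cprt Elym Hatr Hebr Hung Khar Lydi Mand Mani "
    "Mend Merc Mero Narb Nbat Nkoo Orkh Ougr Palm Phli Phlp Phnx Prti Rohg "
    "Samr Sarb Sogd Sogo Syrc Thaa Yezi Aran Syre Syrj Syrn".split()
)

# Assumed stand-in for the per-language direction data MediaWiki already has.
RTL_LANGUAGES = frozenset({"ar", "fa", "he", "ur"})


def direction(code: str) -> str:
    """Return "rtl" or "ltr" for a language code, a script code, or both."""
    subtags = code.replace("_", "-").split("-")
    # A script subtag is exactly four letters (ISO 15924), e.g. "Arab".
    for subtag in subtags:
        if len(subtag) == 4 and subtag.isalpha():
            return "rtl" if subtag.title() in RTL_SCRIPTS else "ltr"
    # No script given: fall back to the direction known for the language.
    return "rtl" if subtags[0].lower() in RTL_LANGUAGES else "ltr"
```

With these assumptions, `direction("und-arab")` is decided by the script subtag alone, while `direction("ar")` falls back to the language table.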
Use case(s)
(list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
Wiki pages often want to include words in other languages. Multilingual wikis often have translatable elements on pages and need to make sure the direction is set correctly.
There are many wikis with templates like Template:Dir, most of which hardcode an outdated list of language codes.
Lots of wikis have modules with lists of rtl scripts (global search) and functions to return the direction (global search).
mw.language:getDir() in Lua is often not suitable because it's not easily accessible from a template without first writing a module, it only supports the languages included in MediaWiki, and it's easy to run into the limit on how many times you can use it on a single page.
Benefits
(why should this be implemented?):
It would reduce the amount of maintenance needed and improve consistency across wikis: wikis would not need their own lists of rtl scripts, and a new rtl script added to Unicode would only need to be added in one place for the data to become available to all wikis.
Supporting script codes and languages with script codes would improve support for languages not yet included in MediaWiki.
It might also be more efficient to fetch the direction from the script (when provided) by looking up scripts in a relatively short list of script codes.
Note {{Dir}} is currently the most used template in Commons. See also:
T343131: Commons database is growing way too fast
Details
Related Changes in Gerrit:
Subject | Repo | Branch | Lines +/-
Add {{#dir}} parser function | mediawiki/core | master | +136 -0
Related Objects
Task Graph
Mentions
Duplicates
Status
Subtype
Assigned
Task
Resolved
Feature
None
T359761
Create a parser function to get the direction of a language or script
Open
None
T365302
Add explicit script code support to isRTL in Language class
Mentioned In
T393118: "if empty" parser function
T374799: Get rid of translatewiki:Template:Dir
T343131: Commons database is growing way too fast
T365302: Add explicit script code support to isRTL in Language class
T366623: Create a parser function to get the BCP47 code for a language
T299369: Consider removing global $userLang from onPageContentLanguage hook
Mentioned Here
T365302: Add explicit script code support to isRTL in Language class
T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar
T202794: Many more languages need to be added to Multilingual Wikisource (mul.ws)
rOMWCdd74abb853ba: Raise Scribunto maxLangCacheSize to 200
T343131: Commons database is growing way too fast
Duplicates Merged Here
T365189: Add a parser function for dir
Event Timeline
Nikki
created this task.
Mar 10 2024, 7:47 AM
2024-03-10 07:47:11 (UTC+0)
Restricted Application
added a subscriber:
Aklapper
Mar 10 2024, 7:47 AM
2024-03-10 07:47:11 (UTC+0)
Bugreporter
updated the task description.
Mar 10 2024, 12:10 PM
2024-03-10 12:10:52 (UTC+0)
RP88
subscribed.
Mar 10 2024, 5:56 PM
2024-03-10 17:56:35 (UTC+0)
Tacsipacsi
subscribed.
Mar 11 2024, 6:11 PM
2024-03-11 18:11:04 (UTC+0)
mw.language:getDir() in Lua is often not suitable because […] it's easy to run into the limit on how many times you can use it on a single page.
Every language counts only once, so as long as the same language is queried again and again, the Lua solution won’t run into the limit either. On the other hand, the limit is there for a reason, so a parser function solution will probably also have a limit. This is not to say that there should be no parser function, but this particular argument isn’t very strong.
(By the way, since dd74abb853ba56aef99b7c9d09dd02bdcb88129b the limit on Wikimedia is 200, so it’s not really likely that one accidentally hits the limit, unless one wants to load all languages on a page.)
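To illustrate the "every language counts only once" point, here is a toy model in Python of a per-page limit that counts distinct languages rather than calls. It is only a sketch of the behaviour described in this comment: the class and exception names are hypothetical, the direction lookup is a placeholder, and none of it is taken from Scribunto's source.

```python
# Hypothetical model of a per-page limit on distinct languages.


class LanguageLimitExceeded(Exception):
    """Raised when a page asks for more distinct languages than allowed."""


class PageLanguageCache:
    def __init__(self, max_languages=200):
        # 200 is the Wikimedia limit mentioned in the comment above.
        self.max_languages = max_languages
        self._seen = {}

    def get_dir(self, code):
        # Only a language not seen before on this page counts against the limit.
        if code not in self._seen:
            if len(self._seen) >= self.max_languages:
                raise LanguageLimitExceeded(code)
            # Placeholder lookup; a real one would consult language data.
            self._seen[code] = "rtl" if code in {"ar", "fa", "he"} else "ltr"
        return self._seen[code]
```

Under this model, a page that queries the same language thousands of times uses a single slot; only a page touching more than `max_languages` distinct codes fails.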
mw.language:getDir() in Lua is often not suitable because […] it only supports the languages included in MediaWiki […].
What is the use case for getting directionality of languages not included in MediaWiki? Multilingual wikis’ contents are usually available only in languages included in MediaWiki.
mw.language:getDir() in Lua is often not suitable because it's not easily accessible from a template without first writing a module […].
This is true; using a module would only worsen T343131.
MaryMunyoki
added a project:
RTL
Mar 13 2024, 2:15 PM
2024-03-13 14:15:33 (UTC+0)
Tacsipacsi
mentioned this in
T299369: Consider removing global $userLang from onPageContentLanguage hook
Apr 27 2024, 1:12 PM
2024-04-27 13:12:56 (UTC+0)
Jdforrester-WMF
subscribed.
Apr 27 2024, 9:30 PM
2024-04-27 21:30:19 (UTC+0)
Bugreporter
subscribed.
Apr 30 2024, 8:38 AM
2024-04-30 08:38:29 (UTC+0)
Multilingual wikis’ contents are usually available only in languages included in MediaWiki.
See also:
T202794: Many more languages need to be added to Multilingual Wikisource (mul.ws)
Tacsipacsi
added a comment.
Apr 30 2024, 12:37 PM
2024-04-30 12:37:09 (UTC+0)
Indeed, T202794 uses a different definition of “languages included in MediaWiki” than what I was thinking of:
Commons usually uses languages that can be selected in the preferences (have MediaWiki translations), since it displays the appropriate translation based on the language selected in the preferences.
Multilingual Wikisource wants to also use languages that are long extinct and thus don’t make much sense in the preferences (don’t have MediaWiki translations). However, they still need to be included in MediaWiki in one way or the other: for example, to be able to display text in that language with the right directionality.
I looked up the source code, and Scribunto is actually extremely permissive as to what languages it accepts: for example, mw.language.new('fklflmwlmfkmf'):isRTL() happily returns false without throwing any error. So if a language is included in MediaWiki by any definition (including the definition used by mulwikisource), Scribunto will handle it and return its directionality. If it’s not included at all, a magic word won’t work either.
jhsoby
subscribed.
Apr 30 2024, 1:29 PM
2024-04-30 13:29:04 (UTC+0)
Bugreporter
merged a task:
T365189: Add a parser function for dir
May 17 2024, 12:52 AM
2024-05-17 00:52:55 (UTC+0)
Bugreporter
added subscribers:
Ebrahim
Izno
Jarekt
Bugreporter
added a comment.
May 17 2024, 1:01 AM
2024-05-17 01:01:26 (UTC+0)
The implementation can't replace {{dir}} as that needs to be invoked like {{dir|fa}} instead of {{dir:fa}} (it's not possible to have {{dir|fa}} as a parser function) ...
See:
T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar
Novem_Linguae
subscribed.
May 17 2024, 1:04 AM
2024-05-17 01:04:17 (UTC+0)
gerritbot
added a comment.
May 17 2024, 8:31 AM
2024-05-17 08:31:33 (UTC+0)
Change #1032542 had a related patch set uploaded (by Ebrahim; author: Ebrahim):
[mediawiki/core@master] Add dir parser function
gerritbot
added a project:
Patch-For-Review
May 17 2024, 8:31 AM
2024-05-17 08:31:33 (UTC+0)
Ebrahim
added a comment.
Edited
May 17 2024, 12:53 PM
2024-05-17 12:53:42 (UTC+0)
In T359761#9807517 @Bugreporter wrote:
The implementation can't replace {{dir}} as that needs to be invoked like {{dir|fa}} instead of {{dir:fa}} (it's not possible to have {{dir|fa}} as a parser function) ...
See:
T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar
I didn't know about this, thanks, but even that won't help with having {{dir|fa}} as a parser function, IIUC. So the decision here is to use either {{dir:fa}} or {{#dir:en}} (which, per T204371, could be written as {{#dir|fa}} in the future), and I think {{#dir:fa}} matches better with the currently available {{#language:fa}} (among other decisions, including whether we want this at all).
Jarekt
awarded a token.
May 17 2024, 12:59 PM
2024-05-17 12:59:17 (UTC+0)
Jarekt
added a comment.
May 17 2024, 1:08 PM
2024-05-17 13:08:52 (UTC+0)
In T359761#9808473 @Ebrahim wrote:
In T359761#9807517 @Bugreporter wrote:
The implementation can't replace {{dir}} as that needs to be invoked like {{dir|fa}} instead of {{dir:fa}} (it's not possible to have {{dir|fa}} as a parser function) ...
See: T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar
I didn't know about this, thanks, but even that won't help with having {{dir|fa}} as a parser function, IIUC. So the decision here is to use either {{dir:fa}} or {{#dir:en}} (which, per T204371, could be written as {{#dir|fa}} in the future), and I think {{#dir:fa}} matches better with the currently available {{#language:fa}} (among other decisions, including whether we want this at all).
{{dir:fa}} or {{#dir:en}} would both be fine. I would go with the {{#dir:en}} format so it looks like other parser functions. We can then do a bunch of replacements on Commons.
aliu
subscribed.
Edited
May 17 2024, 5:44 PM
2024-05-17 17:44:31 (UTC+0)
Agreed that replacement shouldn't be that much work. We can check how
:! was migrated.
Ebrahim
added a subscriber:
cscott
Edited
May 17 2024, 7:29 PM
2024-05-17 19:29:20 (UTC+0)
{{dir:fa}} or {{#dir:en}} would both be fine. I would go with the {{#dir:en}} format so it looks like other parser functions. We can then do a bunch of replacements on Commons.
Thanks, just applied in
Also renamed it to {{#direction}} and made {{#dir}} an alias; maybe we can even remove {{#dir}}, or keep it for the uses on Commons? Following @cscott's review, I removed the second and third parameters, so we should handle those cases with an {{#ifeq}} using some template and subst:
If the input is a language code with a script code, or just a script code, it would return the direction for that script code.
Also applied this, but perhaps we should see what @cscott thinks about it. Perhaps we can go with one version of it, maybe the simplest one, and decide about the details later. Or, going the other way, we can start with this implementation, add a tracking category for corner cases on Commons, use it to clean up the input, and then simplify MediaWiki's implementation.
Ebrahim
added a comment.
Edited
May 17 2024, 9:40 PM
2024-05-17 21:40:45 (UTC+0)
Turned the implementation into what @cscott asked for in code review:
This doesn't seem correct here. It should be part of Language::isRTL() or fixed in the language definition. The parser function should return exactly the same as Language::isRTL.
I guess if we can land a bare-minimum implementation first, we can tweak getLanguage for more feature parity with {{dir}} later.
Bugreporter
added a comment.
Edited
May 17 2024, 10:11 PM
2024-05-17 22:11:29 (UTC+0)
What is currently missing:
Language codes with an explicit script code that have no separate localization in core or are not explicitly defined as rtl. See T365302: Add explicit script code support to isRTL in Language class
Using #dir: with only a script code, such as {{#dir:Arab}}. If we need to support it too, we may (though need not) introduce a new function (or even a class) for scripts to reduce redundant code.
Languages without current MediaWiki localization support: we may feed them from other sources
script code can be filled from
- Note some languages, like Ottoman Turkish (ota), are not in CLDR either, so
has data for more than 8000 language codes (but still does not contain all rtl languages; see below)
The list of rtl languages derived from likelySubtags is as follows (note it contains deprecated language codes such as ji):
aao, abh, abv, acm, acq, acw, acx, adf, ae, aeb, aec, aee, aeq, afb, aib, aij, aiq, amw, apc, apd, ar, arc, arq, ars, ary, arz, ask, atn, auj, auz, avd, avl, ayh, ayl, ayn, ayp, azb, bal, bdz, bej, bft, bgn, bgp, bhe, bhm, bhn, bjf, bjm, bqi, brh, brk, bsh, bsk, chg, cja, ckb, clh, czk, dcc, def, deh, dmk, dml, dv, ecy, esh, fa, fay, faz, fia, fub, gbz, ggg, gha, ghr, gig, gjk, gju, glh, glk, grc, gwc, gwf, gwt, gzi, hac, haz, hbo, he, hkh, hnd, hno, hoh, hrt, hrz, hss, huy, isk, itk, iw, jad, jat, jbe, jbn, jdg, ji, jnd, jog, jpa, jpr, jrb, jye, kbu, kby, kfm, khw, klj, kmz, kqd, ks, ktl, kvx, kxp, lad, lah, lhs, lki, lrc, lrk, lrl, lsa, lsd, lss, luv, luz, mby, mde, mfa, mfi, mhj, mid, mki, mnj, mvy, myz, mzn, nli, nlm, nqo, ntz, nyq, oar, obm, odk, oru, ota, otk, pal, pbt, pgd, phl, phn, phr, phv, plk, pra, prc, prd, prx, ps, psh, psi, pst, qxq, rdb, rhg, rmt, sam, sbn, scl, sd, sdb, sdf, sdg, sdh, sds, sgr, sgy, shd, shm, shu, shv, siy, siz, skr, smp, smy, sog, sqo, sqt, srh, srz, ssh, sts, swb, syc, syn, syr, tjo, tks, tmr, tov, tra, trg, trm, trw, ug, ur, ush, uzs, vaf, vgr, vmh, wbk, wlo, wne, wni, wsv, xco, xhe, xka, xkc, xkj, xkp, xld, xly, xmn, xmr, xna, xpr, xsa, xvi, ydg, yhd, yi, yih, yud, zba, zdj, zrp, zum
The following languages are defined in Commons as rtl but not included in likelySubtags at all:
aic, ajp, ara, arb, bbz, bcc, bqp, gda, kcn, kfr, mve, mzb, pbu, pga, pnb, prs, sqr, swh, tly, wbl, xpu, ydd
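Deriving a list of rtl languages from likelySubtags, as described above, amounts to expanding each language code to its likely script and checking that script against the rtl script list. A minimal Python sketch follows, assuming the data is a mapping from a code to an expanded `language-Script-REGION` tag; the entries below are a small illustrative sample in that shape, not a copy of the CLDR data, and the function name is hypothetical.

```python
# Hypothetical sketch: derive rtl languages from likelySubtags-shaped data.

# Subset of the rtl script codes listed in the task description.
RTL_SCRIPTS = frozenset({"Arab", "Hebr", "Syrc", "Thaa", "Nkoo"})

# Illustrative sample only, shaped like CLDR's likelySubtags mapping.
LIKELY_SUBTAGS = {
    "ar": "ar-Arab-EG",
    "en": "en-Latn-US",
    "fa": "fa-Arab-IR",
    "he": "he-Hebr-IL",
    "ru": "ru-Cyrl-RU",
    "und-Arab": "ar-Arab-EG",
}


def rtl_languages(likely_subtags, rtl_scripts):
    """Return the sorted bare language codes whose likely script is rtl."""
    result = []
    for code, expanded in likely_subtags.items():
        if "-" in code:
            # Skip keys that already carry a script or region subtag.
            continue
        parts = expanded.split("-")
        if len(parts) >= 2 and parts[1] in rtl_scripts:
            result.append(code)
    return sorted(result)
```

Codes missing from the mapping, such as the Commons-only ones listed above, would simply not appear in the result and would need another data source.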
Jarekt
added a comment.
May 17 2024, 10:34 PM
2024-05-17 22:34:38 (UTC+0)
On Commons, the main use of the Dir template is to return the direction of text in the language used by the user. Pseudocode would be {{dir | {{{lang | {{int:lang}} }}} }}. That way, any template displaying content uses HTML tags indicating the text direction of the user's language. Most of the time templates do not use the {{{lang}}} parameter, but for testing purposes we can pass it to the template to see it rendered with another text direction. That means that in the great majority of cases on Commons, the input to {{dir}} is the output of {{int:lang}}, and the languages returned by {{int:lang}} are the ones we care about. The current template returns ltr for {{dir|Arab}} or for any other random string which is not recognized as a language.
Nikerabbit
subscribed.
May 20 2024, 10:19 AM
2024-05-20 10:19:02 (UTC+0)
Pcoombe
subscribed.
Jun 2 2024, 10:43 AM
2024-06-02 10:43:13 (UTC+0)
cscott
mentioned this in
T366623: Create a parser function to get the BCP47 code for a language
Jun 4 2024, 4:53 PM
2024-06-04 16:53:58 (UTC+0)
cscott
added a project:
User-notice
Jun 12 2024, 7:15 PM
2024-06-12 19:15:25 (UTC+0)
Quiddity
moved this task from
To Triage
to
Not ready to announce
on the
User-notice
board.
Jun 14 2024, 5:26 PM
2024-06-14 17:26:33 (UTC+0)
gerritbot
added a comment.
Jul 30 2024, 8:34 PM
2024-07-30 20:34:30 (UTC+0)
Change #1032542 merged by jenkins-bot:
[mediawiki/core@master] Add {{#dir}} parser function
Ebrahim
closed this task as
Resolved
Jul 30 2024, 8:55 PM
2024-07-30 20:55:56 (UTC+0)
ReleaseTaggerBot
added a project:
MW-1.43-notes (1.43.0-wmf.17; 2024-08-06)
Jul 30 2024, 9:00 PM
2024-07-30 21:00:41 (UTC+0)
Maintenance_bot
removed a project:
Patch-For-Review
Jul 30 2024, 9:31 PM
2024-07-30 21:31:13 (UTC+0)
Tacsipacsi
moved this task from
Not ready to announce
to
Announce in next Tech/News
on the
User-notice
board.
Jul 31 2024, 3:00 PM
2024-07-31 15:00:25 (UTC+0)
Thanks
@cscott
Pcoombe
awarded a token.
Jul 31 2024, 3:09 PM
2024-07-31 15:09:11 (UTC+0)
Quiddity
moved this task from
Announce in next Tech/News
to
In current Tech/News draft
on the
User-notice
board.
Jul 31 2024, 7:03 PM
2024-07-31 19:03:11 (UTC+0)
Quiddity
moved this task from
In current Tech/News draft
to
Already announced/Archive
on the
User-notice
board.
Aug 7 2024, 6:33 PM
2024-08-07 18:33:47 (UTC+0)
Nikki
mentioned this in
T365302: Add explicit script code support to isRTL in Language class
Aug 8 2024, 8:05 AM
2024-08-08 08:05:00 (UTC+0)
Jarekt
added a comment.
Aug 9 2024, 7:09 PM
2024-08-09 19:09:59 (UTC+0)
Working with @Ebrahim, we got most of the uses of c:Template:Dir replaced with the #dir parser function, at least in the template namespace. The database still shows 123,999,814 transclusions, so it will be interesting to see how long it takes for this number to drop.
Jarekt
mentioned this in
T343131: Commons database is growing way too fast
Aug 9 2024, 7:52 PM
2024-08-09 19:52:21 (UTC+0)
aliu
added a comment.
Aug 9 2024, 8:56 PM
2024-08-09 20:56:47 (UTC+0)
Does Commons have a list of templates to be automatically substituted by a bot?
Jarekt
added a comment.
Aug 9 2024, 9:04 PM
2024-08-09 21:04:25 (UTC+0)
In T359761#10054918 @aliu wrote:
Does Commons have a list of templates to be automatically substituted by a bot?
I think we substituted all calls to {{dir}} templates for all pages in template namespace, so there should not be anything else to do other than wait for the database to catch up, which might be a while.
Maintenance_bot
edited projects, added
User-notice-archive
; removed
User-notice
Aug 29 2024, 9:35 PM
2024-08-29 21:35:54 (UTC+0)
Bugreporter
mentioned this in
T374799: Get rid of translatewiki:Template:Dir
Sep 16 2024, 2:14 AM
2024-09-16 02:14:05 (UTC+0)
matej_suchanek
mentioned this in
T393118: "if empty" parser function
May 1 2025, 7:21 PM
2025-05-01 19:21:53 (UTC+0)
Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct.