Extension:ParserFunctions/String functions - MediaWiki
Warning: In 2013, it was decided that these functions will never be enabled on any Wikimedia wiki, because they are inefficient when used on a large scale (see phab:T8455 for some history). These functions do NOT work on Wikimedia wikis!
If you are here to write something on a Wikimedia project, you are looking for something else: if your home wiki has string functions, it probably uses Lua. For example, the English Wikipedia uses Module:String, which does some of the same things with wildly different syntax. There are also individual String-handling templates.
Please ignore this warning if you are editing a self-hosted or third-party wiki that's not operated by WMF.
The ParserFunctions extension optionally defines various string functions if $wgPFEnableStringFunctions is set to true.
These functions are #len, #pos, #rpos, #sub, #count, #replace, #explode, and #urldecode.
All of these functions operate in O(n) time complexity, making them safe against DoS attacks.
Some parameters of these functions are limited through global settings to prevent abuse. See the Limits section below.
For functions that are case sensitive, you may use the magic word {{lc:your_string_here}} as a workaround in some cases.
To determine whether a MediaWiki server enables these functions, check the list of supported extended parser functions in Special:Version.
String length is limited by the $wgPFStringLengthLimit variable, which defaults to 1000.
#len
The #len function returns the length of the given string. The syntax is:
{{#len:string}}
The return value is always the number of characters in the source string (after expansion of template invocations, but before conversion to HTML). If no string is specified, the return value is zero.
This function is safe with UTF-8 multibyte characters. Example:
{{#len:Žmržlina}}
returns 8.
Leading and trailing spaces or newlines are not counted, but intermediate spaces and newlines are taken into account. Examples:
{{#len:Icecream }}
returns 8.
{{#len: a   b }}
returns 5 (3 spaces between 2 characters).
Characters given by reference are not converted, but counted according to their source form. Examples:
{{#len:&nbsp;}}
returns 6 (named character reference).
{{#len:&#32;}}
returns 5 (numeric character reference; it is not ignored even though it designates a space here).
Tags such as <nowiki> and other tag extensions will always have a length of zero, since their content is hidden from the parser. Example:
{{#len:<nowiki>This is a </nowiki>test}}
returns 4.
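The counting rules above can be modelled outside MediaWiki. The following Python sketch is an illustration only, not the extension's PHP code. The function name `str_len` is hypothetical, and the set of trimmed characters (spaces, tabs, newlines) is an assumption. Tag extensions such as nowiki are not modelled.

```python
def str_len(text: str = "") -> int:
    """Rough model of {{#len:}}: count characters after trimming both ends."""
    # Assumption: only spaces, tabs, and newlines are trimmed from the ends.
    trimmed = text.strip(" \t\r\n")
    # len() counts Unicode code points, so a multibyte character counts once.
    return len(trimmed)


print(str_len("Žmržlina"))   # 8
print(str_len("Icecream "))  # 8: the trailing space is not counted
print(str_len(" a   b "))    # 5: the three inner spaces are counted
print(str_len("&nbsp;"))     # 6: a character reference is counted in source form
print(str_len())             # 0: no string given
```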
#pos
The #pos function returns the position of a given search term within the string. The syntax is:
{{#pos:string|search term|offset}}
The offset parameter, if specified, gives the position at which the search should begin.
If the search term is found, the return value is the zero-based position of its first occurrence within the string.
If the search term is not found, the function returns an empty string.
This function is case sensitive.
The maximum allowed length of the search term is limited through the $wgStringFunctionsLimitSearch global setting.
This function is safe with UTF-8 multibyte characters. Example:
{{#pos:Žmržlina|žlina}}
returns 3.
As with #len, <nowiki> and other tag extensions are treated as having a length of zero for the purposes of character position. Example:
{{#pos:<nowiki>This is a </nowiki>test|test}}
returns 0.
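As a rough model of the rules above (first match, optional offset, empty string when nothing is found, case-sensitive comparison), here is a short Python sketch. It is not the extension's code. `str_pos` is a hypothetical name, and only non-negative offsets are considered.

```python
def str_pos(text: str, term: str, offset: int = 0):
    """Rough model of {{#pos:}}: index of the first match, or "" if none."""
    index = text.find(term, offset)  # case-sensitive, zero-based
    return "" if index == -1 else index


print(str_pos("Žmržlina", "žlina"))        # 3
print(repr(str_pos("Žmržlina", "ž", 4)))   # '': no match at or after offset 4
print(repr(str_pos("Icecream", "ice")))    # '': the search is case sensitive
print(str_pos("Icecream".lower(), "ice"))  # 0: lower-casing first, like {{lc:}}
```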
#rpos
The #rpos function returns the last position of a given search term within the string. The syntax is:
{{#rpos:string|search term}}
If the search term is found, the return value is the zero-based position of its last occurrence within the string.
If the search term is not found, the function returns -1.
When using this to search for the last delimiter, add 1 to the result to get the position just after the last delimiter. This also works when the delimiter is not found, because -1 + 1 is zero, which is the beginning of the given value.
This function is case sensitive.
The maximum allowed length of the search term is limited through the $wgStringFunctionsLimitSearch global setting.
This function is safe with UTF-8 multibyte characters. Example:
{{#rpos:Žmržlina|lina}}
returns 4.
As with #len, <nowiki> and other tag extensions are treated as having a length of zero for the purposes of character position. Example:
{{#rpos:<nowiki>This is a </nowiki>test|test}}
returns 0.
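The "add 1 to the result" technique for the last delimiter can be sketched in Python as follows. This is an illustration under assumptions, not the extension's implementation. `str_rpos` and `after_last` are hypothetical names, and the slice stands in for a substring call.

```python
def str_rpos(text: str, term: str) -> int:
    """Rough model of {{#rpos:}}: index of the last match, or -1 if none."""
    return text.rfind(term)


def after_last(text: str, delimiter: str) -> str:
    """Return the part of text that follows the last one-character delimiter."""
    start = str_rpos(text, delimiter) + 1  # -1 + 1 == 0 when nothing is found
    return text[start:]


print(str_rpos("Žmržlina", "lina"))              # 4
print(str_rpos("Žmržlina", "x"))                 # -1
print(after_last("String/Functions/Code", "/"))  # Code
print(after_last("Code", "/"))                   # Code: the whole value is kept
```

Adding exactly 1 only skips a one-character delimiter. For a longer delimiter, the offset would have to be the delimiter's length when it is found.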
#sub
The #sub function returns a substring from the given string. The syntax is:
{{#sub:string|start|length}}
The start parameter, if positive (or zero), specifies a zero-based index of the first character to be returned.
Example:
{{#sub:Icecream|3}}
returns cream
{{#sub:Icecream|0|3}}
returns Ice
If the start parameter is negative, it specifies how many characters from the end should be returned.
Example:
{{#sub:Icecream|-3}}
returns eam
The length parameter, if present and positive, specifies the maximum length of the returned string.
Example:
{{#sub:Icecream|3|3}}
returns cre
If the length parameter is negative, it specifies how many characters will be omitted from the end of the string.
Example:
{{#sub:Icecream|3|-3}}
returns cr
If the start parameter is negative and the length parameter is present and positive, start specifies how many characters from the end the returned string begins at, and length specifies the maximum length of the returned string from that starting point.
Example:
{{#sub:Icecream|-3|2}}
returns ea
If the length parameter is zero, it is not used for truncation at all.
Example:
{{#sub:Icecream|3|0}}
returns cream
{{#sub:Icecream|0|3}}
returns Ice
If start denotes a position beyond the point where a negative length parameter truncates the string from the end, an empty string is returned.
Example:
{{#sub:Icecream|3|-6}}
returns an empty string.
This function is safe with UTF-8 multibyte characters. Example:
{{#sub:Žmržlina|3}}
returns žlina
As with #len, <nowiki> and other tag extensions are treated as having a length of zero for the purposes of character position. Example:
{{#sub:<nowiki>This is a </nowiki>test|1}}
returns est
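The start and length rules above interact in several ways, so a compact model may help. The Python sketch below reproduces the documented examples. It is an illustration, not the extension's PHP code. `str_sub` is a hypothetical name, and neither argument trimming nor tag extensions are modelled.

```python
def str_sub(text: str, start: int = 0, length: int = 0) -> str:
    """Rough model of {{#sub:}} for the start/length rules described above."""
    size = len(text)
    # A negative start counts from the end of the string.
    begin = max(size + start, 0) if start < 0 else min(start, size)
    if length == 0:
        end = size                       # zero: no truncation at all
    elif length > 0:
        end = min(begin + length, size)  # positive: maximum result length
    else:
        end = size + length              # negative: omit characters from the end
    return text[begin:end] if end > begin else ""


print(str_sub("Icecream", 3))        # cream
print(str_sub("Icecream", 0, 3))     # Ice
print(str_sub("Icecream", -3))       # eam
print(str_sub("Icecream", 3, -3))    # cr
print(str_sub("Icecream", -3, 2))    # ea
print(repr(str_sub("Icecream", 3, -6)))  # '': start lies beyond the truncation
```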
#count
The #count function returns the number of times a given substring appears within the provided text.
{{#count:string|substring}}
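For illustration, counting occurrences can be modelled in Python as shown below. This is a sketch, not the extension's code. `str_count` is a hypothetical name, and counting non-overlapping matches is an assumption, since the text above does not say how overlapping matches are treated.

```python
def str_count(text: str, substring: str) -> int:
    """Rough model of {{#count:}}, assuming non-overlapping matches."""
    # str.count() scans left to right and never reuses matched characters.
    return text.count(substring)


print(str_count("Icecream", "e"))  # 2
print(str_count("Žmržlina", "ž"))  # 1: the upper-case Ž is a different character
print(str_count("aaaa", "aa"))     # 2 under the non-overlapping assumption
```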
#replace
The #replace function returns the given string with all occurrences of a search term replaced with a replacement term.
{{#replace:string|search term|replacement term}}
If the search term is unspecified or empty, a single space will be searched for.
If the replacement term is unspecified or empty, all occurrences of the search term will be removed from the string.
This function is case-sensitive.
The maximum allowed length of the search term is limited through the $wgStringFunctionsLimitSearch global setting.
The maximum allowed length of the replacement term is limited through the $wgStringFunctionsLimitReplace global setting.
Even if the replacement term is a space, an empty string is used. This is a side effect of the MediaWiki parser. To use a space as the replacement term, put it in nowiki tags.
Example:
{{#replace:My_little_home_page|_|<nowiki> </nowiki>}}
returns My little home page
If this doesn't work, try
{{#replace:My_little_home_page|_|<nowiki/> <nowiki/>}}
with two self-closing tags.
Note that this is the only acceptable use of nowiki in the replacement term, as otherwise nowiki could be used to bypass $wgStringFunctionsLimitReplace, injecting an arbitrarily large number of characters into the output. For this reason, all occurrences of <nowiki> or any other tag extension within the replacement term are replaced with spaces.
This function is safe with UTF-8 multibyte characters. Example:
{{#replace:Žmržlina|ž|z}}
returns Žmrzlina
If multiple items in a single text string need to be replaced, one could also consider Extension:ReplaceSet. It adds a parser function for a sequence of replacements.
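The defaults described above (an empty search term means a single space, an empty replacement term means removal) can be modelled with a few lines of Python. This is a sketch rather than the extension's code. `str_replace` is a hypothetical name, and the length limits are not enforced here. Passing a real space as the replacement corresponds to the nowiki form in wikitext.

```python
def str_replace(text: str, search: str = "", replacement: str = "") -> str:
    """Rough model of {{#replace:}} with the documented defaults."""
    if search == "":
        search = " "  # unspecified or empty: search for a single space
    # An empty replacement removes every occurrence of the search term.
    return text.replace(search, replacement)


print(str_replace("My_little_home_page", "_", " "))  # My little home page
print(str_replace("Žmržlina", "ž", "z"))             # Žmrzlina
print(str_replace("a b c"))                          # abc: spaces removed
print(str_replace("a_b_c", "_"))                     # abc: underscores removed
```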
Case-insensitive replace
Currently the syntax does not provide a switch to toggle case sensitivity, but you may use the formatting magic words (e.g. {{lc:your_string_here}}) as a workaround. For example, if you want to remove the word "Category:" from the string regardless of its case, you may type:
{{#replace:{{lc:{{{1}}}}}|category:|}}
The disadvantage is that the output becomes all lower-case. If you want to keep the casing after replacement, you have to use multiple nesting levels (i.e. multiple replace calls) to achieve the same thing.
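The trade-off between the two workarounds can be seen in a small Python model. This is a sketch only. The function names are hypothetical, and the list of spellings in the second function is an illustrative assumption, not an exhaustive set.

```python
def remove_prefix_lowercased(title: str) -> str:
    """Model of {{#replace:{{lc:...}}|category:|}}: simple, but loses casing."""
    return title.lower().replace("category:", "")


def remove_prefix_nested(title: str) -> str:
    """Model of nested replace calls, one per expected spelling."""
    # Assumption: only these three spellings occur in the input.
    for variant in ("Category:", "category:", "CATEGORY:"):
        title = title.replace(variant, "")
    return title


print(remove_prefix_lowercased("Category:Parser Functions"))  # parser functions
print(remove_prefix_nested("Category:Parser Functions"))      # Parser Functions
print(remove_prefix_nested("CATEGORY:Parser Functions"))      # Parser Functions
```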
#explode
The #explode function splits the given string into pieces and then returns one of the pieces. The pieces are 0-indexed. The syntax is:
{{#explode:string|delimiter|position|limit}}
The delimiter parameter specifies a string to be used to divide the string into pieces. The delimiter is not part of any piece, and when two delimiters are next to each other, they create an empty piece between them. If this parameter is not specified, a single space is used. The limit parameter is available in ParserFunctions only, not the standalone StringFunctions version, and allows you to limit the number of parts that the value is split into, with all remaining text included in the final part.
The position parameter specifies which piece is to be returned. Pieces are counted from 0. If this parameter is not specified, the first piece is used (piece number 0). When a negative value is used as position, the pieces are counted from the end. In this case, piece number -1 means the last piece. Examples:
{{#explode:And if you tolerate this| |2}}
returns you
{{#explode:String/Functions/Code|/|-1}}
returns Code
{{#explode:Split%By%Percentage%Signs|%|2}}
returns Percentage
{{#explode:And if you tolerate this thing and expect no more| |2|3}}
returns you tolerate this thing and expect no more
The return value is the position-th piece. If there are fewer pieces than the position specifies, an empty string is returned.
This function is case sensitive.
The maximum allowed length of the delimiter is limited through the $wgStringFunctionsLimitSearch global setting.
This function is safe with UTF-8 multibyte characters. Example:
{{#explode:Žmržlina|ž|1}}
returns lina
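The interplay of delimiter, position, and limit can be modelled in Python as shown below. The sketch reproduces the documented examples but is not the extension's code. `str_explode` is a hypothetical name, and a limit smaller than 1 is not modelled.

```python
def str_explode(text, delimiter=" ", position=0, limit=None):
    """Rough model of {{#explode:}}: split, then return one piece or ""."""
    if delimiter == "":
        delimiter = " "  # unspecified delimiter: a single space
    if limit is None:
        pieces = text.split(delimiter)
    else:
        # At most `limit` pieces; the final piece keeps all remaining text.
        pieces = text.split(delimiter, limit - 1)
    if position < 0:
        position += len(pieces)  # -1 means the last piece
    return pieces[position] if 0 <= position < len(pieces) else ""


print(str_explode("And if you tolerate this", " ", 2))       # you
print(str_explode("String/Functions/Code", "/", -1))         # Code
print(str_explode("Split%By%Percentage%Signs", "%", 2))      # Percentage
print(str_explode("And if you tolerate this thing and expect no more", " ", 2, 3))
# you tolerate this thing and expect no more
print(repr(str_explode("a,,b", ",", 1)))  # '': adjacent delimiters, empty piece
print(repr(str_explode("a,b", ",", 5)))   # '': fewer pieces than position
```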
#urldecode
The #urldecode function converts the escape sequences in a URL-encoded string back to readable text. The syntax is:
{{#urldecode:value}}
This function works by directly exposing PHP's urldecode() function.
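To show what this kind of decoding does, here is a short Python sketch. It uses `urllib.parse.unquote_plus` from Python's standard library as a stand-in. Like PHP's urldecode(), it turns percent escapes back into characters and a plus sign into a space, but it is a different implementation, so treat edge cases as unverified.

```python
from urllib.parse import unquote_plus

# Percent escapes are turned back into the characters they encode.
print(unquote_plus("Hello%20World%21"))    # Hello World!
# A literal plus sign decodes to a space; %2B decodes to a plus sign.
print(unquote_plus("a+b%2Bc"))             # a b+c
# Multibyte characters arrive as UTF-8 byte sequences.
print(unquote_plus("%C5%BDmr%C5%BElina"))  # Žmržlina
```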
A character code reference can be found at www.w3schools.com.
The opposite, urlencode, has been integrated into MediaWiki as of version 1.18; for examples, see Help:Magic Words.
urldecode was merged from StringFunctions in 2010, by commit 1b75afd18d3695bdb6ffbfccd0e4aec064785363.
Limits
This module defines three global settings:
$wgStringFunctionsLimitSearch
$wgStringFunctionsLimitReplace
$wgStringFunctionsLimitPad
These are used to limit some parameters of some functions to ensure the functions operate in O(n) time complexity, and are therefore safe against DoS attacks.
$wgStringFunctionsLimitSearch
This setting is used by #pos, #rpos, #replace, and #explode. All these functions search for a substring in a larger string, which can take O(n*m) time and therefore makes the software more vulnerable to DoS attacks. By setting this value to a specific small number, the time complexity is reduced to O(n).
This setting limits the maximum allowed length of the string being searched for.
The default value is 30 multibyte characters.
$wgStringFunctionsLimitReplace
This setting is used by #replace. This function replaces all occurrences of one string with another, which can be used to quickly generate very large amounts of data, and therefore makes the software more vulnerable to DoS attacks. This setting limits the maximum allowed length of the replacing string.
The default value is 30 multibyte characters.
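A little arithmetic shows why the replacement length matters. The Python sketch below computes a worst case in which every character of the input matches a one-character search term. It is an illustration only: `worst_case_output_length` is a hypothetical helper, and combining the 1000-character string limit with the 30-character replacement limit in this way is an assumption about how the two defaults interact.

```python
STRING_LENGTH_LIMIT = 1000  # default of $wgPFStringLengthLimit quoted above
REPLACE_LIMIT = 30          # default of $wgStringFunctionsLimitReplace


def worst_case_output_length(input_length: int, replacement_length: int) -> int:
    """Result length when every input character is replaced."""
    return input_length * replacement_length


# With the default limit, one call can grow the text by a factor of 30 at most.
print(worst_case_output_length(STRING_LENGTH_LIMIT, REPLACE_LIMIT))  # 30000
# Without a limit, a 1000-character replacement would yield a million characters.
print(worst_case_output_length(STRING_LENGTH_LIMIT, 1000))           # 1000000
# Cross-check against an actual replacement.
print(len(("a" * 1000).replace("a", "b" * 30)))                      # 30000
```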
See also
Extension:StringFunctionsEscaped: functions that also allow you to use escaped characters (such as \n, \t…)
ReplaceSet: an excellent substitute for nested #replace commands when you need to perform a sequence of replaces on a single text string
Variables
Manual:Performing string operations with parser functions: a different set of hacks used to perform string functions when these are disabled