User:松/Drafts/Extension:AbuseFilter/Rules format: Difference between revisions

{{DISPLAYTITLE:*}}
Rules are written much like conditionals in a C-, Java- or Perl-like language.
 
=== Literals ===
You can specify a literal by placing it in single or double quotes (for strings), or by typing it in as-is (for numbers, both floating-point and integer). You can get linebreaks with <code>\n</code>, tab characters with <code>\t</code>, and you can also escape the quote character with a backslash.
 
'''Examples'''
<syntaxhighlight lang="c">
"This is a string"
'This is also a string'
'This string shouldn\'t fail'
"This string\nHas a linebreak"
1234
1.234
-123
</syntaxhighlight>
 
=== Comments ===
The '''action''' variable can be <code>'edit'</code>, <code>'move'</code>, <code>'createaccount'</code>, <code>'autocreateaccount'</code>, <code>'delete'</code> or <code>'upload'</code>.
 
You can define additional variables for readability with the assignment operator <code>:=</code>, in a statement terminated by <code>;</code> within a condition. Example (from [[w:en:Special:AbuseFilter/79]]):
<syntaxhighlight lang="c">
(
line1:="(\{\{(r|R)eflist|\{\{(r|R)efs|<references\s?/>|</references\s?>)";
rcount(line1, removed_lines)
) > (
rcount(line1, added_lines)
)
</syntaxhighlight>
 
Lists:
<syntaxhighlight lang="c">
a_list := [ 5, 6, 7];
</syntaxhighlight>
 
 
====All variables====
{| class="wikitable sortable"
|+ Variables available
! Description !! Name !! Data type !! Values
|-
| {{int:abusefilter-edit-builder-vars-user-editcount}} || <code>user_editcount</code> || string || Empty for unregistered users.
|-
| {{int:abusefilter-edit-builder-vars-user-name}} || <code>user_name</code> || string ||
|-
| {{int:abusefilter-edit-builder-vars-user-emailconfirm}} || <code>user_emailconfirm</code> || string || YYYYMMDDHHMMSS
|-
| {{int:abusefilter-edit-builder-vars-user-age}} || <code>user_age</code> || || in seconds; 0 for IP
|-
| {{int:abusefilter-edit-builder-vars-user-groups}} || <code>user_groups</code> || ||
|-
| {{int:abusefilter-edit-builder-vars-user-rights}} || <code>user_rights</code> || ||
|-
| [[Manual:Page_table#page_id|{{int:abusefilter-edit-builder-vars-article-id}}]] (found in the page's HTML source - search for wgArticleId) || <code>article_articleid</code> || integer || In theory this is 0 for new pages, but this is unreliable. Instead, use "old_size==0" to identify new page creation.
|-
| {{int:abusefilter-edit-builder-vars-article-ns}} || <code>article_namespace</code> || integer || refers to [[Manual:Namespace#Built-in_namespaces|namespace index]]
|-
| {{int:abusefilter-edit-builder-vars-article-text}} || <code>article_text</code> || string ||
|-
| {{int:abusefilter-edit-builder-vars-article-prefixedtext}} || <code>article_prefixedtext</code> || string ||
|-
| {{int:abusefilter-edit-builder-vars-restrictions-edit}} || <code>article_restrictions_edit</code> || ||
|-
| {{int:abusefilter-edit-builder-vars-restrictions-move}} || <code>article_restrictions_move</code> || ||
|-
| {{int:abusefilter-edit-builder-vars-recent-contributors}} || <code>article_recent_contributors</code> || || Empty if the user is the only contributor to the page
|-
| Action || <code>action</code> || string || edit, move, createaccount, autocreateaccount, delete, upload
|-
| {{int:abusefilter-edit-builder-vars-summary}} || <code>summary</code> || string ||
|-
| {{int:abusefilter-edit-builder-vars-minor-edit}} || <code>minor_edit</code> || string ||
|-
| {{int:abusefilter-edit-builder-vars-old-text}} || <code>old_wikitext</code> || ||
|-
| {{int:abusefilter-edit-builder-vars-new-text}} || <code>new_wikitext</code> || ||
|-
| {{int:abusefilter-edit-builder-vars-diff}} || <code>edit_diff</code> || ||
|-
| {{int:abusefilter-edit-builder-vars-newsize}} || <code>new_size</code> || integer ||
|-
| {{int:abusefilter-edit-builder-vars-oldsize}} || <code>old_size</code> || integer ||
|-
| {{int:abusefilter-edit-builder-vars-delta}} || <code>edit_delta</code> || <!-- test this with === when edit_delta < 0 --> ||
|-
| {{int:abusefilter-edit-builder-vars-addedlines}} || <code>added_lines</code> || ||
|-
| {{int:abusefilter-edit-builder-vars-removedlines}} || <code>removed_lines</code> || ||
|-
| {{int:abusefilter-edit-builder-vars-all-links}} || <code>all_links</code> || ||
|-
| {{int:abusefilter-edit-builder-vars-old-links}} || <code>old_links</code> || ||
|-
| {{int:abusefilter-edit-builder-vars-added-links}} || <code>added_links</code> || ||
|-
| {{int:abusefilter-edit-builder-vars-removed-links}} || <code>removed_links</code> || ||
|-
| Parsed HTML source of the new revision || <code>new_html</code> || ||
|-
| New page text, stripped of any markup || <code>new_text</code> || ||
|-
| ''Disabled'' || <code>old_html</code> || ||
|-
| ''Disabled'' || <code>old_text</code> || ||
|-
| Whether or not the change was made through a tor exit node || <code>tor_exit_node</code> || string || 0, 1
|-
| Unix timestamp of change || <code>timestamp</code> || string || int(timestamp) gives you a number with which you can calculate the date, time, day of week, etc.
|}
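
As a rough illustration of how these variables combine in a rule, the sketch below matches edits that shrink a main-namespace page to almost nothing when made by a very new account. The thresholds (86400 seconds, 500 and 50 bytes) are hypothetical values chosen for the example, not recommendations. Because <code>user_age</code> is 0 for IP users, the rule as written would also match unregistered editors.

<syntaxhighlight lang="c">
/* Hypothetical thresholds; adjust to the wiki's own needs. */
action == 'edit'
& article_namespace == 0   /* main namespace */
& user_age < 86400         /* account younger than one day, or an IP (0) */
& old_size > 500           /* the page had real content before the edit */
& new_size < 50            /* and is nearly empty afterwards */
</syntaxhighlight>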
 
 
===Page/Article namespace===
''See also [[Manual:Namespace]]''
{| class="wikitable" style="float: right; margin:1em 0 1em 1em; text-align: center; font-size: 90%;"
! colspan="4" style="background-color: #F0F8FF; font-size:110%;" | [[:en:Wikipedia:Namespace|English Wikipedia namespaces]]
|-
! colspan="2" style="background-color: #F0F8FF;" | Basic namespaces
! colspan="2" style="background-color: #F0F8FF;" | Talk namespaces
|-
| 0
| [[Wikipedia:Wikipedia:Main namespace|Main]]
| [[Wikipedia:Wikipedia:Talk namespace|Talk]]
| 1
|-
| 2
| [[Wikipedia:Wikipedia:User namespace|User]]
| User talk
| 3
|-
| 4
| [[Wikipedia:Wikipedia:Wikipedia namespace|Wikipedia]]
| Wikipedia talk
| 5
|-
| 6
| [[Wikipedia:Wikipedia:File namespace|File]]
| File talk
| 7
|-
| 8
| [[Wikipedia:Wikipedia:MediaWiki namespace|MediaWiki]]
| MediaWiki talk
| 9
|-
| 10
| [[Wikipedia:Wikipedia:Template namespace|Template]]
| Template talk
| 11
|-
| 12
| [[Wikipedia:Wikipedia:Help namespace|Help]]
| Help talk
| 13
|-
| 14
| [[Wikipedia:Wikipedia:Category namespace|Category]]
| Category talk
| 15
|-
| 100
| [[Wikipedia:Wikipedia:Portal namespace|Portal]]
| Portal talk
| 101
|-
| 108
| [[Wikipedia:Wikipedia:Book namespace|Book]]
| Book talk
| 109
|-
! colspan="4" style="background-color: #F0F8FF;" | Virtual namespaces
|-
| -1
| colspan="3" | [[Wikipedia:Wikipedia:Special namespace|Special]]
|-
| -2
| colspan="3" | [[Wikipedia:Wikipedia:Media namespace|Media]]
|}
 
* <code>==</code> (or <code>=</code>) and <code>!=</code> &mdash; Return true if the left-hand operand is ''equal to/not equal to'' the right-hand operand respectively.
* <code>===</code> and <code>!==</code> &mdash; Return true if the left-hand operand is ''equal to/not equal to'' the right-hand operand AND the left-hand operand is ''the same/not the same'' data type as the right-hand operand respectively.
 
{| class="wikitable"
! style="width: 50%;"| Example
! Result
|-
|<code>1 == 2</code>|| False
|-
|<code>1 <= 2</code>|| True
|-
|<code>1 >= 2</code>|| False
|-
|<code>1 != 2</code>|| True
|-
|<code>1 < 2</code>|| True
|-
|<code>1 > 2</code>|| False
|-
|<code>0 == False</code>|| True
|-
|<code>0 === False</code>|| False
|}
 
=== Arithmetic ===
* <code>**</code> — Raise the left-hand operand to the exponential power specified by the right-hand operand.
* <code>%</code> — Return the remainder given when the left-hand operand is divided by the right-hand operand.
 
{| class="wikitable"
! style="width: 50%;" | Example !! Result
|-
| <code>1 + 1</code> || 2
|-
| <code>2 * 2</code> || 4
|-
| <code>1 / 2</code> || 0.5
|-
| <code>9 ** 2</code>|| 81
|-
| <code>6 % 5</code> || 1
|}
 
=== String concatenation ===
=== Keywords ===
The following special keywords are included for often-used functionality:
* <code>like</code> (or <code>matches</code>) returns true if the left-hand operand matches the [[w:en:Glob (programming)#Syntax|glob pattern]] in the right-hand operand.
* <code>in</code> returns true if the right-hand operand (a string) contains the left-hand operand.
* <code>rlike</code> (or <code>regex</code>) and <code>irlike</code> return true if the left-hand operand matches (contains) the [[w:Regular expression|regex]] pattern in the right-hand operand (<code>irlike</code> is case '''i'''nsensitive). The system uses [[w:Perl-compatible regular expressions|PCRE]]. The only PCRE option enabled is <code>PCRE_UTF8</code> (modifier <code>u</code> [//php.net/manual/en/reference.pcre.pattern.modifiers.php in PHP]); for <code>irlike</code> both <code>PCRE_CASELESS</code> and <code>PCRE_UTF8</code> are enabled (modifier <code>iu</code>).
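* <code>contains</code> returns true if the left-hand operand (a string) contains the right-hand operand.
* <code>if ... then ... else ... end</code> evaluates to the <code>then</code> expression when the condition is true and to the <code>else</code> expression otherwise.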
 
'''Examples'''
{| class="wikitable"
|-
! Code
! Result
|-
| <code>"1234" like "12?4"</code>
| True
|-
| <code>"1234" like "12*"</code>
| True
|-
| <code>"foo" in "foobar"</code>
| True
|-
| <code>"foo" regex "\w+"</code>
| True
|}
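
The keywords can be combined in a single rule. The following sketch uses <code>if ... then ... else ... end</code> to pick a different check depending on the namespace. The regular expression and the link prefix are hypothetical placeholders, chosen only to show the syntax.

<syntaxhighlight lang="c">
/* Hypothetical patterns, for illustration only. */
if article_namespace == 0 then
    /* case-insensitive regex match against the added lines */
    added_lines irlike "buy (cheap|discount) pills"
else
    /* "in": true if the right-hand string contains the left-hand one */
    "http://" in added_lines
end
</syntaxhighlight>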
 
=== Functions ===
A number of built-in functions are included to ease some common issues. They are executed in the general format <code>functionName( arg1, arg2, arg3 )</code>, and can be used in place of any literal or variable. Their arguments can be given as literals, variables, or even other functions.
 
{| class="wikitable sortable"
! name !! description
|-
| <code>lcase</code> || Returns the argument converted to lower case.
|-
| <code>ucase</code> || Returns the argument converted to upper case.
|-
| <code>length</code> || Returns the length of the string given as the argument.
|-
| <code>string</code> || Casts to string data type.
|-
| <code>int</code> || Casts to integer data type.
|-
| <code>float</code> || Casts to floating-point data type.
|-
| <code>bool</code> || Casts to boolean data type.
|-
| <code>norm</code> || Equivalent to <code>rmwhitespace(rmspecials(rmdoubles(ccnorm(arg1))))</code>.
|-
| <code>ccnorm</code> || Normalises confusable/similar characters in the argument, and returns a canonical form. A list of characters and their replacements can be found {{git file|project=mediawiki/extensions/AntiSpoof|branch=HEAD|file=equivset.php|text=on git}}, e.g. <code>ccnorm( "Eeèéëēĕėęě3ƐƷ" ) === "EEEEEEEEEEEEE"</code>.<ref name="bug27987">Be aware of [[bugzilla:27987|bug 27987]]</ref><ref name="bug25619">Be aware of [[bugzilla:25619|bug 25619]]</ref>
|-
| <code>specialratio</code> || Returns the number of non-alphanumeric characters divided by the total number of characters in the argument.
|-
| <code>rmspecials</code> || Removes any special characters in the argument, and returns the result.
|-
| <code>rmdoubles</code> || Removes repeated characters in the argument, and returns the result.
|-
| <code>rmwhitespace</code> || Removes whitespace (spaces, tabs, newlines).
|-
| <code>count</code> || Returns the number of times the needle (first string) appears in the haystack (second string). If only one argument is given, splits it by commas and returns the number of segments.
|-
| <code>rcount</code> || Similar to <code>count</code> but the needle uses a regular expression instead. Can be made case-insensitive by letting the regular expression start with "(?i)".
|-
| <code>ip_in_range</code> || Returns true if user's IP (first string) matches specified IP ranges (second string). Only works for anonymous users.
|-
| <code>contains_any</code> || Returns true if the first string contains any strings from the following arguments (unlimited number of arguments).
|-
| <code>substr</code> || Returns the portion of the first string, by offset from the second argument (starts at 0) and maximum length from the third argument (optional).
|-
| <code>strlen</code> || Same as <code>length</code>.
|-
| <code>strpos</code> || Returns the numeric position of the first occurrence of needle (second string) in the haystack (first string). The function returns 0 when the needle is found at the beginning of the haystack, which a loose comparison can misinterpret as ''false''. Use <code>===</code> or <code>!==</code> to test whether the needle was found.
|-
| <code>str_replace</code> || Replaces all occurrences of the search string with the replacement string. The function takes 3 arguments in the following order: text to perform the search, text to find, replacement text.
|-
| <code>rescape</code> || Returns the argument with some characters preceded with the escape character "\", so that the string can be used in a regular expression without those characters having a special meaning.
|-
| <code>set</code> || Sets a variable (first string) with a given value (second argument) for further use in the filter. Another syntax: <code>''name'' := ''value''</code>.
|-
| <code>set_var</code> || Same as <code>set</code>.
|}
 
'''Other'''
* <code>convert</code> returns the second argument converted to the language variant specified by the first argument. It applies only on wikis that use the LanguageConverter class. (Function added in [[rev:49399]]; requires a MediaWiki version after [[rev:49397]].)
 
'''Examples'''
{| class="wikitable"
|-
| <code>length( "Wikipedia" )</code>
| 9
|-
| <code>lcase( "Wikipedia" )</code>
| wikipedia
|-
| <code>ccnorm( "ωɨƙɩᑭƐƉlα" )</code>
| W1K1PED1A
|-
| <code>ccnorm( "ìíîïĩļǐīĭḷĿї!ľį₤ĺľḷĿΛЛљóòôöõǒōŏǫőόὸὀὁὄὂὅὃọ$śŝşšṣσ" )</code>
| ìíîïĩļǐīĭḷĿї!ľį₤ĺľḷĿΛЛљóòôöõǒōŏǫőόὸὀὁὄὂὅὃọ$śŝşšṣσ<ref name="bug25619" />
<!--
|-
| <code>convert( "zh-hant", "维基百科" )</code><br />// assume we work on a wiki with Chinese LanguageConverter class
| 維基百科
-->
|-
| <code>rmdoubles( "foobybboo" )</code>
| fobybo
|-
| <code>specialratio( "Wikipedia!" )</code>
| 0.1
|-
| <code>norm( "!!ω..ɨ..ƙ..ɩ..ᑭᑭ..Ɛ.Ɖ@@l%%α!!" )</code>
| W1K1PED1A
|-
| <code>count( "foo", "foofooboofoo" )</code>
| 3
|-
| <code>count( "foo,bar,baz" )</code>
| 3
|-
| <code>rmspecials( "FOOBAR!!1" )</code>
| FOOBAR1
|-
| <code>rescape( "abc* (def)" )</code>
| abc\* \(def\)
|}
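
Because functions accept other functions as arguments, they can be nested inside a larger condition. The sketch below is one possible combination. The word list and the link threshold are hypothetical, and <code>(?i)</code> makes the <code>rcount</code> pattern case-insensitive as described in the table above.

<syntaxhighlight lang="c">
/* Hypothetical word list and threshold, shown only to illustrate nesting. */
contains_any( lcase( added_lines ), "free money", "casino" )
| rcount( "(?i)https?://", added_lines ) > 5
</syntaxhighlight>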
 
=== Boolean operations ===
You can match if and only if all of a number of conditions are true, at least one of a number of conditions is true, or one and only one of the conditions is true.
* <code>x | y</code> &mdash; OR &ndash; returns true if one or more of the conditions is true.
* <code>!x</code> &mdash; NOT &ndash; returns true if the condition is not true.
 
'''Examples'''
{| class="wikitable"
|-
! Code
! Result
|-
| <code>1 | 1</code>
| True
|-
| <code>1 | 0</code>
| True
|-
| <code>0 | 0</code>
| False
|-
| <code>1 & 1</code>
| True
|-
| <code>1 & 0</code>
| False
|-
| <code>0 & 0</code>
| False
|-
| <code>1 ^ 1</code>
| False
|-
| <code>1 ^ 0</code>
| True
|-
| <code>0 ^ 0</code>
| False
|-
| <code>!1</code>
| False
|}
 
=== Order of operations ===
Operations are generally done left-to-right, but there is an order in which they are resolved. As soon as the filter fails one of the conditions, it will stop checking the rest of them (due to [[w:short-circuit evaluation|short-circuit evaluation]]) and move on to the next filter (except for [[bugzilla:41693|bug 41693]]). The evaluation order is:
# Anything surrounded by parentheses (<code>(</code> and <code>)</code>) is evaluated as a single unit.
# Turning variables/literals into their respective data (e.g., <code>article_namespace</code> to 0).
 
== Conditions ==
 
{| class="wikitable" style="width: 100%;"
|+ Condition counting
! style="width: 30em;" | Rules
! Conditions used
! Notes
|-
|<code>'foo' == 'bar'</code>|| 1 || rowspan=2 | A simple test counts as one condition
|-
|<code>false & false & false & false & false</code>|| 5<ref>The last 4 conditions are counted due to [[bugzilla:41693|bug 41693]]</ref>
|-
|<code>( 'foo' == 'bar' )</code>|| 2 || rowspan=3 | Evaluating parentheses also counts as conditions
|-
|<code>( 'foo' ) == ( 'bar' )</code>|| 3
|-
|<code>(((( 'foo' == 'bar' ))))</code>|| 5
|-
|<code>false & ( false & false & false & false )</code>|| 2 || rowspan=2 | But they can be used to force a short-circuit
|-
|<code>false & ( true & true & true & true )</code>|| 2
|-
|<code>true & ( false & false & false & false )</code>|| 6 || rowspan=2 | Rearranging and grouping the conditions according to their likelihood of being true might represent a big difference in the total number of conditions used by a complex filter
|-
|<code>true & ( false & ( false & false & false ) )</code>|| 4
|-
|<code>str_replace( 'FooFoo', 'Foo', <nowiki>''</nowiki> ) == 'bar'</code>|| 5 || Each function call and each parameter evaluation also counts as one condition
|-
|<code>&nbsp;&nbsp;str_replace( 'FooFoo', 'Foo', <nowiki>''</nowiki> ) == 'bar' <br /><nowiki>|</nowiki> str_replace( 'FooFoo', 'Foo', <nowiki>''</nowiki> ) == 'baz'</code>|| 9 || 1 from <code>str_replace</code> + 3 from its parameters + 1 from the first <code>==</code> + 3 for the same parameters + 1 for the second <code>==</code>
|-
|<code>str_replace( 'FooFoo', 'Foo', <nowiki>''</nowiki> )</code>|| 5 || equivalent to "<code>str_replace( 'FooFoo', 'Foo', <nowiki>''</nowiki> ) = 1</code>"
|}
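
Following the table above, one way to keep the condition count low is to put a cheap test that is usually false first and to group the expensive tests in parentheses, so that the whole group is skipped when the first test fails. The sketch below assumes hypothetical thresholds and a hypothetical link pattern. The exact number of conditions it consumes depends on the behaviour described in the notes on bug 41693.

<syntaxhighlight lang="c">
/* Cheap, usually-false test first; the group is only evaluated when it passes. */
article_namespace == 2 & (
    user_age < 3600
    & added_lines rlike "https?://"
    & rcount( "https?://", added_lines ) > 3
)
</syntaxhighlight>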
 
== Useful links ==
 
* [http://php.net/manual/en/reference.pcre.pattern.syntax.php PCRE pattern syntax].
 
== Notes ==
<references/>
 
{{languages|Extension:AbuseFilter/Rules format}}
 
[[zh:Wikipedia:防滥用过滤器/操作指引]]
__FORCETOC__
__NOEDITSECTION__
__DISAMBIG__
__INDEX__
__NEWSECTIONLINK__