Pattern Modifiers

The PCRE modifiers currently available are listed below. The name in parentheses is the internal PCRE name of the modifier. Spaces and newlines in the modifier list are ignored; any other character causes an error.

i (PCRE_CASELESS)

If this modifier is set, letters in the pattern match both upper and lower case letters.
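A minimal sketch of the effect of i. The sample string and variable names are made up for illustration only.

```php
<?php
// Hypothetical sample data, chosen only for illustration.
$subject = 'Hello WORLD';

// Without i, the lowercase pattern does not match the uppercase text.
$caseSensitive = preg_match('/world/', $subject);

// With i, letters in the pattern match either case.
$caseInsensitive = preg_match('/world/i', $subject);

var_dump($caseSensitive, $caseInsensitive);
```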

m (PCRE_MULTILINE)

By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless the D modifier is set). This is the same as in Perl. When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m modifier. If there are no newline characters in the subject string, or no occurrences of ^ or $ in the pattern, setting this modifier has no effect.
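As a rough illustration of the difference, the sketch below counts matches of ^\w+ with and without m. The three-line sample text is an assumption made for the example.

```php
<?php
// Hypothetical three-line subject, separated by \n.
$text = "first line\nsecond line\nthird line";

// Without m, ^ matches only at the very start of the subject.
$withoutM = preg_match_all('/^\w+/', $text, $single);

// With m, ^ also matches immediately after every newline.
$withM = preg_match_all('/^\w+/m', $text, $multi);

print_r($single[0]); // the first word of the subject only
print_r($multi[0]);  // the first word of every line
```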

s (PCRE_DOTALL)

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, regardless of the setting of this modifier.
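The following sketch, using an invented two-line subject, shows the dot with and without s, and a negated class that matches the newline either way.

```php
<?php
// Hypothetical subject with a single \n in the middle.
$text = "line one\nline two";

// Without s, the dot refuses to match the newline.
$withoutS = preg_match('/one.line/', $text);

// With s, the dot matches any character, newline included.
$withS = preg_match('/one.line/s', $text);

// A negated class matches the newline even without s.
$negatedClass = preg_match('/one[^a]line/', $text);

var_dump($withoutS, $withS, $negatedClass);
```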

x (PCRE_EXTENDED)

If this modifier is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class. Characters between an unescaped # outside a character class and the next newline character, inclusive, are also ignored. This is equivalent to Perl's /x modifier, and makes it possible to include commentary inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.
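A small sketch of a commented pattern under x. The date format and variable names are assumptions for the example; note that the comments deliberately avoid the / delimiter.

```php
<?php
// Hypothetical pattern for an ISO-style date, laid out with x.
$pattern = '/
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
    -         # separator
    (\d{2})   # day
/x';

$found = preg_match($pattern, '2005-07-15', $parts);

print_r($parts); // full match followed by year, month and day
```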

e (PREG_REPLACE_EVAL)

If this modifier is set, preg_replace() does normal substitution of backreferences in the replacement string, evaluates it as PHP code, and uses the result for replacing the search string. Single quotes, double quotes, backslashes (\) and NULL characters will be escaped by backslashes in substituted backreferences.
Caution
addslashes() is run on each matched backreference before the substitution takes place. As such, when the backreference is used as a quoted string, escaped characters will be converted to literals. However, characters which are escaped but would normally not be converted will retain their slashes. This makes use of this modifier very complicated.

Caution
Make sure that replacement constitutes a valid PHP code string, otherwise PHP will complain about a parse error at the line containing preg_replace().

Caution
Use of this modifier is discouraged, as it can easily introduce security vulnerabilities (it was deprecated in PHP 5.5.0 and removed in PHP 7.0.0):

<?php
$html = $_POST['html'];

// uppercase headings
$html = preg_replace(
    '(<h([1-6])>(.*?)</h\1>)e',
    '"<h$1>" . strtoupper("$2") . "</h$1>"',
    $html
);

The above example code can be easily exploited by passing in a string such as <h1>{${eval($_GET[php_code])}}</h1>. This gives the attacker the ability to execute arbitrary PHP code and as such gives them nearly complete access to your server.

To prevent this kind of remote code execution vulnerability, the preg_replace_callback() function should be used instead:

<?php
$html = $_POST['html'];

// uppercase headings
$html = preg_replace_callback(
    '(<h([1-6])>(.*?)</h\1>)',
    function ($m) {
        return "<h$m[1]>" . strtoupper($m[2]) . "</h$m[1]>";
    },
    $html
);

Note:
Only preg_replace() uses this modifier; it is ignored by the other PCRE functions.

A (PCRE_ANCHORED)

If this modifier is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the start of the string being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
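A brief sketch of anchoring with A, compared with an explicit ^ in the pattern. The subject string is invented for the example.

```php
<?php
// Hypothetical subject: letters first, digits later.
$subject = 'abc123';

// Unanchored: the digits are found in the middle of the subject.
$unanchored = preg_match('/\d+/', $subject, $m1);

// With A, the match must begin at the start of the subject, so this fails.
$anchored = preg_match('/\d+/A', $subject);

// The same constraint written inside the pattern itself.
$caret = preg_match('/^\d+/', $subject);

// An anchored pattern that does fit the start of the subject.
$anchoredLetters = preg_match('/[a-z]+/A', $subject, $m2);

var_dump($unanchored, $anchored, $caret, $anchoredLetters);
```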

D (PCRE_DOLLAR_ENDONLY)

If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if the m modifier is set. There is no equivalent to this modifier in Perl.
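This matters mostly for input validation, where a trailing newline should not slip through. A minimal sketch with an invented subject:

```php
<?php
// Hypothetical input that ends with a newline.
$subject = "abc\n";

// Without D, $ also matches just before the final newline.
$withoutD = preg_match('/^abc$/', $subject);

// With D, $ matches only at the very end, so the trailing newline is rejected.
$withD = preg_match('/^abc$/D', $subject);

// Input without a trailing newline still matches under D.
$cleanInput = preg_match('/^abc$/D', 'abc');

var_dump($withoutD, $withD, $cleanInput);
```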

S

When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up matching. If this modifier is set, this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.

U (PCRE_UNGREEDY)

This modifier inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by ?. It is not compatible with Perl. Ungreedy matching can also be turned on for the rest of a pattern with an inline (?U) modifier setting, or for a single quantifier by putting a question mark after it (e.g. .*?).
Note:
It is usually not possible to match more than pcre.backtrack_limit characters in ungreedy mode.
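The sketch below compares the default greedy behaviour with U, with a quantifier made greedy again by ?, and with the inline (?U) setting. The HTML fragment and the ~ delimiter are choices made for the example.

```php
<?php
// Hypothetical fragment with two bold elements.
$html = '<b>one</b> and <b>two</b>';

// Default: .* is greedy and runs to the last closing tag.
preg_match('~<b>.*</b>~', $html, $greedy);

// With U, .* is ungreedy and stops at the first closing tag.
preg_match('~<b>.*</b>~U', $html, $ungreedy);

// Under U, a trailing ? flips the quantifier back to greedy.
preg_match('~<b>.*?</b>~U', $html, $flipped);

// The inline setting has the same effect as the U modifier.
preg_match('~(?U)<b>.*</b>~', $html, $inline);

var_dump($greedy[0], $ungreedy[0], $flipped[0], $inline[0]);
```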

X (PCRE_EXTRA)

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Any backslash in a pattern that is followed by a letter that has no special meaning causes an error, thus reserving these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as a literal. There are at present no other features controlled by this modifier.

J (PCRE_INFO_JCHANGED)

The (?J) internal option setting changes the local PCRE_DUPNAMES option, allowing subpatterns to have duplicate names.

u (PCRE_UTF8)

This modifier turns on additional functionality of PCRE that is incompatible with Perl: pattern strings are treated as UTF-8. This modifier is available in PHP 4.1.0 and later on Unix, and in PHP 4.2.3 and later on Windows. UTF-8 validity of the pattern is checked since PHP 4.3.5.
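A short sketch of what u changes in practice, assuming PHP 7 or later. The sample strings are invented and written with byte escapes, so the result does not depend on the encoding of the source file.

```php
<?php
// "ñandú" encoded as UTF-8: 5 characters, 7 bytes (hypothetical sample).
$subject = "\xc3\xb1and\xc3\xba";

// Without u, the dot matches one byte at a time.
$byteCount = preg_match_all('/./', $subject, $bytes);

// With u, the dot matches one UTF-8 character at a time.
$charCount = preg_match_all('/./u', $subject, $chars);

// An invalid UTF-8 subject makes the call fail instead of matching.
$invalid = "\xc3\x28";
$result = preg_match('/./u', $invalid);
$isUtf8Error = (preg_last_error() === PREG_BAD_UTF8_ERROR);

var_dump($byteCount, $charCount, $result, $isUtf8Error);
```

Checking the return value for false and then calling preg_last_error() is one way to tell a failed match apart from a subject that is not valid UTF-8.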

Comments

Jul 15

Author: hfuecks at nospam dot org

Regarding the validity of a UTF-8 string when using the /u pattern modifier, some things to be aware of:

1. If the pattern itself contains an invalid UTF-8 character, you get an error (as mentioned in the docs above: "UTF-8 validity of the pattern is checked since PHP 4.3.5").

2. When the subject string contains invalid UTF-8 sequences / codepoints, it basically results in a "quiet death" for the preg_* functions, where nothing is matched but without indication that the string is invalid UTF-8.

3. PCRE regards five and six octet UTF-8 character sequences as valid (both in patterns and the subject string) but these are not supported in Unicode (see section 5.9 "Character Encoding" of the "Secure Programming for Linux and Unix HOWTO", which can be found at http://www.tldp.org/ and other places).

4. For an example algorithm in PHP which tests the validity of a UTF-8 string (and discards five / six octet sequences) head to: http://hsivonen.iki.fi/php-utf8/

The following script should give you an idea of what works and what doesn't:



<?php
$examples = array(
    'Valid ASCII' => "a",
    'Valid 2 Octet Sequence' => "\xc3\xb1",
    'Invalid 2 Octet Sequence' => "\xc3\x28",
    'Invalid Sequence Identifier' => "\xa0\xa1",
    'Valid 3 Octet Sequence' => "\xe2\x82\xa1",
    'Invalid 3 Octet Sequence (in 2nd Octet)' => "\xe2\x28\xa1",
    'Invalid 3 Octet Sequence (in 3rd Octet)' => "\xe2\x82\x28",

    'Valid 4 Octet Sequence' => "\xf0\x90\x8c\xbc",
    'Invalid 4 Octet Sequence (in 2nd Octet)' => "\xf0\x28\x8c\xbc",
    'Invalid 4 Octet Sequence (in 3rd Octet)' => "\xf0\x90\x28\xbc",
    // Only the 4th octet is invalid here (the original entry also broke the 2nd).
    'Invalid 4 Octet Sequence (in 4th Octet)' => "\xf0\x90\x8c\x28",
    'Valid 5 Octet Sequence (but not Unicode!)' => "\xf8\xa1\xa1\xa1\xa1",
    'Valid 6 Octet Sequence (but not Unicode!)' => "\xfc\xa1\xa1\xa1\xa1\xa1",
);

// Use each example as the pattern itself.
echo "++Invalid UTF-8 in pattern\n";
foreach ( $examples as $name => $str ) {
    echo "$name\n";
    preg_match("/".$str."/u",'Testing');
}

// Use each example as the subject of a fixed five-octet pattern.
echo "++ preg_match() examples\n";
foreach ( $examples as $name => $str ) {
    preg_match("/\xf8\xa1\xa1\xa1\xa1/u", $str, $ar);
    echo "$name: ";

    if ( count($ar) == 0 ) {
        echo "Matched nothing!\n";
    } else {
        echo "Matched {$ar[0]}\n";
    }
}

// Count how many characters the dot matches in each example.
echo "++ preg_match_all() examples\n";
foreach ( $examples as $name => $str ) {
    preg_match_all('/./u', $str, $ar);
    echo "$name: ";

    $num_utf8_chars = count($ar[0]);
    if ( $num_utf8_chars == 0 ) {
        echo "Matched nothing!\n";
    } else {
        echo "Matched $num_utf8_chars character(s)\n";
    }
}
?>

2005-07-15 10:14:26

http://php5.kiev.ua/manual/ru/reference.pcre.pattern.modifiers.html

Nov 03

Author: varrah NO_GARBAGE_OR_SPAM AT mail DOT ru

I spent a few days trying to understand how to create a pattern for Unicode chars using hex codes. I finally made it after reading several manuals that gave no practical, PHP-valid examples. So here's one:

For example, to search for the Japanese-standard circled numbers 1-9 (Unicode code points 0x2460-0x2468) by their hex codes, use the following call:

preg_match('/[\x{2460}-\x{2468}]/u', $str);

Here $str is the haystack string,
\x{hex} is a Unicode code point written in hex,
and /u switches the pattern to UTF-8 mode, which is needed for code points above 0xFF.

Hope it'll be useful.

2005-11-03 06:12:40


Feb 06

Author: ebarnard at marathonmultimedia dot com

When adding comments with the /x modifier, don't use the pattern delimiter in the comments. It is not ignored in the comment area, because PHP looks for the closing delimiter before PCRE processes the comments. Example:

<?php
$target = 'some text';
if(preg_match('/
                e # Comments here
               /x',$target)) {
    print "Target 1 hit.\n";
}
if(preg_match('/
                e # /Comments here with slash
               /x',$target)) {
    print "Target 2 hit.\n";
}
?>

prints "Target 1 hit." but then generates a PHP warning message for the second preg_match():

Warning:  preg_match() [function.preg-match]: Unknown modifier 'C' in /ebarnard/x-modifier.php on line 11

2007-02-06 16:35:52


May 18

Author: michal dot kocarek at brainbox dot cz


In case you're wondering, what is the meaning of "S" modifier, this paragraph might be useful:



When "S" modifier is set, PHP calls the pcre_study() function from the PCRE API before executing the regexp. Result from the function is passed directly to pcre_exec().



For more information about pcre_study() and "Studying the pattern" check the PCRE manual on http://www.pcre.org/pcre.txt



PS: Note that function names "pcre_study" and "pcre_exec" used here refer to PCRE library functions written in C language and not to any PHP functions.

2009-05-18 19:49:19


Apr 08

Author: phpman at crustynet dot org dot uk


The description of the "u" flag is a bit misleading. It suggests that it is only required if the pattern contains UTF-8 characters, when in fact it is required if either the pattern or the subject contain UTF-8. Without it, I was having problems with preg_match_all returning invalid multibyte characters when given a UTF-8 subject string.



It's fairly clear if you read the documentation for libpcre:



       In order process UTF-8 strings, you must build PCRE to include UTF-8
       support in the code, and, in addition, you must call pcre_compile()
       with the PCRE_UTF8 option flag, or the pattern must start with the
       sequence (*UTF8). When either of these is the case, both the pattern
       and any subject strings that are matched against it are treated as
       UTF-8 strings instead of strings of 1-byte characters.



[from http://www.pcre.org/pcre.txt]

2011-04-08 11:03:58


Feb 14

Author: Daniel Klein

If the _subject_ contains utf-8 sequences the 'u' modifier should be set, otherwise a pattern such as /./ could match a utf-8 sequence as two to four individual bytes. It is not a requirement, however, as you may have a need to break apart utf-8 sequences into single bytes. Most of the time, though, if you're working with utf-8 strings you should use the 'u' modifier.



If the subject doesn't contain any utf-8 sequences (i.e. characters in the range 0x00-0x7F only) but the pattern does, as far as I can work out, setting the 'u' modifier would have no effect on the result.

2012-02-14 01:40:13


Aug 22

Author: arash dot dalir at gmail dot com

The PCRE_INFO_JCHANGED modifier (J) is apparently not accepted as a global option (after the closing delimiter) in PHP versions <= 5.4 (not checked in PHP 5.5), but it is allowed in PHP 5.6 (also not checked in PHP 7.X).



The following pattern doesn't work in PHP 5.4, but it works in PHP 5.6:



<?php
//test.php
preg_match_all('/(?<dup_name>\d{1,4})\-(?<dup_name>\d{1,2})/J', '1234-23', $matches);
var_dump($matches);

/*
output in PHP 5.4:
Warning: preg_match_all(): Unknown modifier 'J' in test.php on line 3
NULL
--------------
output PHP 5.6:
array(4) {
    [0]=> array(1) { [0]=> string(7) "1234-23" }
    ["dup_name"]=> array(1) { [0]=> string(2) "23" }
    [1]=> array(1) { [0]=> string(4) "1234" }
    [2]=> array(1) { [0]=> string(2) "23" }
}
*/
?>



in order to resolve this issue in PHP 5.4, one can use the (?J) pattern modifier, which indicates the pattern (from that point forward) allows duplicate names for subpatterns.



code which works in PHP 5.4:

<?php

preg_match_all('/(?J)(?<dup_name>\d{1,4})\-(?<dup_name>\d{1,2})/', '1234-23', $matches);
var_dump($matches);

/*
output in PHP 5.4:
array(4) {
    [0]=> array(1) { [0]=> string(7) "1234-23" }
    ["dup_name"]=> array(1) { [0]=> string(2) "23" }
    [1]=> array(1) { [0]=> string(4) "1234" }
    [2]=> array(1) { [0]=> string(2) "23" }
}
--------------
output in PHP 5.6 (the same as with /J):
array(4) {
    [0]=> array(1) { [0]=> string(7) "1234-23" }
    ["dup_name"]=> array(1) { [0]=> string(2) "23" }
    [1]=> array(1) { [0]=> string(4) "1234" }
    [2]=> array(1) { [0]=> string(2) "23" }
}
*/
?>

2017-08-22 14:54:35


Feb 20

Author: Wirek


A hint for those of you who are trying to fight off (or work around at least) the problem of matching a pattern correctly at the end ($) of any line in multiple lines mode (/m).

<?php
// Various OS-es have various end line (a.k.a line break) chars:
// - Windows uses CR+LF (\r\n);
// - Linux and macOS use LF (\n);
// - classic Mac OS (before OS X) used CR (\r).
// And that's why single dollar meta assertion ($) sometimes fails with multiline modifier (/m) mode - possible bug in PHP 5.3.8 or just a "feature"(?).
$str="ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
//          C          3                   p          0                   _
$pat1='/\w$/mi';    // This works excellent in JavaScript (Firefox 7.0.1+)
$pat2='/\w\r?$/mi';
$pat3='/\w\R?$/mi';    // Somehow disappointing according to php.net and pcre.org
$pat4='/\w\v?$/mi';
$pat5='/(*ANYCRLF)\w$/mi';    // Excellent but undocumented on php.net at the moment
$n=preg_match_all($pat1, $str, $m1);
$o=preg_match_all($pat2, $str, $m2);
$p=preg_match_all($pat3, $str, $m3);
$r=preg_match_all($pat4, $str, $m4);
$s=preg_match_all($pat5, $str, $m5);
echo $str."\n1 !!! $pat1 ($n): ".print_r($m1[0], true)
    ."\n2 !!! $pat2 ($o): ".print_r($m2[0], true)
    ."\n3 !!! $pat3 ($p): ".print_r($m3[0], true)
    ."\n4 !!! $pat4 ($r): ".print_r($m4[0], true)
    ."\n5 !!! $pat5 ($s): ".print_r($m5[0], true);
// Note the difference among the three very helpful escape sequences in $pat2 (\r), $pat3 (\R), $pat4 (\v) and altered newline option in $pat5 ((*ANYCRLF)) - for some applications at least.

/* The code above results in the following output:
ABC ABC

123 123
def def
nop nop
890 890
QRS QRS

~-_ ~-_
1 !!! /\w$/mi (3): Array
(
    [0] => C
    [1] => 0
    [2] => _
)

2 !!! /\w\r?$/mi (5): Array
(
    [0] => C
    [1] => 3
    [2] => p
    [3] => 0
    [4] => _
)

3 !!! /\w\R?$/mi (5): Array
(
    [0] => C

    [1] => 3
    [2] => p
    [3] => 0
    [4] => _
)

4 !!! /\w\v?$/mi (5): Array
(
    [0] => C

    [1] => 3
    [2] => p
    [3] => 0
    [4] => _
)

5 !!! /(*ANYCRLF)\w$/mi (7): Array
(
    [0] => C
    [1] => 3
    [2] => f
    [3] => p
    [4] => 0
    [5] => S
    [6] => _
)
 */
?>

Unfortunately, I haven't got any access to a server with the latest PHP version - my local PHP is 5.3.8 and my public host's PHP is version 5.2.17.

2018-02-20 15:18:14


Feb 23

Author: Wirek


An important addendum (with new $pat3_2 utilising \R properly, its results and comments):

Note that there are (sometimes difficult to grasp at first glance) nuances of meaning and application of escape sequences like \r, \R and \v - none of them is perfect in all situations, but they are quite useful nevertheless. Some official PCRE control options and their changes come in handy too - unfortunately neither (*ANYCRLF), (*ANY) nor (*CRLF) is documented here on php.net at the moment (although they seem to be available for over 10 years and 5 months now), but they are described on Wikipedia ("Newline/linebreak options" at https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions) and official PCRE library site ("Newline convention" at http://www.pcre.org/original/doc/html/pcresyntax.html#SEC17) pretty well. The functionality of \R appears somehow disappointing (with default configuration of compile time option) according to php.net as well as official description ("Newline sequences" at https://www.pcre.org/original/doc/html/pcrepattern.html#newlineseq) when used improperly.



A hint for those of you who are trying to fight off (or work around at least) the problem of matching a pattern correctly at the end (or at the beginning) of any line even without the multiple lines mode (/m) or meta-character assertions ($ or ^).

<?php
// Various OS-es have various end line (a.k.a line break) chars:
// - Windows uses CR+LF (\r\n);
// - Linux and macOS use LF (\n);
// - classic Mac OS (before OS X) used CR (\r).
// And that's why single dollar meta assertion ($) sometimes fails with multiline modifier (/m) mode - possible bug in PHP 5.3.8 or just a "feature"(?) of default configuration option for meta-character assertions (^ and $) at compile time of PCRE.
$str="ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
//          C          3                   p          0                   _
$pat3='/\w\R?$/mi';    // Somehow disappointing according to php.net and pcre.org when used improperly
$pat3_2='/\w(?=\R)/i';    // Much better with allowed lookahead assertion (just to detect without capture) without multiline (/m) mode; note that with alternative for end of string ((?=\R|$)) it would grab all 7 elements as expected, but '/(*ANYCRLF)\w$/mi' is more straightforward in use anyway
$p=preg_match_all($pat3, $str, $m3);
$r=preg_match_all($pat3_2, $str, $m4);
echo $str."\n3 !!! $pat3 ($p): ".print_r($m3[0], true)
    ."\n3_2 !!! $pat3_2 ($r): ".print_r($m4[0], true);
// Note the difference between the two very helpful escape sequences in $pat3 and $pat3_2 (\R) - for some applications at least.

/* The code above results in the following output:
ABC ABC

123 123
def def
nop nop
890 890
QRS QRS

~-_ ~-_
3 !!! /\w\R?$/mi (5): Array
(
    [0] => C

    [1] => 3
    [2] => p
    [3] => 0
    [4] => _
)

3_2 !!! /\w(?=\R)/i (6): Array
(
    [0] => C
    [1] => 3
    [2] => f
    [3] => p
    [4] => 0
    [5] => S
)
 */
?>

Unfortunately, I haven't got any access to a server with the latest PHP version - my local PHP is 5.3.8 and my public host's PHP is version 5.2.17.

2018-02-23 14:05:45


Jul 22

Author: Anonymous


A warning about the /i modifier and POSIX character classes:

If you're using POSIX character classes in your regex that indicate case such as [:upper:] or [:lower:] in combination with the /i modifier, then in PHP < 7.3 the /i modifier will take precedence and effectively make both those character classes work as [:alpha:], but in PHP >= 7.3 the character classes overrule the /i modifier.

2019-07-22 15:37:19


Jun 30

Author: Hayley Watson


Starting from 7.3.0, the 'S' modifier has no effect; this analysis is now always done by the PCRE engine.

2020-06-30 04:01:07
