preg_match_all

(PHP 4, PHP 5)

preg_match_all — Perform a global regular expression match

Description

int preg_match_all ( string $pattern , string $subject [, array &$matches [, int $flags = PREG_PATTERN_ORDER [, int $offset = 0 ]]] )

Searches subject for all matches to the regular expression given in pattern and puts them in matches in the order specified by flags.

After the first match is found, subsequent searches continue from the end of the last match.

Parameters

pattern

The pattern to search for, as a string.

subject

The input string.

matches

A multi-dimensional array of all matches, ordered according to flags.

flags

Can be a combination of the following flags (note that it doesn't make sense to use PREG_PATTERN_ORDER together with PREG_SET_ORDER):

PREG_PATTERN_ORDER

Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.


<?php
preg_match_all("|<[^>]+>(.*)</[^>]+>|U",
    "<b>example: </b><div align=left>this is a test</div>",
    $out, PREG_PATTERN_ORDER);
echo $out[0][0] . ", " . $out[0][1] . "\n";
echo $out[1][0] . ", " . $out[1][1] . "\n";
?>

The above example will output:

<b>example: </b>, <div align=left>this is a test</div>
example: , this is a test

So, $out[0] contains an array of strings that matched the full pattern, and $out[1] contains an array of strings enclosed by tags.

PREG_SET_ORDER

Orders results so that $matches[0] is an array of the first set of matches, $matches[1] is an array of the second set of matches, and so on.


<?php
preg_match_all("|<[^>]+>(.*)</[^>]+>|U",
    "<b>example: </b><div align=\"left\">this is a test</div>",
    $out, PREG_SET_ORDER);
echo $out[0][0] . ", " . $out[0][1] . "\n";
echo $out[1][0] . ", " . $out[1][1] . "\n";
?>

The above example will output:

<b>example: </b>, example:
<div align="left">this is a test</div>, this is a test

PREG_OFFSET_CAPTURE

If this flag is passed, the string offset of every match is returned along with the match. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at index 0 and its byte offset into subject at index 1.

If no order flag is given, PREG_PATTERN_ORDER is assumed.
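As a rough illustration of the shape PREG_OFFSET_CAPTURE produces, the sketch below uses a made-up subject and pattern (neither comes from the manual) and prints each match with its byte offset.

```php
<?php
// Subject and pattern are invented for illustration only.
$subject = "one two";
$count = preg_match_all('/\w+/', $subject, $matches, PREG_OFFSET_CAPTURE);

// With the default PREG_PATTERN_ORDER, $matches[0] holds one
// [matched string, byte offset] pair per full-pattern match.
foreach ($matches[0] as $pair) {
    list($text, $offset) = $pair;
    echo "$text starts at byte $offset\n";
}
```

Here "one" is reported at byte 0 and "two" at byte 4.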

offset

Normally, the search starts from the beginning of the subject string. The optional parameter offset can be used to specify the alternate place from which to start the search (in bytes).

Note:
Using offset is not equivalent to passing substr($subject, $offset) to preg_match_all() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x). See preg_match() for examples.
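The difference described in the note can be seen with a small sketch. The subject string and pattern below are assumptions chosen only to make the ^ assertion visible.

```php
<?php
$subject = "abcdef";
$offset  = 3;

// With $offset the pattern still sees the whole subject, so ^ (start of
// subject) cannot match at byte 3.
$withOffset = preg_match_all('/^def/', $subject, $m1, PREG_PATTERN_ORDER, $offset);

// substr() builds a new string that really does start with "def".
$withSubstr = preg_match_all('/^def/', substr($subject, $offset), $m2);

var_dump($withOffset); // int(0)
var_dump($withSubstr); // int(1)
```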

Return Values

Returns the number of full pattern matches (which might be zero), or FALSE if an error occurred.
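Because both 0 and FALSE are falsy, a strict comparison is one way to tell "no matches" apart from "an error occurred". The following is a minimal sketch with an invented subject string.

```php
<?php
$result = preg_match_all('/\d+/', 'no digits here', $matches);

if ($result === false) {
    // The pattern could not be applied at all.
    echo "error\n";
} elseif ($result === 0) {
    // The pattern ran, but nothing in the subject matched.
    echo "no matches\n";
} else {
    echo "$result match(es)\n";
}
```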

Changelog

Version	Description
5.4.0	The `matches` parameter became optional.
5.3.6	Returns `FALSE` if `offset` is higher than `subject` length.
5.2.2	Named subpatterns now accept the syntax (?<name>) and (?'name') as well as (?P<name>). Previous versions accepted only (?P<name>).
4.3.3	The `offset` parameter was added
4.3.0	The `PREG_OFFSET_CAPTURE` flag was added

Examples

Example #1 Getting all phone numbers out of some text.


<?php
preg_match_all("/\(?  (\d{3})?  \)?  (?(1)  [\-\s] ) \d{3}-\d{4}/x",
                "Call 555-1212 or 1-800-555-1212", $phones);
?>

Example #2 Find matching HTML tags (greedy)


<?php
// The \\2 is an example of backreferencing. This tells pcre that
// it must match the second set of parentheses in the regular expression
// itself, which would be the ([\w]+) in this case. The extra backslash is
// required because the string is in double quotes.
$html = "<b>bold text</b><a href=howdy.html>click me</a>";

preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    echo "matched: " . $val[0] . "\n";
    echo "part 1: " . $val[1] . "\n";
    echo "part 2: " . $val[2] . "\n";
    echo "part 3: " . $val[3] . "\n";
    echo "part 4: " . $val[4] . "\n\n";
}
?>

The above example will output:

matched: <b>bold text</b>
part 1: <b>
part 2: b
part 3: bold text
part 4: </b>

matched: <a href=howdy.html>click me</a>
part 1: <a href=howdy.html>
part 2: a
part 3: click me
part 4: </a>

Example #3 Using named subpattern


<?php

$str = <<<FOO
a: 1
b: 2
c: 3
FOO;

preg_match_all('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);

/* This also works in PHP 5.2.2 (PCRE 7.0) and later, however 
 * the above form is recommended for backwards compatibility */
// preg_match_all('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);

print_r($matches);

?>

The above example will output:

Array
(
    [0] => Array
        (
            [0] => a: 1
            [1] => b: 2
            [2] => c: 3
        )

    [name] => Array
        (
            [0] => a
            [1] => b
            [2] => c
        )

    [1] => Array
        (
            [0] => a
            [1] => b
            [2] => c
        )

    [digit] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )

    [2] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )

)

Comments

Feb 03

Author: mnc at u dot nu


PREG_OFFSET_CAPTURE always seems to provide byte offsets, rather than character position offsets, even when you are using the unicode /u modifier.
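A small sketch of that behaviour, using an invented UTF-8 subject: the reported offset counts bytes, and one possible way to get a character position is to count the characters in the prefix before the match.

```php
<?php
// "é" occupies two bytes in UTF-8, so "x" is the third character
// (character index 2) but starts at byte 3.
$subject = "\xC3\xA9 x";
preg_match_all('/x/u', $subject, $matches, PREG_OFFSET_CAPTURE);

$byteOffset = $matches[0][0][1];

// Count the characters that precede the match; /s lets the dot match
// any character and /u makes it count UTF-8 characters, not bytes.
$charOffset = preg_match_all('/./su', substr($subject, 0, $byteOffset), $unused);

echo "byte offset: $byteOffset, character offset: $charOffset\n";
```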

2006-02-03 00:05:14

http://php5.kiev.ua/manual/ru/function.preg-match-all.html

Feb 20

Author: phpnet at sinful-music dot com


Here's some fleecy code to 1. validate RFC 2822 conformity of address lists and 2. extract the address specification (the part commonly known as 'email'). I wouldn't suggest using it for input form email checking, but it might be just what you want for other email applications. I know it can be optimized further, but that part I'll leave up to you nutcrackers. The total length of the resulting regex is about 30000 bytes. That is because it accepts comments. You can remove that by setting $cfws to $fws, and it shrinks to about 6000 bytes. Conformity checking strictly follows RFC 2822. Have fun and email me if you have any enhancements!



<?php

function mime_extract_rfc2822_address($string)

{

        //rfc2822 token setup

        $crlf           = "(?:\r\n)";

        $wsp            = "[\t ]";

        $text           = "[\\x01-\\x09\\x0B\\x0C\\x0E-\\x7F]";

        $quoted_pair    = "(?:\\\\$text)";

        $fws            = "(?:(?:$wsp*$crlf)?$wsp+)";

        $ctext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F" .

                          "!-'*-[\\]-\\x7F]";

        $comment        = "(\\((?:$fws?(?:$ctext|$quoted_pair|(?1)))*" .

                          "$fws?\\))";

        $cfws           = "(?:(?:$fws?$comment)*(?:(?:$fws?$comment)|$fws))";

        //$cfws           = $fws; //an alternative to comments

        $atext          = "[!#-'*+\\-\\/0-9=?A-Z\\^-~]";

        $atom           = "(?:$cfws?$atext+$cfws?)";

        $dot_atom_text  = "(?:$atext+(?:\\.$atext+)*)";

        $dot_atom       = "(?:$cfws?$dot_atom_text$cfws?)";

        $qtext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!#-[\\]-\\x7F]";

        $qcontent       = "(?:$qtext|$quoted_pair)";

        $quoted_string  = "(?:$cfws?\"(?:$fws?$qcontent)*$fws?\"$cfws?)";

        $dtext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!-Z\\^-\\x7F]";

        $dcontent       = "(?:$dtext|$quoted_pair)";

        $domain_literal = "(?:$cfws?\\[(?:$fws?$dcontent)*$fws?]$cfws?)";

        $domain         = "(?:$dot_atom|$domain_literal)";

        $local_part     = "(?:$dot_atom|$quoted_string)";

        $addr_spec      = "($local_part@$domain)";

        $display_name   = "(?:(?:$atom|$quoted_string)+)";

        $angle_addr     = "(?:$cfws?<$addr_spec>$cfws?)";

        $name_addr      = "(?:$display_name?$angle_addr)";

        $mailbox        = "(?:$name_addr|$addr_spec)";

        $mailbox_list   = "(?:(?:(?:(?<=:)|,)$mailbox)+)";

        $group          = "(?:$display_name:(?:$mailbox_list|$cfws)?;$cfws?)";

        $address        = "(?:$mailbox|$group)";

        $address_list   = "(?:(?:^|,)$address)+";



        //output length of string (just so you see how f**king long it is)

        echo(strlen($address_list) . " ");



        //apply expression

        preg_match_all("/^$address_list$/", $string, $array, PREG_SET_ORDER);



        return $array;

};

?>

2006-02-20 02:53:03

Dec 06

Author: chuckie


This is a function to convert byte offsets into (UTF-8) character offsets (this is regardless of whether you use the /u modifier):



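<?php

function mb_preg_match_all($ps_pattern, $ps_subject, &$pa_matches, $pn_flags = PREG_PATTERN_ORDER, $pn_offset = 0, $ps_encoding = NULL) {
  // WARNING! - All this function does is to correct offsets, nothing else.
  if (is_null($ps_encoding))
    $ps_encoding = mb_internal_encoding();

  // Convert the character offset into the byte offset preg_match_all() expects.
  $pn_offset = strlen(mb_substr($ps_subject, 0, $pn_offset, $ps_encoding));
  $ret = preg_match_all($ps_pattern, $ps_subject, $pa_matches, $pn_flags, $pn_offset);

  if ($ret && ($pn_flags & PREG_OFFSET_CAPTURE)) {
    // Both PREG_PATTERN_ORDER and PREG_SET_ORDER yield a two-level array of
    // [string, byte offset] pairs, so the same loops handle either order.
    // The inner and outer loops need distinct reference variables.
    foreach ($pa_matches as &$ha_group) {
      foreach ($ha_group as &$ha_match) {
        // Skip entries for unmatched subpatterns (bare string or offset -1).
        if (is_array($ha_match) && $ha_match[1] >= 0)
          $ha_match[1] = mb_strlen(substr($ps_subject, 0, $ha_match[1]), $ps_encoding);
      }
      unset($ha_match);
    }
    unset($ha_group);
  }

  return $ret;
}

?>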

2006-12-06 08:20:42

Jun 27

Author: phektus at gmail dot com


If you'd like to include DOUBLE QUOTES on a regular expression for use with preg_match_all, try ESCAPING THRICE, as in: \\\"



For example, the pattern:

'/<table>[\s\w\/<>=\\\"]*<\/table>/'



Should be able to match:

<table>

<row>

<col align="left" valign="top">a</col>

<col align="right" valign="bottom">b</col>

</row>

</table>

.. with all there is under those table tags.



I'm not really sure why this is so, but I tried just the double quote and one or even two escape characters and it won't work. In my frustration I added another one and then it's cool.

2007-06-27 02:22:21

Jul 12

Author: mr davin


<?php

// Returns an array of strings where the start and end are found

    function findinside($start, $end, $string) {

        preg_match_all('/' . preg_quote($start, '/') . '([^\.)]+)'. preg_quote($end, '/').'/i', $string, $m);

        return $m[1];

    }

    

    $start = "mary has";

    $end = "lambs.";

    $string = "mary has 6 lambs. phil has 13 lambs. mary stole phil's lambs. now mary has all the lambs.";



    $out = findinside($start, $end, $string);



    print_r ($out);



/* Results in 

(

    [0] =>  6 

    [1] =>  all the 

)

*/ 

?>

2007-07-12 17:57:51

Jan 28

Author: dolbegraeb


please note, that the function of "mail at SPAMBUSTER at milianw dot de" can result in invalid xhtml in some cases. think i used it in the right way but my result is sth like this:



<img src="./img.jpg" alt="nice picture" />foo foo foo foo </img>



correct me if i'm wrong. 

i'll see when there's time to fix that. -.-

2008-01-28 18:30:06

Mar 04

Author: bruha


To count the length of a UTF-8 string I use



$count = preg_match_all("/[[:print:]\pL]/u", $str, $pockets);



where

[:print:] - printing characters, including space

\pL - UTF-8 Letter

/u - UTF-8 string

other unicode character properties on http://www.pcre.org/pcre.txt

2008-03-04 02:13:21

Apr 21

Author: spambegone at cratemedia dot com


I found simpleXML to be useful only in cases where the XML was extremely small, otherwise the server would run out of memory (I suspect there is a memory leak or something?). So while searching for alternative parsers, I decided to try a simpler approach. I don't know how this compares with cpu usage, but I know it works with large XML structures. This is more a manual method, but it works for me since I always know what structure of data I will be receiving. 



Essentially I just preg_match() unique nodes to find the values I am looking for, or I preg_match_all to find multiple nodes. This puts the results in an array and I can then process this data as I please.



I was unhappy though, that preg_match_all() stores the data twice (requiring twice the memory), one array for all the full pattern matches, and one array for all the sub pattern matches. You could probably write your own function that overcame this. But for now this works for me, and I hope it saves someone else some time as well.



// SAMPLE XML

<RETS ReplyCode="0" ReplyText="Operation Successful">

  <COUNT Records="14" />

  <DELIMITER value="09" />

  <COLUMNS>PropertyID</COLUMNS>

  <DATA>521897</DATA>

  <DATA>677208</DATA>

  <DATA>686037</DATA>

</RETS>



<?PHP



// SAMPLE FUNCTION

function parse_xml($xml) {

    

    

    // GET DELIMITER (single instance)

    $match_res = preg_match('/<DELIMITER value ?= ?"(.*)" ?\/>/', $xml, $matches);

    if(!empty($matches[1])) {

        $results["delimiter"] = chr($matches[1]);

    } else {

        // DEFAULT DELIMITER

        $results["delimiter"] = "\t";

    }

    unset($match_res, $matches);

    

    

    // GET MULTIPLE DATA NODES (multiple instances)

    $results["data_count"] = preg_match_all("/<DATA>(.*)<\/DATA>/", $xml, $matches);

    // GET MATCHES OF SUB PATTERN, DISCARD THE REST

    $results["data"]=$matches[1];

    unset($match_res, $matches);

    

    // UNSET XML TO SAVE MEMORY (should unset outside the function as well)

    unset($xml);



    // RETURN RESULTS ARRAY

    return $results;

    

    

}



?>

2008-04-21 02:39:55

Jun 19

Author: sledge NOSPAM


Perhaps you want to find the positions of all anchor tags. This returns a two-dimensional array holding the start and end position of each tag.



<?php

function getTagPositions($strBody)
{
    // Constant names must be quoted; the guard avoids redefining them
    // when the function is called more than once.
    if (!defined('DEBUG')) {
        define('DEBUG', false);
        define('DEBUG_FILE_PREFIX', "/tmp/findlinks_");
    }

    preg_match_all("/<[^>]+>(.*)<\/[^>]+>/U", $strBody, $strTag, PREG_PATTERN_ORDER);
    $intOffset = 0;
    $intTagPositions = array();

    foreach ($strTag[0] as $strFullTag) {
        // Locate this match at or after the end of the previous one.
        $intStart = strpos($strBody, $strFullTag, $intOffset);
        $intEnd = $intStart + strlen($strFullTag);

        if (DEBUG == true) {
            $fhDebug = fopen(DEBUG_FILE_PREFIX.time(), "a");
            fwrite($fhDebug, $strFullTag."\n");
            fwrite($fhDebug, "Starting position: ".$intStart."\n");
            fwrite($fhDebug, "Ending position: ".$intEnd."\n");
            fwrite($fhDebug, "Length: ".strlen($strFullTag)."\n\n");
            fclose($fhDebug);
        }

        $intTagPositions[] = array('start' => $intStart, 'end' => $intEnd);
        // Continue after this tag so identical tags are not found twice.
        $intOffset = $intEnd;
    }
    return $intTagPositions;
}



$strBody = 'I have lots of <a href="http://my.site.com">links</a> on this <a href="http://my.site.com">page</a> that I want to <a href="http://my.site.com">find</a> the positions.';



$strBody = strip_tags(html_entity_decode($strBody), '<a>');

$intTagPositions = getTagPositions($strBody);

print_r($intTagPositions);



/*****

Output:



Array ( 

    [0] => Array ( 

        [start] => 15 

        [end] => 53 ) 

    [1] => Array ( 

        [start] => 62 

        [end] => 99 ) 

    [2] => Array ( 

        [start] => 115 

        [end] => 152 )

 ) 

*****/

?>
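The same positions can probably be obtained without the strpos() bookkeeping by letting preg_match_all() report offsets itself. The sketch below is one possible variant; the function name getTagPositionsWithOffsets is hypothetical, and the pattern is the one used above.

```php
<?php
// Hypothetical helper: same tag pattern, but offsets come from
// PREG_OFFSET_CAPTURE instead of a second search with strpos().
function getTagPositionsWithOffsets($strBody)
{
    $positions = array();
    preg_match_all("/<[^>]+>(.*)<\/[^>]+>/U", $strBody, $tags, PREG_OFFSET_CAPTURE);
    foreach ($tags[0] as $tag) {
        // $tag[0] is the matched text, $tag[1] its byte offset in $strBody.
        $positions[] = array('start' => $tag[1], 'end' => $tag[1] + strlen($tag[0]));
    }
    return $positions;
}

$strBody = 'I have lots of <a href="http://my.site.com">links</a> on this <a href="http://my.site.com">page</a>.';
$positions = getTagPositionsWithOffsets($strBody);
print_r($positions);
```

For this input the two tags are reported at 15 to 53 and 62 to 99, in line with the first two entries of the output shown above. The offsets are byte positions, as with strpos().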

2008-06-19 16:46:28

Oct 07

Author: MonkeyMan


Here is a way to match everything on the page, performing an action for each match as you go. I had used this idiom in other languages, where its use is customary, but in PHP it seems to be not quite as common.



<?php

function custom_preg_match_all($pattern, $subject)
{
    $offset = 0;
    $match_count = 0;
    $subject_length = strlen($subject);

    while ($offset <= $subject_length
        && preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, $offset))
    {
        // Increment counter
        $match_count++;

        // Get byte offset and byte length (assuming single byte encoded)
        $match_start = $matches[0][1];
        $match_length = strlen($matches[0][0]);

        // (Optional) Transform $matches to the format it is usually set as (without PREG_OFFSET_CAPTURE set)
        $new_matches = array();
        foreach ($matches as $k => $match) $new_matches[$k] = $match[0];
        $matches = $new_matches;

        // Your code here
        echo "Match number $match_count, at byte offset $match_start, $match_length bytes long: ".$matches[0]."\r\n";

        // Update offset to the end of the match; step one byte past an empty
        // match so the loop cannot stall at the same position.
        $offset = $match_start + max($match_length, 1);
    }

    return $match_count;
}

?>



Note that the offsets returned are byte values (not necessarily number of characters) so you'll have to make sure the data is single-byte encoded. (Or have a look at paolo mosna's strByte function on the strlen manual page).

I'd be interested to know how this method performs speedwise against using preg_match_all and then recursing through the results.

2008-10-07 04:25:53

Oct 15

Author: meaneye at mail dot com


Recently I had to write a search engine in Hebrew and ran into a huge number of problems. My data was stored in a MySQL table with the utf8_bin collation.



So, to be able to write hebrew in utf8 table you need to do

<?php

$prepared_text = addslashes(utf8_encode($text));

?>



But then I had to find out whether some word exists in the stored text. This is where I got stuck. A simple preg_match would not find the text, since Hebrew doesn't work that easily. I've tried with /u and who knows what else.



Solution was somewhat logical and simple... 

<?php

$db_text = bin2hex(stripslashes(utf8_decode($db_text)));

$word = bin2hex($word);



$found = preg_match_all("/($word)+/i", $db_text, $matches);

?>



I've used preg_match_all since it returns the number of occurrences, so I could sort search results according to that.



Hope someone finds this useful!

2008-10-15 05:56:15

Feb 21

Author: royaltm75 at NOSPAM dot gmail dot com


The power of pregs is limited only by your *imagination* :)

I wrote this html2a() function using preg recursive match (?R) which provides quite safe and bulletproof html/xml extraction:

<?php

function html2a ( $html ) {

  if ( !preg_match_all( '

@

\<\s*?(\w+)((?:\b(?:\'[^\']*\'|"[^"]*"|[^\>])*)?)\>

((?:(?>[^\<]*)|(?R))*)

\<\/\s*?\\1(?:\b[^\>]*)?\>

|\<\s*(\w+)(\b(?:\'[^\']*\'|"[^"]*"|[^\>])*)?\/?\>

@uxis', $html = trim($html), $m, PREG_OFFSET_CAPTURE | PREG_SET_ORDER) )

    return $html;

  $i = 0;

  $ret = array();

  foreach ($m as $set) {

    if ( strlen( $val = trim( substr($html, $i, $set[0][1] - $i) ) ) )

      $ret[] = $val;

    $val = $set[1][1] < 0 

      ? array( 'tag' => strtolower($set[4][0]) )

      : array( 'tag' => strtolower($set[1][0]), 'val' => html2a($set[3][0]) );

    if ( preg_match_all( '

/(\w+)\s*(?:=\s*(?:"([^"]*)"|\'([^\']*)\'|(\w+)))?/usix

', isset($set[5]) && $set[2][1] < 0

  ? $set[5][0]

  : $set[2][0]

  ,$attrs, PREG_SET_ORDER ) ) {

      foreach ($attrs as $a) {

        $val['attr'][$a[1]]=$a[count($a)-1];

      }

    }

    $ret[] = $val;

    $i = $set[0][1]+strlen( $set[0][0] );

  }

  $l = strlen($html);

  if ( $i < $l )

    if ( strlen( $val = trim( substr( $html, $i, $l - $i ) ) ) )

      $ret[] = $val;

  return $ret;

}

?>



Now let's try it with this example: (there are some really nasty xhtml compliant bugs, but ... we shouldn't worry)



<?php

$html = <<<EOT

some leftover text...

     < DIV class=noCompliant style = "text-align:left;" >

... and some other ...

< dIv > < empty>  </ empty>

  <p> This is yet another text <br  >

     that wasn't <b>compliant</b> too... <br   />

     </p>

 <div class="noClass" > this one is better but we don't care anyway </div ><P>

    <input   type= "text"  name ='my "name' value  = "nothin really." readonly>

end of paragraph </p> </Div>   </div>   some trailing text 

EOT;



$a = html2a($html);

//now we will make some neat html out of it

echo a2html($a);



function a2html ( $a, $in = "" ) {

  if ( is_array($a) ) {

    $s = "";

    foreach ($a as $t)

      if ( is_array($t) ) {

        $attrs=""; 

        if ( isset($t['attr']) )

          foreach( $t['attr'] as $k => $v )

            $attrs.=" ${k}=".( strpos( $v, '"' )!==false ? "'$v'" : "\"$v\"" );

        $s.= $in."<".$t['tag'].$attrs.( isset( $t['val'] ) ? ">\n".a2html( $t['val'], $in."  " ).$in."</".$t['tag'] : "/" ).">\n";

      } else

        $s.= $in.$t."\n";

  } else {

    $s = empty($a) ? "" : $in.$a."\n";

  }

  return $s;

}

?>

This produces:

some leftover text...

<div class="noCompliant" style="text-align:left;">

  ... and some other ...

  <div>

    <empty>

    </empty>

    <p>

      This is yet another text

      <br/>

      that wasn't

      <b>

        compliant

      </b>

      too...

      <br/>

    </p>

    <div class="noClass">

      this one is better but we don't care anyway

    </div>

    <p>

      <input type="text" name='my "name' value="nothin really." readonly="readonly"/>

      end of paragraph

    </p>

  </div>

</div>

some trailing text

2009-02-21 04:55:15

Apr 01

Author: ad


I have made up a simple function to extract a phone number from a string.

I am not sure how good it is, but it works.

It keeps only the digits 0-9 and the "-", " ", "(", ")", "." and "+" characters. As far as I know, these are the most widely used characters in a phone number.



<?php

function clean_phone_number($phone) {
       if (!empty($phone)) {
               // Collect every character that may appear in a phone number.
               preg_match_all('/[0-9\(\)+.\- ]/s', $phone, $cleaned);
               // Join the single-character matches back into one string.
               $ready = implode('', $cleaned[0]);

               if (mb_strlen($ready) > 4 && mb_strlen($ready) <= 25) {
                       return $ready;
               }
               else {
                       return false;
               }
       }
       return false;
}

?>

2009-04-01 08:18:13

Jul 18

Author: elyknosrac at gmail dot com


Using preg_match_all I made a pretty handy function.



<?php

function reg_smart_replace($pattern, $replacement, $subject, $replacementChar = '$$$', $limit = -1)
{
    if (! $pattern || ! $subject || ! $replacement ) { return false; }

    preg_match_all ( $pattern, $subject, $matches);

    // Keep each distinct full-pattern match once; $limit caps how many are used.
    $found = array_unique($matches[0]);
    if ($limit > -1) {
        $found = array_slice($found, 0, $limit);
    }

    // ereg_replace() is deprecated and treated each match as a pattern, so
    // build literal pairs and let strtr() replace them in one pass. Text
    // inserted by one replacement is then never rescanned.
    $pairs = array();
    foreach ($found as $match) {
        if ($match === '') { continue; }
        $pairs[$match] = str_replace($replacementChar, $match, $replacement);
    }

    return $pairs ? strtr($subject, $pairs) : $subject;
}

?>



This function can turn blocks of text into clickable links or whatever.  Example:



<?php

reg_smart_replace(EMAIL_REGEX, '<a href="mailto:$$$">$$$</a>', $description)

?>

will turn all email addresses into actual links.



Just substitute $$$ with the text that will be found by the regex.  If you can't use $$$ then use the 4th parameter $replacementChar

2009-07-18 18:51:12

Sep 13

Author: royaltm75 at gmail dot com


I have received complaints that my html2a() code (see above) doesn't work in some cases.

It is, however, not a problem with the algorithm or procedure, but with the PCRE recursion stack limits.



If you use recursive PCRE (?R) you should remember to increase those two ini settings:



ini_set('pcre.backtrack_limit', 10000000);

ini_set('pcre.recursion_limit', 10000000);



But be warned: (from php.ini)



;Please note that if you set this value to a high number you may consume all

;the available process stack and eventually crash PHP (due to reaching the

;stack size limit imposed by the Operating System).



I have written this example mainly to demonstrate the power of the PCRE LANGUAGE, not the power of its implementation :)

But if you like it, use it, of course at your own risk.
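One way to notice that a limit was hit, rather than silently getting no matches, is to check preg_last_error() after the call. The sketch below is only an illustration: it lowers pcre.backtrack_limit to an artificially small value and uses a made-up pattern and subject, and whether the limit is actually reached depends on the PCRE version and on whether JIT is enabled.

```php
<?php
// Artificially small limit, chosen only to make the effect easy to provoke.
ini_set('pcre.backtrack_limit', '100');

$result = preg_match_all('/(?:\D+|<\d+>)*[!?]/', 'foobar foobar foobar', $matches);

if (preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
    echo "Backtrack limit was exhausted\n";
} elseif (preg_last_error() !== PREG_NO_ERROR) {
    echo "Another PCRE error occurred\n";
} else {
    echo "Finished normally, result: " . var_export($result, true) . "\n";
}
```

PREG_RECURSION_LIMIT_ERROR can be checked the same way for patterns that rely on (?R).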

2009-09-13 17:44:00

Sep 23

Author: avengis at gmail dot com


The next function works with almost any complex xml/xhtml string



<?php

/**

* Find and close unclosed xml tags

**/

function close_tags($text) {

    $patt_open    = "%((?<!</)(?<=<)[\s]*[^/!>\s]+(?=>|[\s]+[^>]*[^/]>)(?!/>))%";

    $patt_close    = "%((?<=</)([^>]+)(?=>))%";

    if (preg_match_all($patt_open,$text,$matches))

    {

        $m_open = $matches[1];

        if(!empty($m_open))

        {

            preg_match_all($patt_close,$text,$matches2);

            $m_close = $matches2[1];

            if (count($m_open) > count($m_close))

            {

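                $m_open = array_reverse($m_open);

                // Count closing tags per tag name, starting from an initialised array.
                $c_tags = array();
                foreach ($m_close as $tag) $c_tags[$tag] = isset($c_tags[$tag]) ? $c_tags[$tag] + 1 : 1;

                foreach ($m_open as $k => $tag) {
                    if (!isset($c_tags[$tag])) $c_tags[$tag] = 0;
                    if ($c_tags[$tag]-- <= 0) $text .= '</'.$tag.'>';
                }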

            }

        }

    }

    return $text;

}

?>

2009-09-23 05:25:32

Sep 08

Author: no at bo dot dy


For parsing queries with entities use:



<?php

preg_match_all("/(?:^|(?<=\&(?![a-z]+\;)))([^\=]+)=(.*?)(?:$|\&(?![a-z]+\;))/i",

  $s, $m, PREG_SET_ORDER );

?>

2010-09-08 14:23:02

Dec 06

Author: buuh


if you want to extract all {token}s from a string:



<?php

$pattern = "/{[^}]*}/";

$subject = "{token1} foo {token2} bar";

preg_match_all($pattern, $subject, $matches);

print_r($matches);

?>



output:



Array

(

    [0] => Array

        (

            [0] => {token1}

            [1] => {token2}

        )



)

2010-12-06 04:03:08

Feb 18

Author: john at mccarthy dot net


I needed a function to rotate the results of a preg_match_all query, and made this. Not sure if it exists.



<?php

function turn_array($m)

{

    for ($z = 0;$z < count($m);$z++)

    {

        for ($x = 0;$x < count($m[$z]);$x++)

        {

            $rt[$x][$z] = $m[$z][$x];

        }

    }    

    

    return $rt;

}

?>



Example - Take results of some preg_match_all query:



Array

(

    [0] => Array

        (

            [1] => Banff 

            [2] => Canmore

            [3] => Invermere

        )

 

    [1] => Array

        (

            [1] => AB 

            [2] => AB

            [3] => BC

        )

 

    [2] => Array

        (

            [1] => 51.1746254 

            [2] => 51.0938416

            [3] => 50.5065193

        )

 

    [3] => Array

        (

            [1] => -115.5719757 

            [2] => -115.3517761

            [3] => -116.0321884

        )

 

    [4] => Array

        (

            [1] => T1L 1B3 

            [2] => T1W 1N2

            [3] => V0B 2G0

        )



)



Rotate it 90 degrees to group results as records:



Array

(

    [0] => Array

        (

            [1] => Banff 

            [2] => AB

            [3] => 51.1746254

            [4] => -115.5719757

            [5] => T1L 1B3

        )

 

    [1] => Array

        (

            [1] => Canmore

            [2] => AB

            [3] => 51.0938416

            [4] => -115.3517761

            [5] => T1W 1N2

        )

 

    [2] => Array

        (

            [1] => Invermere

            [2] => BC

            [3] => 50.5065193

            [4] => -116.0321884

            [5] => V0B 2G0

        )

)

2011-02-18 13:21:42

Sep 20

Author: satyavvd at ymail dot com


Extract fields out of a CSV string (before PHP 5.3 the str_getcsv() function is not available).

Here is the regex:



<?php



$csvData = <<<EOF

10,'20',"30","'40","'50'","\"60","70,80","09\\/18,/\"2011",'a,sdfcd'

EOF;



$reg = <<<EOF

/

    (

        (

            ([\'\"])

            (

               ( 

                [^\'\"]

                |

                (\\\\.)

               )*

            )

            (\\3)

            |

            (

                [^,]

                |

                (\\\\.)

            )*

    ),)

    /x

EOF;



preg_match_all($reg,$csvData,$matches);



// to extract csv fields

print_r($matches[2]);

?>
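On PHP 5.3 and later, str_getcsv() probably covers the common case without a hand-written regex. The line below is invented for illustration and uses standard double-quote enclosures; the delimiter, enclosure and escape arguments are passed explicitly rather than relying on defaults.

```php
<?php
// Sample CSV line, made up for this sketch.
$line = '10,"20","thirty, with comma","say ""hi"""';

// Delimiter ",", enclosure '"', escape character "\".
$fields = str_getcsv($line, ',', '"', '\\');
print_r($fields);
```

This yields four fields: 10, 20, "thirty, with comma" as one field, and say "hi" with the doubled quotes collapsed.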

2011-09-20 07:08:11

May 04

Author: marc


Better use preg_replace to convert text in a clickable link with tag <a> 



$html = preg_replace('"\b(http://\S+)"', '<a href="$1">$1</a>', $text);

2012-05-04 18:48:06

Sep 22

Author: fseverin at free dot fr


I wanted to create, for my own use, a clean PHP class for working with XML files that combines the DOM and SimpleXML functions. I ran into a small but very annoying problem: the two do not number the offsets in a path the same way.

That is to say, a DOM XPath expression that looks like:

/ANODE/ANOTHERNODE/SOMENODE[9]/NODE[2]

is equivalent to this SimpleXML access:

ANODE->ANOTHERNODE->SOMENODE[8]->NODE[1]

I used preg_match_all to solve that problem, and after some hours of head-scratching I ended up with the following (the variable names are in French, sorry), hoping it will be useful to some of you:



<?php

function decrease_string($chaine)

    {

        /* retrieve all occurrences AND offsets of numbers in the original string: */



        preg_match_all("/[0-9]+/",$chaine,$out,PREG_OFFSET_CAPTURE);

            for($i=0;$i<sizeof($out[0]);$i++)

            {

                $longueurnombre = strlen((string)$out[0][$i][0]);

                $taillechaine = strlen($chaine);

                // cut the string in 3 pieces

                $debut = substr($chaine,0,$out[0][$i][1]);

                $milieu = ($out[0][$i][0])-1;

                $fin = substr($chaine,$out[0][$i][1]+$longueurnombre,$taillechaine);

                 /* if it's 10,100,1000, the problem is that the string gets shorter and it shifts all the offsets, so we have to decrease them of 1 */

                 if(preg_match('#^10+$#', $out[0][$i][0]))

                 {

                    for($j = $i+1;$j<sizeof($out[0]);$j++)

                    {

                        $out[0][$j][1] = $out[0][$j][1] -1;

                    }

                 }

                $chaine = $debut.$milieu.$fin;

            }

        return $chaine;

    }

?>

2012-09-22 20:52:18

Sep 24

Author: fab


Here is a function that replaces every number in a string with that number minus one.



<?php
function decremente_chaine($chaine)
{
    // retrieve all occurrences of numbers and their offsets
    preg_match_all("/[0-9]+/", $chaine, $out, PREG_OFFSET_CAPTURE);
    // loop over the occurrences
    for ($i = 0; $i < sizeof($out[0]); $i++)
    {
        $longueurnombre = strlen((string)$out[0][$i][0]);
        $taillechaine = strlen($chaine);
        // cut the string into 3 pieces
        $debut = substr($chaine, 0, $out[0][$i][1]);
        $milieu = ($out[0][$i][0]) - 1;
        $fin = substr($chaine, $out[0][$i][1] + $longueurnombre, $taillechaine);
        // if the result has a different number of characters (10, 100, 1000, 007, 0 etc.), shift all later offsets by the difference
        $decalage = $longueurnombre - strlen((string)$milieu);
        if ($decalage != 0)
        {
            for ($j = $i + 1; $j < sizeof($out[0]); $j++)
            {
                $out[0][$j][1] = $out[0][$j][1] - $decalage;
            }
        }
        $chaine = $debut.$milieu.$fin;
    }
    return $chaine;
}

?>
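
The offset bookkeeping above can be avoided if the replacement is done in a single pass. The following is only a sketch of that alternative, using preg_replace_callback() instead of preg_match_all(); the function name decrement_numbers is made up for illustration and is not from the comment above.

```php
<?php
// Hypothetical helper (not from the original comment): decrement every run of digits.
function decrement_numbers(string $text): string
{
    $result = preg_replace_callback(
        '/[0-9]+/',
        // The callback receives one match at a time, so no offsets need adjusting.
        static function (array $m): string {
            return (string) ((int) $m[0] - 1);
        },
        $text
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $text;
}

echo decrement_numbers('10 apples, 100 pears, 7 plums'), "\n"; // 9 apples, 99 pears, 6 plums
```

Because each number is replaced as it is found, the result can be shorter or longer than the original (10 becomes 9, 0 becomes -1) without affecting the other matches.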

2012-09-24 12:14:45

Jan 22

Author: ajeet dot nigam at icfaitechweb dot com


http://tryphpregex.com/ is a PHP-based online regex editor that helps you test your regular expressions, with real-time highlighting of matches in the input data.

2014-01-22 20:34:25

Apr 15

Author: DarkSide


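When a pattern has two capture groups (for example a key and a value), this is a very useful way to combine the matches into a single key => value array: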

$a = array_combine($matches[1], $matches[2]);
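
A minimal sketch of how that one-liner might be used. The input string and the pattern here are invented for illustration; only array_combine() on $matches[1] and $matches[2] comes from the comment.

```php
<?php
// Hypothetical input: "key=value" pairs separated by semicolons.
$subject = 'host=localhost;port=3306;user=root';

// Group 1 captures the key, group 2 captures the value (default PREG_PATTERN_ORDER).
preg_match_all('/(\w+)=(\w+)/', $subject, $matches);

// Pair each key with the value captured in the same match.
$config = array_combine($matches[1], $matches[2]);

print_r($config);
```

Note that if the same key is captured more than once, the later value overwrites the earlier one.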

2014-04-15 17:03:35

May 17

Author: Daniel Klein


The code that john at mccarthy dot net posted is not necessary. If you want your results grouped by individual match simply use:



<?php

preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

?>



E.g.



<?php

preg_match_all('/([GH])([12])([!?])/', 'G1? H2!', $matches); // Default PREG_PATTERN_ORDER

// $matches = array(0 => array(0 => 'G1?', 1 => 'H2!'),

//                  1 => array(0 => 'G', 1 => 'H'),

//                  2 => array(0 => '1', 1 => '2'),

//                  3 => array(0 => '?', 1 => '!'))



preg_match_all('/([GH])([12])([!?])/', 'G1? H2!', $matches, PREG_SET_ORDER);

// $matches = array(0 => array(0 => 'G1?', 1 => 'G', 2 => '1', 3 => '?'),

//                  1 => array(0 => 'H2!', 1 => 'H', 2 => '2', 3 => '!'))

?>

2015-05-17 08:26:58

May 26

Author: vojjov dot artem at ya dot ru


// Here is a function that lets you run preg_match_all() over an array of patterns



function getMatches($pattern, $subject) {

    $matches = array();



    if (is_array($pattern)) {

        foreach ($pattern as $p) {

            $m = getMatches($p, $subject);



            foreach ($m as $key => $match) {

                if (isset($matches[$key])) {

                    $matches[$key] = array_merge($matches[$key], $m[$key]);    

                } else {

                    $matches[$key] = $m[$key];

                }

            }

        }

    } else {

        preg_match_all($pattern, $subject, $matches);

    }



    return $matches;

}



$patterns = array(

    '/<span>(.*?)<\/span>/',

    '/<a href=".*?">(.*?)<\/a>/'

);



$html = '<span>some text</span>';

$html .= '<span>some text in another span</span>';

$html .= '<a href="path/">here is the link</a>';

$html .= '<address>address is here</address>';

$html .= '<span>here is one more span</span>';



$matches = getMatches($patterns, $html);



print_r($matches); // result is below



/*

Array

(

    [0] => Array

        (

            [0] => <span>some text</span>

            [1] => <span>some text in another span</span>

            [2] => <span>here is one more span</span>

            [3] => <a href="path/">here is the link</a>

        )



    [1] => Array

        (

            [0] => some text

            [1] => some text in another span

            [2] => here is one more span

            [3] => here is the link

        )



)

*/

2015-05-26 18:40:10

Nov 09

Author: stas kuryan aka stafox


https://regex101.com/ is an awesome online regex editor that helps you test your regular expressions (PCRE, JS, Python), with real-time highlighting of matches in the input data.

2015-11-09 10:57:49

Nov 19

Author: matt at lvl99 dot com


I had been crafting and testing some regexp patterns online using the tools Regex101 and a `preg_match_all()` tester and found that the regexp patterns I wrote worked fine on them, just not in my code.



My problem was not double-escaping backslash characters:



<?php

// Input test

$input = "\"something\",\"something here\",\"some\nnew\nlines\",\"this is the end\"";



// Works with online regexp testers, doesn't work in PHP:
// "\\" inside a PHP string is a single backslash, so PCRE receives (?<!\) and cannot compile it
preg_match_all( "/(?:,|^)(?<!\\)\"(.*?)(?<!\\)\"(?:(?=,)|$)/s", $input, $matches );

/*
Outputs: NULL
*/

// Works with online regexp testers, does work in PHP
preg_match_all( "/(?:,|^)(?<!\\\\)\"(.*?)(?<!\\\\)\"(?:(?=,)|$)/s", $input, $matches );



/*

Outputs:

array(2) {

  [0]=>

  array(4) {

    [0]=>

    string(11) ""something""

    [1]=>

    string(17) ","something here""

    [2]=>

    string(17) ","some

new

lines""

    [3]=>

    string(18) ","this is the end""

  }

  [1]=>

  array(4) {

    [0]=>

    string(9) "something"

    [1]=>

    string(14) "something here"

    [2]=>

    string(14) "some

new

lines"

    [3]=>

    string(15) "this is the end"

  }

}

*/

?>
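
One way to sidestep the double escaping is to keep the pattern in a nowdoc, where PHP does no escape processing at all, so the regex can be pasted exactly as it appears in an online tester. This is only a sketch: the capture group around .*? is an assumption, added so that $matches[1] holds the unquoted field contents.

```php
<?php
$input = "\"something\",\"something here\",\"some\nnew\nlines\",\"this is the end\"";

// Nowdoc: backslashes and quotes reach PCRE unchanged, so \\ below is
// the regex for "one literal backslash", just as in an online tester.
$pattern = <<<'REGEX'
/(?:,|^)(?<!\\)"(.*?)(?<!\\)"(?:(?=,)|$)/s
REGEX;

$count = preg_match_all($pattern, $input, $matches);

print_r($matches[1]);
```

Single-quoted strings do not help here, because '\\' is still collapsed to a single backslash by PHP before PCRE sees it.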

2015-11-19 16:30:58

Feb 09

Author: stamster at gmail dot com


Be careful when using this recursive pattern with a large input buffer in the preg_match_* functions.



<?php

$pattern = '/\{(?:[^{}]|(?R))*\}/';



preg_match_all($pattern, $buffer, $matches); 

?>



If $buffer is 80+ KB in size, you'll end up with a segfault!



[89396.588854] php[4384]: segfault at 7ffd6e2bdeb0 ip 00007fa20c8d67ed sp 00007ffd6e2bde70 error 6 in libpcre.so.3.13.1[7fa20c8c3000+3c000]



This is due to PCRE recursion. It has been a known bug in PHP since 2008, but its source is not PHP itself but the PCRE library.



Rasmus Lerdorf has the answer: https://bugs.php.net/bug.php?id=45735#1365812629



"The problem here is that there is no way to detect run-away regular expressions 

here without huge performance and memory penalties. Yes, we could build PCRE in a 

way that it wouldn't segfault and we could crank up the default backtrack limit 

to something huge, but it would slow every regex call down by a lot. If PCRE 

provided a way to handle this in a more graceful manner without the performance 

hit we would of course use it."
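
A hedged sketch of two defensive habits that may help with patterns like this one. It is not a guaranteed fix for the crash described above, and the sample $buffer is invented. The possessive quantifiers (++ and *+) tell PCRE not to keep backtracking positions inside the loop, and checking the return value against false lets the script report a PCRE limit error (governed by the pcre.backtrack_limit and pcre.recursion_limit ini settings) instead of silently using an empty result.

```php
<?php
// Same idea as the pattern above, with possessive quantifiers (++ and *+)
// so PCRE does not store backtracking positions for the inner loops.
$pattern = '/\{(?:[^{}]++|(?R))*+\}/';

// Small made-up sample; the original report concerned buffers of 80+ KB.
$buffer = 'x {a {b} c} y {d}';

$count = preg_match_all($pattern, $buffer, $matches);

if ($count === false) {
    // e.g. PREG_BACKTRACK_LIMIT_ERROR, PREG_RECURSION_LIMIT_ERROR, PREG_JIT_STACKLIMIT_ERROR
    echo 'PCRE error code: ', preg_last_error(), "\n";
} else {
    print_r($matches[0]);
}
```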

2016-02-09 13:55:58

Apr 19

Author: qdinar at gmail dot com


When a regex can match both a longer and a shorter version of a string,

only one of the two versions is caught.

When a match occurs at a given position in the string,

only one match is saved in $matches[0] for that position.

If ? is used, the quantifier is greedy and the longer version is caught;

if | is used, the first alternative that matches is caught:

<?php

preg_match_all('/ab|abc/','abc',$m);

var_dump($m);

preg_match_all('/abc?/','abc',$m);

var_dump($m);

?>

You might expect ['ab', 'abc'] in $m[0] for both, but that is not the case;

they actually output [['ab']] and [['abc']]:

array(1) {

  [0]=>

  array(1) {

    [0]=>

    string(2) "ab"

  }

}

array(1) {

  [0]=>

  array(1) {

    [0]=>

    string(3) "abc"

  }

}
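
A small sketch of two ways to work with this behavior (both patterns are illustrative, not from the comment above): list the longer alternative first if that is the one you want, or use zero-width lookaheads with capture groups if you need both variants reported for the same position.

```php
<?php
// Alternatives are tried left to right, so putting the longer one first prefers it.
preg_match_all('/abc|ab/', 'abc', $longestFirst);

// Lookaheads consume nothing, so both captures can be taken at the same position.
// Group 1 holds the short variant, group 2 the long one (if present).
preg_match_all('/(?=(ab))(?=(abc)?)/', 'abc', $both);

print_r($longestFirst[0]);
print_r(array($both[1][0], $both[2][0]));
```

In the second call $both[0] contains only an empty string, because the overall match is zero-width; the useful data is in the capture groups.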

2018-04-19 22:28:48

Dec 28

Author: chris at ocproducts dot com


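If PREG_OFFSET_CAPTURE is set, then unmatched captures (i.e. optional ones, with '?') at the end of the pattern may not be present in the result array. This is presumably because there is no offset, and thus the original PHP developers decided it was best to leave them out. An unmatched group that is followed by a matched group is still present, as an empty string with offset -1.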
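
The sketch below probes this with PREG_SET_ORDER; the patterns are invented for illustration. As far as I can tell, an unmatched group in the middle is kept as an empty string with offset -1, a trailing unmatched group is dropped from the set, and PREG_UNMATCHED_AS_NULL (assumes PHP 7.4 or later for the trailing case) keeps the trailing entry.

```php
<?php
$flags = PREG_SET_ORDER | PREG_OFFSET_CAPTURE;

// Optional group (b)? sits between two groups that do match.
preg_match_all('/(a)(b)?(c)/', 'ac', $middle, $flags);

// Optional group (b)? is the last group in the pattern.
preg_match_all('/(a)(b)?/', 'a', $trailing, $flags);

// Same pattern, asking PHP to report unmatched groups as null.
preg_match_all('/(a)(b)?/', 'a', $withNull, $flags | PREG_UNMATCHED_AS_NULL);

var_dump($middle[0][2]);                      // the unmatched middle group
var_dump(isset($trailing[0][2]));             // is the trailing group present?
var_dump(array_key_exists(2, $withNull[0]));  // present when reported as null?
```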

2020-12-28 17:43:16

Jun 29

Author: mojo


Why does <?php preg_match_all('/(?:^|\s)(ABC|XYZ)(?:\s|$)/i', 'ABC XYZ', $match) ?> find only 'ABC'?



Because the first full match is 'ABC ', including the trailing space. That space has been consumed, so it is no longer available to satisfy the leading (?:^|\s) of the next match.



Use lookbehind and lookahead to solve this problem: <?php preg_match_all('/(?<=^|\s)(ABC|XYZ)(?=\s|$)/i', 'ABC XYZ', $match)  ?>
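
The two patterns can be compared side by side. This is just the comment's own example run twice; the variable names are mine.

```php
<?php
$subject = 'ABC XYZ';

// Consuming version: the space after ABC becomes part of the first match.
preg_match_all('/(?:^|\s)(ABC|XYZ)(?:\s|$)/i', $subject, $consuming);

// Lookaround version: the space is only inspected, never consumed.
preg_match_all('/(?<=^|\s)(ABC|XYZ)(?=\s|$)/i', $subject, $lookaround);

print_r($consuming[1]);  // only ABC
print_r($lookaround[1]); // ABC and XYZ
```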

2021-06-29 13:47:23

Aug 28

Author: harrybarrow at mail dot ru


preg_match_all() and the other preg_*() functions don't work well with very long strings, at least those longer than 1 MB.

In this case the function returns FALSE and the value of $matches is unpredictable: it may contain some values or it may be empty.

A workaround is to split the long string into parts first, for instance by calling explode() on some delimiter, and then apply preg_match_all() to each part.

A typical scenario for this is log analysis with regular expressions.

Tested on PHP 7.2.0
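
A minimal sketch of that workaround, assuming no match needs to span a line break. The helper name match_log_lines and the sample log are made up; the point is to split first and to compare the return value with false so that a failure is not mistaken for "no matches".

```php
<?php
// Hypothetical helper: run the pattern on each line instead of on the whole log.
function match_log_lines(string $pattern, string $log): array
{
    $found = array();

    foreach (explode("\n", $log) as $lineNo => $line) {
        $count = preg_match_all($pattern, $line, $m);

        if ($count === false) {
            // preg_last_error() tells which PCRE limit or error was hit.
            throw new RuntimeException(
                'PCRE error ' . preg_last_error() . ' on line ' . ($lineNo + 1)
            );
        }

        foreach ($m[0] as $match) {
            $found[] = $match;
        }
    }

    return $found;
}

// Made-up log: collect the 4xx and 5xx status codes.
$log = "GET /a 200\nGET /b 404\nPOST /c 500";
print_r(match_log_lines('/\b[45]\d\d\b/', $log));
```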

2021-08-28 02:50:49

Jan 04

Author: rajudec at gmail dot com


<?php

//Allow limited span formatting in html text

$str='<span style="text-decoration-line: underline; font-weight: bold; font-style: italic;">White</span>
<span style="text-decoration-line: underline;">RED</span><span style="color:blue">blue</span>';

// Callback for a single "property: value" declaration; keeps it only if it is whitelisted
function next_format($str)
{
    $array=array("text-decoration-line"=>"underline","font-weight"=>"bold","font-style"=>"italic");
    foreach ($array as $key=>$val)
    {
        if($str[1]==$key && $str[2]==$val)
        {
            return $str[1].': '.$str[2].";";
        }
    }
    return '';
}

// Callback for one <span style="..."> opening tag; filters the declarations inside the style attribute
function next_span($matches)
{
    $needFormat=preg_replace_callback('/([a-z\-]+):\s*([^;]+)(;|)/ism',"next_format",$matches[2]);
    return $matches[1].$needFormat.$matches[3];
}

echo preg_replace_callback(
    "/(\<span\s+style\=\")([^\"]+)(\">)/ism",
    "next_span",
    $str);

?>

2022-01-04 10:45:00

Aug 06

Author: biziclop at vipmail dot hu


Sometimes you don't just want to cherry-pick the matches: you need the entire subject to be made up of matching substrings, so that every character of the subject belongs to a match. None of the existing preg_* functions is easily applicable to this task, so I made the preg_match_entire() function.

It uses the (*MARK) syntax, which is documented here: https://pcre.org/original/doc/html/pcrepattern.html#SEC27



<?php 



// returns: the array of matches

// null if the string is not a repetition of the pattern

// false on error

function preg_match_entire( string $pattern, string $subject, int $flags = 0 ){

  // Rebuild and wrap the pattern

  $delimiter = $pattern[0];

  $ldp       = strrpos( $pattern, $delimiter );

  $modifiers = substr( $pattern,    $ldp + 1 );  // read the modifiers before $pattern is overwritten

  $pattern   = substr( $pattern, 1, $ldp - 1 );

  $pattern   = "{$delimiter}   \G\z (*MARK:END)   |   \G (?:{$pattern})   {$delimiter}x{$modifiers}";

  $r = preg_match_all( $pattern, $subject, $m, PREG_SET_ORDER | $flags );

  if( $r === false )  return false;  // error

  $end = array_pop( $m );

  if( $end === null || ! isset( $end['MARK']) || $end['MARK'] !== 'END')

    return null;  // end of string not reached

  return $m;  // return actual matches, may be an empty array

}



// Same results:

test('#{\d+}#', '');              // []

test('#{\d+}#', '{11}{22}{33}');  // {11},{22},{33}



// Different results: preg_match_entire won't match this:

test('#{\d+}#', '{11}{}{aa}{22},{{33}}');

// preg_match_entire: null

// preg_match_all:    {11},{22},{33}



function test( $pattern, $subject ){

  echo "pattern:           $pattern\n";

  echo "subject:           $subject\n";

  print_matches('preg_match_entire: ', preg_match_entire( $pattern, $subject ));

  preg_match_all( $pattern, $subject, $matches, PREG_SET_ORDER );

  print_matches('preg_match_all:    ', $matches );

  echo "\n";

}

function print_matches( $t, $m ){

  echo $t, is_array( $m ) && $m ? implode(',', array_column( $m, 0 )) : json_encode( $m ), "\n";

} ?>

2022-08-06 22:22:12

Jun 07

Author: loretoparisi at gmail dot com


A multi-byte safe preg_match_all() that converts the byte offsets returned with PREG_OFFSET_CAPTURE into character offsets for UTF-8 strings:

 

<?php 

function mb_preg_match_all($pattern, $subject, &$matches = null, $flags = 0, $offset = 0) {

    $out=preg_match_all($pattern, $subject, $matches, $flags, $offset);

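    if(($flags & PREG_OFFSET_CAPTURE) && is_array($matches)) {
        // With either ordering flag, $matches is a two-level array of [string, byte offset] pairs,
        // so walk every group (not just $matches[0]) and fix each offset
        foreach ($matches as &$group) {
            foreach ($group as &$match) {
                // skip non-pair entries and unmatched groups (offset -1)
                if (is_array($match) && $match[1] >= 0) {
                    $match[1] = mb_strlen(substr($subject, 0, $match[1]), 'UTF-8');
                }
            }
            unset($match);
        }
        unset($group);
    }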

    return $out;

}

?>

2023-06-07 15:28:00

Nov 15

Author: b3forgames at gmail dot com


EXAMPLE:

$file = file_get_contents('file');

if(preg_match_all('#Task To Run(.*)#s', $file, $m)) {

var_dump($m);

}



No output...



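preg_match_all() does not match if the file starts with the BOM bytes FF FE, which mark it as UTF-16LE: every ASCII character is then followed by a 00 byte, so the literal text in the pattern never appears in the data: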



╰─$ head -n1 file | hexdump -C

00000000  ff fe 48 00 6f 00 73 00  74 00 4e 00 61 00 6d 00  |..H.o.s.t.N.a.m.|



Remove the BOM and convert the file to UTF-8 with dos2unix:



╰─$ dos2unix file 

dos2unix: converting UTF-16LE file file to UTF-8 Unix format...



Check again:



╰─$ head -n1 file | hexdump -C

00000000  48 6f 73 74 4e 61 6d 65  3a 20 20 20 20 20 20 20  |HostName:       |



Great! Now preg_match_all works fine.
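
If converting the file on disk is not an option, the same conversion could be done in PHP before matching. The following is a sketch under a few assumptions: the data really is UTF-16LE when it starts with FF FE, the iconv extension is available, and the helper name utf16le_bom_to_utf8 is invented. The sample data is simulated rather than read from a file.

```php
<?php
// Hypothetical helper: decode data that starts with a UTF-16LE BOM (FF FE).
function utf16le_bom_to_utf8(string $data): string
{
    if (strncmp($data, "\xFF\xFE", 2) !== 0) {
        return $data; // no UTF-16LE BOM: leave the data untouched
    }

    // Drop the 2-byte BOM, then convert the rest to UTF-8.
    $utf8 = iconv('UTF-16LE', 'UTF-8', substr($data, 2));
    if ($utf8 === false) {
        throw new RuntimeException('Conversion from UTF-16LE failed');
    }

    return $utf8;
}

// Simulated file contents: BOM followed by UTF-16LE text.
$raw = "\xFF\xFE" . iconv('UTF-8', 'UTF-16LE', 'Task To Run: backup');

$before = preg_match_all('#Task To Run(.*)#s', $raw, $m);                      // no match on raw data
$after  = preg_match_all('#Task To Run(.*)#s', utf16le_bom_to_utf8($raw), $m); // matches after decoding

var_dump($before, $after, $m[1][0]);
```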

2023-11-15 02:18:34
