token_get_all

(PHP 4 >= 4.2.0, PHP 5)

token_get_all - Split given source into PHP tokens

Description

array token_get_all ( string $source )

token_get_all() parses the given source string into PHP language tokens using the Zend engine's lexical scanner.

For a list of parser tokens, see List of Parser Tokens, or use token_name() to translate a token value into its string representation.

Parameters

source

The PHP source to parse.

Return Values

An array of token identifiers. Each individual token identifier is either a single character (e.g. ;, ., >, !), or a three-element array containing the token index in element 0, the string content of the original token in element 1, and the line number in element 2.
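Because the result mixes plain strings and arrays, code that walks it usually has to check is_array() on every entry. The following is a minimal sketch of that pattern; describe_tokens() is a hypothetical helper name, not part of the tokenizer extension.

```php
<?php
// Hypothetical helper (not part of the tokenizer extension): renders each
// token as a readable "NAME text" line.
function describe_tokens($source)
{
    $lines = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array form: [0] => token id, [1] => text, [2] => line (PHP >= 5.2.2)
            $lines[] = token_name($token[0]) . ' ' . var_export($token[1], true);
        } else {
            // String form: a single-character token such as ";" or "("
            $lines[] = 'CHAR ' . var_export($token, true);
        }
    }
    return $lines;
}

print_r(describe_tokens('<?php echo 1; ?>'));
```

The label CHAR is only an illustrative marker for the string form; token_name() cannot be called on these entries because they carry no token id.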

Examples

Example #1 token_get_all() examples

<?php
$tokens = token_get_all('<?php echo; ?>'); /* => array(
                                                  array(T_OPEN_TAG, '<?php'), 
                                                  array(T_ECHO, 'echo'),
                                                  ';',
                                                  array(T_CLOSE_TAG, '?>') ); */

/* Note in the following example that the string is parsed as T_INLINE_HTML
   rather than the otherwise expected T_COMMENT (T_ML_COMMENT in PHP <5).
   This is because no open/close tags were used in the "code" provided.
   This would be equivalent to putting a comment outside of <?php ?> tags in a normal file. */
$tokens = token_get_all('/* comment */'); // => array(array(T_INLINE_HTML, '/* comment */'));
?>

Changelog

Version Description
5.2.2 Line numbers are returned in element 2

Comments

Well, there is a way to parse for errors. See
http://www.php.net/manual/function.php-check-syntax.php#77318
2007-12-03 03:10:25
http://php5.kiev.ua/manual/ru/function.token-get-all.html
I wanted to use the tokenizer functions to count source lines of code, including counting comments.  Attempting to do this with regular expressions does not work well because of situations where /* appears in a string, or other situations.  The token_get_all() function makes this task easy by detecting all the comments properly.  However, it does not tokenize newline characters.  I wrote the below set of functions to also tokenize newline characters as T_NEW_LINE.

<?php

define('T_NEW_LINE', -1);

function token_get_all_nl($source)
{
    $new_tokens = array();

    // Get the tokens
    $tokens = token_get_all($source);

    // Split newlines into their own tokens
    foreach ($tokens as $token)
    {
        $token_name = is_array($token) ? $token[0] : null;
        $token_data = is_array($token) ? $token[1] : $token;

        // Do not split encapsed strings or multiline comments
        if ($token_name == T_CONSTANT_ENCAPSED_STRING || substr($token_data, 0, 2) == '/*')
        {
            $new_tokens[] = array($token_name, $token_data);
            continue;
        }

        // Split the data up by newlines
        $split_data = preg_split('#(\r\n|\n)#', $token_data, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

        foreach ($split_data as $data)
        {
            if ($data == "\r\n" || $data == "\n")
            {
                // This is a new line token
                $new_tokens[] = array(T_NEW_LINE, $data);
            }
            else
            {
                // Add the token under the original token name
                $new_tokens[] = is_array($token) ? array($token_name, $data) : $data;
            }
        }
    }

    return $new_tokens;
}

function token_name_nl($token)
{
    if ($token === T_NEW_LINE)
    {
        return 'T_NEW_LINE';
    }

    return token_name($token);
}

?>

Example usage:

<?php

$tokens = token_get_all_nl(file_get_contents('somecode.php'));

foreach ($tokens as $token)
{
    if (is_array($token))
    {
        echo (token_name_nl($token[0]) . ': "' . $token[1] . '"<br />');
    }
    else
    {
        echo ('"' . $token . '"<br />');
    }
}

?>

I'm sure you can figure out how to count the lines of code, and lines of comments with these functions.  This was a huge improvement on my previous attempt at counting lines of code with regular expressions.  I hope this helps someone, as many of the user contributed examples on this website have helped me in the past.
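For readers who want a starting point, here is one possible sketch of the counting step. It is not the author's code: it calls token_get_all() directly and tracks the current line by counting "\n" in each token's text. count_source_lines() is a hypothetical name, and a line holding both code and a comment is counted in both totals.

```php
<?php
// Hypothetical helper: counts lines holding code and lines holding comments.
// A line that has both is counted in both totals.
function count_source_lines($source)
{
    $code = array();     // line number => true for lines containing code
    $comments = array(); // line number => true for lines containing a comment
    $line = 1;           // current line, tracked by counting "\n" in token text

    foreach (token_get_all($source) as $token) {
        $id = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;
        $is_comment = ($id === T_COMMENT || $id === T_DOC_COMMENT);
        $is_ignored = in_array($id, array(T_WHITESPACE, T_OPEN_TAG, T_CLOSE_TAG, T_INLINE_HTML), true);

        // A token may span several lines (block comments, multiline strings).
        $parts = explode("\n", $text);
        foreach ($parts as $i => $part) {
            if (!$is_ignored && trim($part) !== '') {
                if ($is_comment) {
                    $comments[$line + $i] = true;
                } else {
                    $code[$line + $i] = true;
                }
            }
        }
        $line += count($parts) - 1;
    }

    return array('code' => count($code), 'comments' => count($comments));
}

$sample = "<?php\n// a comment\n\$x = 1; /* inline */\n/* two\n   lines */\necho \$x;\n";
print_r(count_source_lines($sample));
```

For the sample above this reports 2 code lines and 4 comment lines. Files that use a bare "\r" as the line ending are not handled by this sketch.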
2009-06-29 00:24:40
Yes, there are some problems with token_get_all() (on WAMP, PHP 5.3.0).

1: Wrong line numbers
Since PHP 5.2.2, token_get_all() should return line numbers in element 2.
In my case (5.3.0 on WAMP) this works perfectly only on pure PHP code (not mixed with HTML). If token_get_all() detects some T_INLINE_HTML, you sometimes get wrong line numbers (the next line is returned). :(

2: A warning message can affect loops
A warning is raised on incomplete PHP code (e.g. PHP code fed in line by line).
For example, if a comment is not closed, token_get_all() can block loops on this warning:
Warning: Unterminated comment starting line

This problem does not seem to occur in CLI mode (PHP on the command line), only in web mode.

Until this is more stable, I use token_get_all() only on pure PHP code (not mixed with HTML):
First, extract the complete PHP code (with its opening and closing PHP tags).
Second, run token_get_all() on the pure PHP code.

3: Why is there no function to extract PHP code? (To extract HTML, we have Tidy.)

In the meantime, I use my own function.

The code is at the end of this post:
http://www.developpez.net/forums/d786381/php/langage/fonctions/analyser-fichier-php-token_get_all/

This function does not support:
- the old notations "<?  ?>" and "<% %>"
- heredoc syntax
- nowdoc syntax (since PHP 5.3.0)
2009-08-02 13:08:03
The T_OPEN_TAG token will include the first trailing newline (\r, \n, or \r\n), tab (\t), or space. Any additional space after this token will be in a T_WHITESPACE token.

The T_CLOSE_TAG token will include the first trailing newline (\r, \n, or \r\n), as described in the manual section on instruction separation (language.basic-syntax.instruction-separation). Any additional space after this token will be in a T_INLINE_HTML token.
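A small sketch that makes this visible. first_token_text() is a hypothetical helper that returns the text of the first token with a given id, and the sample source puts two newlines after each tag so that the split can be seen.

```php
<?php
// Hypothetical helper: text of the first token with the given id, or null.
function first_token_text(array $tokens, $id)
{
    foreach ($tokens as $token) {
        if (is_array($token) && $token[0] === $id) {
            return $token[1];
        }
    }
    return null;
}

// Two newlines follow each tag; only the first one joins the tag token.
$tokens = token_get_all("<?php\n\n echo 1; ?>\n\nhtml");

var_dump(first_token_text($tokens, T_OPEN_TAG));    // open tag plus one "\n"
var_dump(first_token_text($tokens, T_WHITESPACE));  // the second "\n" and the space
var_dump(first_token_text($tokens, T_CLOSE_TAG));   // close tag plus one "\n"
var_dump(first_token_text($tokens, T_INLINE_HTML)); // the second "\n" and "html"
```

This matters when the tokens are written back out or when newlines are counted: the newline after a tag is part of the tag token and does not appear as a separate T_WHITESPACE or T_INLINE_HTML token.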
2016-04-03 02:23:43
Not all tokens are returned as an array. The rule appears to be that if a token is not variable, but is instead one particular single-character string, it is returned as a string. You don't get a line number for it. This is the case for braces ("{", "}"), parentheses ("(", ")"), brackets ("[", "]"), comma (","), semicolon (";"), and a whole slew of single-character operator signs ("!", "=", "+", "*", "/", ".", ...). Multi-character operators such as "+=" are still returned as arrays (T_PLUS_EQUAL).
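If a line number is needed for these string tokens as well, one possible workaround is to track the current line while walking the result. The sketch below assumes "\n" or "\r\n" line endings; token_get_all_normalized() is a hypothetical name, and it returns every token as array(id, text, line), with id set to null for single-character tokens.

```php
<?php
// Hypothetical helper: returns every token as array(id, text, line).
// Single-character tokens get id null and a line number derived by
// counting newlines in the text seen so far.
function token_get_all_normalized($source)
{
    $result = array();
    $line = 1;
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            list($id, $text) = $token;
        } else {
            $id = null;
            $text = $token;
        }
        $result[] = array($id, $text, $line);
        // Advance past any newlines contained in this token's text.
        $line += substr_count($text, "\n");
    }
    return $result;
}

print_r(token_get_all_normalized("<?php\nfoo(\n);\n"));
```

With this shape, calling code no longer needs an is_array() check per token, at the cost of one extra pass over the token list.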
2017-05-23 11:34:44
As a caution: when using TOKEN_PARSE with an invalid PHP file, one can get an error like this:
Parse error: syntax error, unexpected '__construct' (T_STRING), expecting function (T_FUNCTION) or const (T_CONST) in  on line 15
Notice the missing filename: this function accepts a string, not a filename, and thus has no idea of the latter.
However, an exception would be more appreciated.
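To my knowledge, on PHP 7.0 and later this error is raised as a ParseError, so it can be caught before it turns into the fatal message above. A minimal sketch under that assumption follows; try_tokenize() is a hypothetical wrapper, not a built-in function.

```php
<?php
// Hypothetical wrapper: returns the token array, or null when the source
// does not parse. Assumes PHP 7.0+, where TOKEN_PARSE and ParseError exist.
function try_tokenize($source, &$error = null)
{
    try {
        return token_get_all($source, TOKEN_PARSE);
    } catch (ParseError $e) {
        // Keep the parser's message so the caller can add its own filename.
        $error = $e->getMessage();
        return null;
    }
}

$tokens = try_tokenize('<?php class A { __construct() {} }', $error);
if ($tokens === null) {
    echo "Invalid source: $error\n";
}
```

Since the tokenizer only sees a string, the caller is the right place to attach the filename to the reported message.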
2018-08-25 20:40:25