token_get_all

(PHP 4 >= 4.2.0, PHP 5)

token_get_all — Splits the given source into PHP tokens

Description

array token_get_all ( string $source )

token_get_all() parses the given source string into PHP language tokens using the Zend engine's lexical scanner.

For a list of parser tokens, see List of Parser Tokens, or use token_name() to translate a token value into its string representation.

Parameters

source: The PHP source to parse.

Return Values

An array of token identifiers. Each individual token identifier is either a single character (e.g. ;, ., >, !, etc.) or a three-element array containing the token index in element 0, the string content of the original token in element 1, and the line number in element 2.
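A short sketch (not part of the original manual page) that prints this structure for a small snippet. It assumes PHP 7.1 or later for the [$id, $text, $line] destructuring, and PHP 5.2.2 or later for the line number in element 2.

```php
<?php
// Tokenize a short snippet and print every token.
$tokens = token_get_all('<?php echo "hi"; ?>');

foreach ($tokens as $token) {
    if (is_array($token)) {
        // Element 0: token index, element 1: original text, element 2: line number.
        [$id, $text, $line] = $token;
        printf("Line %d: %s %s\n", $line, token_name($id), var_export($text, true));
    } else {
        // Single-character tokens are plain strings without a line number.
        printf("Char: %s\n", $token);
    }
}
```

The first line printed is Line 1: T_OPEN_TAG '<?php ' (the open tag keeps its trailing space).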

Examples

Example #1 token_get_all() example


<?php
$tokens = token_get_all('<?php echo; ?>'); /* => array(
                                                  array(T_OPEN_TAG, '<?php ', 1),
                                                  array(T_ECHO, 'echo', 1),
                                                  ';',
                                                  array(T_WHITESPACE, ' ', 1),
                                                  array(T_CLOSE_TAG, '?>', 1) ); */

/* Note that in the following example the string is parsed as T_INLINE_HTML
   rather than the otherwise expected T_COMMENT (T_ML_COMMENT in PHP < 5).
   This is because no opening/closing tags were used in the "code" provided.
   This would be equivalent to putting a comment outside of <?php ?> tags in a normal file. */
$tokens = token_get_all('/* comment */'); // => array(array(T_INLINE_HTML, '/* comment */', 1));
?>
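One way to get T_COMMENT for such a fragment is to prepend an opening tag yourself and drop the extra token again. This is a sketch: tokenize_fragment() is a hypothetical helper name, and the code assumes PHP 7.0+ for the type declarations. Because "<?php " contains no newline, the line numbers of the remaining tokens are unaffected.

```php
<?php
// Hypothetical helper: tokenize a fragment that has no opening tag.
function tokenize_fragment(string $fragment): array
{
    // Prepend an opening tag so the scanner starts in PHP mode.
    $tokens = token_get_all('<?php ' . $fragment);
    // The first token is the T_OPEN_TAG '<?php ' added above; drop it.
    array_shift($tokens);
    return $tokens;
}

$tokens = tokenize_fragment('/* comment */');
```

With this helper, '/* comment */' comes back as a single T_COMMENT token instead of T_INLINE_HTML.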

Changelog

Version	Description
5.2.2	Line numbers are returned in element 2.

User Contributed Notes

Dec 03

Author: nicolas dot grekas+php at gmail dot com


Well, there is a way to parse for errors. See

http://www.php.net/manual/function.php-check-syntax.php#77318

2007-12-03 03:10:25

http://php5.kiev.ua/manual/ru/function.token-get-all.html

Jun 29

Author: Dennis Robinson from basnetworks dot net


I wanted to use the tokenizer functions to count source lines of code, including counting comments.  Attempting to do this with regular expressions does not work well because of situations where /* appears in a string, or other situations.  The token_get_all() function makes this task easy by detecting all the comments properly.  However, it does not tokenize newline characters.  I wrote the below set of functions to also tokenize newline characters as T_NEW_LINE.



<?php

define('T_NEW_LINE', -1);

function token_get_all_nl($source)
{
    $new_tokens = array();

    // Get the tokens
    $tokens = token_get_all($source);

    // Split newlines into their own tokens
    foreach ($tokens as $token)
    {
        $token_name = is_array($token) ? $token[0] : null;
        $token_data = is_array($token) ? $token[1] : $token;

        // Do not split encapsed strings or multiline comments
        if ($token_name == T_CONSTANT_ENCAPSED_STRING || substr($token_data, 0, 2) == '/*')
        {
            $new_tokens[] = array($token_name, $token_data);
            continue;
        }

        // Split the data up by newlines
        $split_data = preg_split('#(\r\n|\n)#', $token_data, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

        foreach ($split_data as $data)
        {
            if ($data == "\r\n" || $data == "\n")
            {
                // This is a new line token
                $new_tokens[] = array(T_NEW_LINE, $data);
            }
            else
            {
                // Add the token under the original token name
                $new_tokens[] = is_array($token) ? array($token_name, $data) : $data;
            }
        }
    }

    return $new_tokens;
}

function token_name_nl($token)
{
    if ($token === T_NEW_LINE)
    {
        return 'T_NEW_LINE';
    }

    return token_name($token);
}

?>



Example usage:



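<?php

$tokens = token_get_all_nl(file_get_contents('somecode.php'));

foreach ($tokens as $token)
{
    if (is_array($token))
    {
        // Escape the token text so that tags such as "<?php" are displayed, not interpreted by the browser
        echo (token_name_nl($token[0]) . ': "' . htmlspecialchars($token[1]) . '"<br />');
    }
    else
    {
        echo ('"' . htmlspecialchars($token) . '"<br />');
    }
}

?>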



I'm sure you can figure out how to count the lines of code, and lines of comments with these functions.  This was a huge improvement on my previous attempt at counting lines of code with regular expressions.  I hope this helps someone, as many of the user contributed examples on this website have helped me in the past.
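As one possible way to do that counting, here is a hypothetical sketch that works on token_get_all() output directly instead of going through token_get_all_nl(). count_lines() and its return keys are illustrative names, and a line holding both code and a comment is counted as code. It assumes PHP 7.1+.

```php
<?php
// Hypothetical line counter built directly on token_get_all(): each physical
// line is marked as containing code and/or comments, then counted.
function count_lines(string $source): array
{
    $kinds = []; // line number => ['code' => true, 'comment' => true]
    $line = 1;

    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            [$id, $text, $line] = $token;
        } else {
            // Single-character tokens have no line number; reuse the tracked one.
            $id = null;
            $text = $token;
        }

        if ($id === T_WHITESPACE) {
            $kind = null;
        } elseif ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            $kind = 'comment';
        } else {
            $kind = 'code';
        }

        // A token may span several lines (block comments, strings, whitespace).
        $pieces = explode("\n", $text);
        foreach ($pieces as $offset => $piece) {
            if ($kind !== null && trim($piece) !== '') {
                $kinds[$line + $offset][$kind] = true;
            }
        }
        $line += count($pieces) - 1;
    }

    $code = 0;
    $commentOnly = 0;
    foreach ($kinds as $flags) {
        if (isset($flags['code'])) {
            $code++;
        } elseif (isset($flags['comment'])) {
            $commentOnly++;
        }
    }

    return ['code' => $code, 'comment_only' => $commentOnly];
}

$result = count_lines("<?php\n// a comment\n\$a = 1; // trailing\n/* block\n   comment */\necho \$a;\n");
```

Blank lines are counted in neither category, since only non-empty pieces of non-whitespace tokens mark a line.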

2009-06-29 00:24:40


Aug 02

Author: gomodo at free dot fr


Yes, there are some problems with token_get_all() (on WAMP, PHP 5.3.0):



1: Line-number bug

Since PHP 5.2.2, token_get_all() should return line numbers in element 2, but in my setup (5.3.0 on WAMP) this works correctly only with pure PHP code (not mixed with HTML). If token_get_all() detects some T_INLINE_HTML, the line numbers are sometimes wrong (the next line is returned)... :(



2: Warning messages can disrupt loops

With incomplete PHP code (e.g. PHP code fed in line by line), for example when a comment is not closed, token_get_all() can block loops on this warning:

Warning: Unterminated comment starting line

This problem does not seem to occur in CLI mode (PHP command line), only in web mode.



Until this is more stable, use token_get_all() only on PHP code (not mixed with HTML):

First, extract all the PHP code (including the opening and closing PHP tags).

Second, run token_get_all() on the pure PHP code.



3: Why is there no function to extract PHP code (for extracting HTML we have Tidy)?

In the meantime, I use a function of my own; the code is at the end of this post:

http://www.developpez.net/forums/d786381/php/langage/fonctions/analyser-fichier-php-token_get_all/



This function does not support:

- the old notations "<? ?>" and "<% %>"
- heredoc syntax
- nowdoc syntax (since PHP 5.3.0)

2009-08-02 13:08:03


Apr 03

Author: Theriault


The T_OPEN_TAG token will include the first trailing newline (\r, \n, or \r\n), tab (\t), or space. Any additional space after this token will be in a T_WHITESPACE token.



The T_CLOSE_TAG token will include the first trailing newline (\r, \n, or \r\n), as described in the manual section language.basic-syntax.instruction-separation. Any additional space after this token will be in a T_INLINE_HTML token.
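A small sketch (not part of the original note; assumes PHP 5.2.2+) that makes this visible: the open tag absorbs one newline, the second newline becomes T_WHITESPACE, the close tag absorbs the newline right after ?>, and the remaining newline starts the T_INLINE_HTML token.

```php
<?php
// Tokenize a snippet with extra newlines around both tags.
$source = "<?php\n\n\$a = 1; ?>\n\nhtml";
$tokens = token_get_all($source);

foreach ($tokens as $token) {
    $name = is_array($token) ? token_name($token[0]) : 'char';
    $text = is_array($token) ? $token[1] : $token;
    // json_encode() shows "\n" explicitly, so the absorbed newlines are visible.
    printf("%-14s %s\n", $name, json_encode($text));
}
```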

2016-04-03 02:23:43


May 23

Author: bart


Not all tokens are returned as an array. The rule appears to be that if a token is not variable, but is always one particular constant string, it is returned as a plain string instead, and you don't get a line number. This is the case for braces ("{", "}"), parentheses ("(", ")"), brackets ("[", "]"), comma (","), semicolon (";"), and a whole slew of single-character operator signs ("!", "=", "+", "*", "/", ".", ...). Multi-character operators such as "+=" are returned as arrays (e.g. T_PLUS_EQUAL).
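If every token needs a line number, one workaround is to carry the last known line forward and add the newlines contained in each token's text. A sketch under that assumption: normalize_tokens() is a hypothetical name, single-character tokens get a null token index, and PHP 7.1+ is assumed.

```php
<?php
// Hypothetical helper: return every token as [index, text, line].
function normalize_tokens(string $source): array
{
    $result = [];
    $line = 1;
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            [$id, $text, $line] = $token;
        } else {
            // Single-character token: no index, inherit the tracked line.
            $id = null;
            $text = $token;
        }
        $result[] = [$id, $text, $line];
        // Advance by the "\n" characters this token contains.
        $line += substr_count($text, "\n");
    }
    return $result;
}

$t = normalize_tokens("<?php\n\$a = (1)\n;\n");
```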

2017-05-23 11:34:44


Aug 25

Author: Ivan Ustanin


As a caution: when using TOKEN_PARSE with an invalid PHP file, you can get an error like this:

Parse error: syntax error, unexpected '__construct' (T_STRING), expecting function (T_FUNCTION) or const (T_CONST) in  on line 15

Notice the missing filename as this function accepts a string, not a filename and thus has no idea of the latter.

However an exception would be more appreciated.
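The $flags parameter and the TOKEN_PARSE flag exist only since PHP 7.0, so they are not covered by the PHP 5 signature above. With TOKEN_PARSE, invalid code makes token_get_all() throw a ParseError (the message above is how an uncaught one is displayed), so it can be caught. A sketch assuming PHP 7.1+; tokenize_checked() and its $label parameter are hypothetical, used only to attach a file name to the message.

```php
<?php
// Hypothetical wrapper: tokenize with TOKEN_PARSE and report syntax errors
// with a caller-supplied label in place of the missing file name.
function tokenize_checked(string $code, string $label): ?array
{
    try {
        return token_get_all($code, TOKEN_PARSE);
    } catch (ParseError $e) {
        // getLine() refers to a line inside $code, not inside any real file.
        error_log(sprintf('%s:%d: %s', $label, $e->getLine(), $e->getMessage()));
        return null;
    }
}

$ok = tokenize_checked('<?php $a = 1;', 'snippet.php');
$bad = tokenize_checked('<?php $a = ;', 'snippet.php');
```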

2018-08-25 20:40:25

