The XMLReader class
(PHP 5 >= 5.1.0)
Introduction
The XMLReader extension is an XML Pull parser. The reader acts as a cursor going forward on the document stream and stopping at each node on the way.
Class synopsis
Properties
- attributeCount
-
The number of attributes on the node
- baseURI
-
The base URI of the node
- depth
-
Depth of the node in the tree, starting at 0
- hasAttributes
-
Indicates if node has attributes
- hasValue
-
Indicates if node has a text value
- isDefault
-
Indicates if attribute is defaulted from DTD
- isEmptyElement
-
Indicates if node is an empty element tag
- localName
-
The local name of the node
- name
-
The qualified name of the node
- namespaceURI
-
The URI of the namespace associated with the node
- nodeType
-
The node type for the node
- prefix
-
The prefix of the namespace associated with the node
- value
-
The text value of the node
- xmlLang
-
The xml:lang scope which the node resides
Predefined Constants
XMLReader Node Types
XMLReader::NONE
-
No node type
XMLReader::ELEMENT
-
Start element
XMLReader::ATTRIBUTE
-
Attribute node
XMLReader::TEXT
-
Text node
XMLReader::CDATA
-
CDATA node
XMLReader::ENTITY_REF
-
Entity Reference node
XMLReader::ENTITY
-
Entity Declaration node
XMLReader::PI
-
Processing Instruction node
XMLReader::COMMENT
-
Comment node
XMLReader::DOC
-
Document node
XMLReader::DOC_TYPE
-
Document Type node
XMLReader::DOC_FRAGMENT
-
Document Fragment node
XMLReader::NOTATION
-
Notation node
XMLReader::WHITESPACE
-
Whitespace node
XMLReader::SIGNIFICANT_WHITESPACE
-
Significant Whitespace node
XMLReader::END_ELEMENT
-
End Element
XMLReader::END_ENTITY
-
End Entity
XMLReader::XML_DECLARATION
-
XML Declaration node
XMLReader Parser Options
XMLReader::LOADDTD
-
Load DTD but do not validate
XMLReader::DEFAULTATTRS
-
Load DTD and default attributes but do not validate
XMLReader::VALIDATE
-
Load DTD and validate while parsing
XMLReader::SUBST_ENTITIES
-
Substitute entities and expand references
Table of Contents
- XMLReader::close — Close the XMLReader input
- XMLReader::expand — Returns a copy of the current node as a DOM object
- XMLReader::getAttribute — Get the value of a named attribute
- XMLReader::getAttributeNo — Get the value of an attribute by index
- XMLReader::getAttributeNs — Get the value of an attribute by localname and URI
- XMLReader::getParserProperty — Indicates if specified property has been set
- XMLReader::isValid — Indicates if the parsed document is valid
- XMLReader::lookupNamespace — Lookup namespace for a prefix
- XMLReader::moveToAttribute — Move cursor to a named attribute
- XMLReader::moveToAttributeNo — Move cursor to an attribute by index
- XMLReader::moveToAttributeNs — Move cursor to a named attribute
- XMLReader::moveToElement — Position cursor on the parent Element of current Attribute
- XMLReader::moveToFirstAttribute — Position cursor on the first Attribute
- XMLReader::moveToNextAttribute — Position cursor on the next Attribute
- XMLReader::next — Move cursor to next node skipping all subtrees
- XMLReader::open — Set the URI containing the XML to parse
- XMLReader::read — Move to next node in document
- XMLReader::readInnerXML — Retrieve XML from current node
- XMLReader::readOuterXML — Retrieve XML from current node, including it self
- XMLReader::readString — Reads the contents of the current node as a string
- XMLReader::setParserProperty — Set parser options
- XMLReader::setRelaxNGSchema — Set the filename or URI for a RelaxNG Schema
- XMLReader::setRelaxNGSchemaSource — Set the data containing a RelaxNG Schema
- XMLReader::setSchema — Validate document against XSD
- XMLReader::XML — Set the data containing the XML to parse
Коментарии
<?php
function parseXML($node,$seq,$path) {
global $oldpath;
if (!$node->read())
return;
if ($node->nodeType != 15) {
print '<br/>'.$node->depth;
print '-'.$seq++;
print ' '.$path.'/'.($node->nodeType==3?'text() = ':$node->name);
print $node->value;
if ($node->hasAttributes) {
print ' [hasAttributes: ';
while ($node->moveToNextAttribute()) print '@'.$node->name.' = '.$node->value.' ';
print ']';
}
if ($node->nodeType == 1) {
$oldpath=$path;
$path.='/'.$node->name;
}
parseXML($node,$seq,$path);
}
else parseXML($node,$seq,$oldpath);
}
$source = "<tag1>this<tag2 id='4' name='foo'>is</tag2>a<tag2 id='5'>common</tag2>record</tag1>";
$xml = new XMLReader();
$xml->XML($source);
print htmlspecialchars($source).'<br/>';
parseXML($xml,0,'');
?>
Output:
<tag1>this<tag2 id='4' name='foo'>is</tag2>a<tag2 id='5'>common</tag2>record</tag1>
0-0 /tag1
1-1 /tag1/text() = this
1-2 /tag1/tag2 [hasAttributes: @id = 4 @name = foo ]
2-3 /tag1/text() = is
1-4 /text() = a
1-5 /tag2 [hasAttributes: @id = 5 ]
2-6 /text() = common
1-7 /text() = record
make some modify from Sergey Aikinkulov's note
<?php
function xml2assoc(&$xml){
$assoc = NULL;
$n = 0;
while($xml->read()){
if($xml->nodeType == XMLReader::END_ELEMENT) break;
if($xml->nodeType == XMLReader::ELEMENT and !$xml->isEmptyElement){
$assoc[$n]['name'] = $xml->name;
if($xml->hasAttributes) while($xml->moveToNextAttribute()) $assoc[$n]['atr'][$xml->name] = $xml->value;
$assoc[$n]['val'] = xml2assoc($xml);
$n++;
}
else if($xml->isEmptyElement){
$assoc[$n]['name'] = $xml->name;
if($xml->hasAttributes) while($xml->moveToNextAttribute()) $assoc[$n]['atr'][$xml->name] = $xml->value;
$assoc[$n]['val'] = "";
$n++;
}
else if($xml->nodeType == XMLReader::TEXT) $assoc = $xml->value;
}
return $assoc;
}
?>
add else if($xml->isEmptyElement)
may be some xml has emptyelement
Next version xml2assoc with some improve fixes:
- no doubled data
- no buffer arrays
<?php
/*
Read XML structure to associative array
--
Using:
$xml = new XMLReader();
$xml->open([XML file]);
$assoc = xml2assoc($xml);
$xml->close();
*/
function xml2assoc($xml) {
$assoc = null;
while($xml->read()){
switch ($xml->nodeType) {
case XMLReader::END_ELEMENT: return $assoc;
case XMLReader::ELEMENT:
$assoc[$xml->name][] = array('value' => $xml->isEmptyElement ? '' : xml2assoc($xml));
if($xml->hasAttributes){
$el =& $assoc[$xml->name][count($assoc[$xml->name]) - 1];
while($xml->moveToNextAttribute()) $el['attributes'][$xml->name] = $xml->value;
}
break;
case XMLReader::TEXT:
case XMLReader::CDATA: $assoc .= $xml->value;
}
}
return $assoc;
}
?>
Thanks rein_baarsma33 AT hotmail DOT com for bugfixes.
This is my new child of XML parsing method based on my and yours modification.
XML2ASSOC Is a complete solution for parsing ordinary XML
<?php
/**
* XML2Assoc Class to creating
* PHP Assoc Array from XML File
*
* @author godseth (AT) o2.pl & rein_baarsma33 (AT) hotmail.com (Bugfixes in parseXml Method)
* @uses XMLReader
*
*/
class Xml2Assoc {
/**
* Optimization Enabled / Disabled
*
* @var bool
*/
protected $bOptimize = false;
/**
* Method for loading XML Data from String
*
* @param string $sXml
* @param bool $bOptimize
*/
public function parseString( $sXml , $bOptimize = false) {
$oXml = new XMLReader();
$this -> bOptimize = (bool) $bOptimize;
try {
// Set String Containing XML data
$oXml->XML($sXml);
// Parse Xml and return result
return $this->parseXml($oXml);
} catch (Exception $e) {
echo $e->getMessage();
}
}
/**
* Method for loading Xml Data from file
*
* @param string $sXmlFilePath
* @param bool $bOptimize
*/
public function parseFile( $sXmlFilePath , $bOptimize = false ) {
$oXml = new XMLReader();
$this -> bOptimize = (bool) $bOptimize;
try {
// Open XML file
$oXml->open($sXmlFilePath);
// // Parse Xml and return result
return $this->parseXml($oXml);
} catch (Exception $e) {
echo $e->getMessage(). ' | Try open file: '.$sXmlFilePath;
}
}
/**
* XML Parser
*
* @param XMLReader $oXml
* @return array
*/
protected function parseXml( XMLReader $oXml ) {
$aAssocXML = null;
$iDc = -1;
while($oXml->read()){
switch ($oXml->nodeType) {
case XMLReader::END_ELEMENT:
if ($this->bOptimize) {
$this->optXml($aAssocXML);
}
return $aAssocXML;
case XMLReader::ELEMENT:
if(!isset($aAssocXML[$oXml->name])) {
if($oXml->hasAttributes) {
$aAssocXML[$oXml->name][] = $oXml->isEmptyElement ? '' : $this->parseXML($oXml);
} else {
if($oXml->isEmptyElement) {
$aAssocXML[$oXml->name] = '';
} else {
$aAssocXML[$oXml->name] = $this->parseXML($oXml);
}
}
} elseif (is_array($aAssocXML[$oXml->name])) {
if (!isset($aAssocXML[$oXml->name][0]))
{
$temp = $aAssocXML[$oXml->name];
foreach ($temp as $sKey=>$sValue)
unset($aAssocXML[$oXml->name][$sKey]);
$aAssocXML[$oXml->name][] = $temp;
}
if($oXml->hasAttributes) {
$aAssocXML[$oXml->name][] = $oXml->isEmptyElement ? '' : $this->parseXML($oXml);
} else {
if($oXml->isEmptyElement) {
$aAssocXML[$oXml->name][] = '';
} else {
$aAssocXML[$oXml->name][] = $this->parseXML($oXml);
}
}
} else {
$mOldVar = $aAssocXML[$oXml->name];
$aAssocXML[$oXml->name] = array($mOldVar);
if($oXml->hasAttributes) {
$aAssocXML[$oXml->name][] = $oXml->isEmptyElement ? '' : $this->parseXML($oXml);
} else {
if($oXml->isEmptyElement) {
$aAssocXML[$oXml->name][] = '';
} else {
$aAssocXML[$oXml->name][] = $this->parseXML($oXml);
}
}
}
if($oXml->hasAttributes) {
$mElement =& $aAssocXML[$oXml->name][count($aAssocXML[$oXml->name]) - 1];
while($oXml->moveToNextAttribute()) {
$mElement[$oXml->name] = $oXml->value;
}
}
break;
case XMLReader::TEXT:
case XMLReader::CDATA:
$aAssocXML[++$iDc] = $oXml->value;
}
}
return $aAssocXML;
}
/**
* Method to optimize assoc tree.
* ( Deleting 0 index when element
* have one attribute / value )
*
* @param array $mData
*/
public function optXml(&$mData) {
if (is_array($mData)) {
if (isset($mData[0]) && count($mData) == 1 ) {
$mData = $mData[0];
if (is_array($mData)) {
foreach ($mData as &$aSub) {
$this->optXml($aSub);
}
}
} else {
foreach ($mData as &$aSub) {
$this->optXml($aSub);
}
}
}
}
}
?>
[EDIT BY danbrown AT php DOT net: Fixes were also provided by "Alex" and (qdog AT qview DOT org) in user notes on this page (since removed).]
<?php
//Pull certain elements
$reader = new XMLReader();
$reader->open($xmlfile);
while ($reader->read()) {
switch ($reader->nodeType) {
case (XMLREADER::ELEMENT):
if ($reader->name == "Code")
{
$reader->read();
$code = trim($reader->value);
echo "$code\n";
break;
}
if ($reader->name == "Name")
{
$reader->read();
$customername = trim( $reader->value );
echo "$name\n";
break;
}
if ($reader->name == "Camp")
{
$camp = trim($reader->getAttribute("ID"));
echo "$camp\n";
break;
}
}
}
?>
XML to ASSOCIATIVE ARRAY
Improved algorithm based on Sergey Aikinkulov's. The problem was that it would overwrite nodes if they had the same tag name. Because of that <a><b/><b/><a> would be read as if <a><b/><a/>. This algorithm handles it better and outputs an easy to understand array:
<?php
function xml2assoc($xml) {
$tree = null;
while($xml->read())
switch ($xml->nodeType) {
case XMLReader::END_ELEMENT: return $tree;
case XMLReader::ELEMENT:
$node = array('tag' => $xml->name, 'value' => $xml->isEmptyElement ? '' : xml2assoc($xml));
if($xml->hasAttributes)
while($xml->moveToNextAttribute())
$node['attributes'][$xml->name] = $xml->value;
$tree[] = $node;
break;
case XMLReader::TEXT:
case XMLReader::CDATA:
$tree .= $xml->value;
}
return $tree;
}
?>
Usage:
myxml.xml:
------
<PERSON>
<NAME>John</NAME>
<PHONE type="home">555-555-555</PHONE>
</PERSON>
----
<?
$xml = new XMLReader();
$xml->open('myxml.xml');
$assoc = xml2assoc($xml);
$xml->close();
print_r($assoc);
?>
Outputs:
Array
(
[0] => Array
(
[tag] => PERSON
[value] => Array
(
[0] => Array
(
[tag] => NAME
[value] => John
)
[1] => Array
(
[tag] => PHONE
[value] => 555-555-555
[attributes] => Array
(
[type] => home
)
)
)
)
)
For reasons that have to do with recursion, it returns an array with the ROOT xml node as the first childNode, rather than to return only the ROOT node.
A basic parser
<?php
function xml2assoc($xml) {
$arr = array();
if (!preg_match_all('|\<\s*?(\w+).*?\>(.*)\<\/\s*\\1.*?\>|s', $xml, $m)) return $xml;
if (is_array($m[1]))
for ($i = 0;$i < sizeof($m[1]); $i++) $arr[$m[1][$i]] = xml2assoc($m[2][$i]);
else $arr[$m[1]] = xml2assoc($m[2]);
return $arr;
}
?>
The "XML2Assoc" functions noted here should be used with caution... basically they are duplicating the functionality already present in SimpleXML. They may work but they won't scale.
Their are two main uses cases for parsing XML, each suited to either XMLReader or SimpleXML.
1. SimpleXML is an excellent tool for easy access to an XML document tree using native PHP data types. It starts to flounder with massive (> 50M or so) XML documents, as it reads the entire document into memory before it can be processed. SimpleXML will just laugh at you then die when your server runs out of memory (or it will cause a load spike).
2. Aside from the reasoning behind massive XML documents, if you have to deal with massive XML documents, use XMLReader to process them. Don't try and gather an entire XML document into a PHP data structure using XMLReader and a PHP xml2assoc() function, you are reinventing the SimpleXML wheel.
When parsing massive XML documents using XMLReader, gather the data you need to perform an operation then perform it before skipping to the next node. Do not build massive data structures from a massive XML document, your server (and it's admins) will not like you.
Guys, I hope this example will help
you can erase prints showing the process-
and it will be a piece of nice code.
<?php
function xml2assoc($xml, $name)
{
print "<ul>";
$tree = null;
print("I'm inside " . $name . "<br>");
while($xml->read())
{
if($xml->nodeType == XMLReader::END_ELEMENT)
{
print "</ul>";
return $tree;
}
else if($xml->nodeType == XMLReader::ELEMENT)
{
$node = array();
print("Adding " . $xml->name ."<br>");
$node['tag'] = $xml->name;
if($xml->hasAttributes)
{
$attributes = array();
while($xml->moveToNextAttribute())
{
print("Adding attr " . $xml->name ." = " . $xml->value . "<br>");
$attributes[$xml->name] = $xml->value;
}
$node['attr'] = $attributes;
}
if(!$xml->isEmptyElement)
{
$childs = xml2assoc($xml, $node['tag']);
$node['childs'] = $childs;
}
print($node['tag'] . " added <br>");
$tree[] = $node;
}
else if($xml->nodeType == XMLReader::TEXT)
{
$node = array();
$node['text'] = $xml->value;
$tree[] = $node;
print "text added = " . $node['text'] . "<br>";
}
}
print "returning " . count($tree) . " childs<br>";
print "</ul>";
return $tree;
}
echo "<PRE>";
$xml = new XMLReader();
$xml->open('test.xml');
$assoc = xml2assoc($xml, "root");
$xml->close();
print_r($assoc);
echo "</PRE>";
?>
It reads this xml:
<test>
<hallo volume="loud"> me <br/> lala </hallo>
<hallo> me </hallo>
</test>
Take care about how to use XMLReader::$isElementEmpty. I don't know if it is a bug or not, but $isElementEmpty is set for the current context and NOT just for the element. If you move your cursor to an attribute, $isElementEmpty will ALWAYS be false.
<?php
$xml = new XMLReader();
$xml->XML('<tag attr="value" />');
$xml->read();
var_dump($xml->isEmptyElement);
$xml->moveToNextAttribute();
var_dump($xml->isEmptyElement);
?>
will output
(bool) true
(bool) false
So be sure to store $isEmptyElement before moving the cursor.
Sometimes you have an unusual URL that doesn't actually point to an xml file but still returns xml as output (Like the Battlefield Heroes generated syndication urls). Using get_file_contents(url) you can retrieve the xml data from these urls and pass it as a variable for processing as an XML String.
Unfortunately simpleXML or xml DOM cannot process all xml strings. Some have error boxes added to the end of them (such as Battlefield Heroes syndicated news). These boxes cause an end of file sort of error and closes out the script. XMLReader grabs data from these strings without error.
To verify that all nodes are read without error/warning you can use this code:
<?php
$endofxml = false;
$xml_url = "example.xml";
$reader = new XMLReader();
if(!$reader->open($xml_url)){
print "Error to open XML: $xml_url\n";
} else {
while ($reader->read()) {
$firstnode = (!isset($firstnode)) ? $reader->name : $firstnode;
/*
DO SOMETHING
*/
if ($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == $firstnode) {
$endofxml = true;
}
}
}
if($endofxml) {
print "no error found";
} else {
print "error found";
}
?>
This code is useful to trap $reader->read() error/warning.
Found this in the IXmlReader docs at msdn but it's also valid for XMLReader in PHP.
You should save the value of $isEmptyElement before processing attributes, or call moveToElement to make $isEmptyElement valid after processing attributes.
$isEmptyElement returns FALSE when XMLReader is positioned on an attribute node, even if attribute's parent element is empty.
Wrapper XMLReader class, for simple SAX-reading huge xml:
https://github.com/dkrnl/SimpleXMLReader
Usage example: http://github.com/dkrnl/SimpleXMLReader/blob/master/examples/example1.php
<?php
/**
* Simple XML Reader
*
* @license Public Domain
* @author Dmitry Pyatkov(aka dkrnl) <dkrnl@yandex.ru>
* @url http://github.com/dkrnl/SimpleXMLReader
*/
class SimpleXMLReader extends XMLReader
{
/**
* Callbacks
*
* @var array
*/
protected $callback = array();
/**
* Add node callback
*
* @param string $name
* @param callback $callback
* @param integer $nodeType
* @return SimpleXMLReader
*/
public function registerCallback($name, $callback, $nodeType = XMLREADER::ELEMENT)
{
if (isset($this->callback[$nodeType][$name])) {
throw new Exception("Already exists callback $name($nodeType).");
}
if (!is_callable($callback)) {
throw new Exception("Already exists parser callback $name($nodeType).");
}
$this->callback[$nodeType][$name] = $callback;
return $this;
}
/**
* Remove node callback
*
* @param string $name
* @param integer $nodeType
* @return SimpleXMLReader
*/
public function unRegisterCallback($name, $nodeType = XMLREADER::ELEMENT)
{
if (!isset($this->callback[$nodeType][$name])) {
throw new Exception("Unknow parser callback $name($nodeType).");
}
unset($this->callback[$nodeType][$name]);
return $this;
}
/**
* Run parser
*
* @return void
*/
public function parse()
{
if (empty($this->callback)) {
throw new Exception("Empty parser callback.");
}
$continue = true;
while ($continue && $this->read()) {
if (isset($this->callback[$this->nodeType][$this->name])) {
$continue = call_user_func($this->callback[$this->nodeType][$this->name], $this);
}
}
}
/**
* Run XPath query on current node
*
* @param string $path
* @param string $version
* @param string $encoding
* @return array(SimpleXMLElement)
*/
public function expandXpath($path, $version = "1.0", $encoding = "UTF-8")
{
return $this->expandSimpleXml($version, $encoding)->xpath($path);
}
/**
* Expand current node to string
*
* @param string $version
* @param string $encoding
* @return SimpleXMLElement
*/
public function expandString($version = "1.0", $encoding = "UTF-8")
{
return $this->expandSimpleXml($version, $encoding)->asXML();
}
/**
* Expand current node to SimpleXMLElement
*
* @param string $version
* @param string $encoding
* @param string $className
* @return SimpleXMLElement
*/
public function expandSimpleXml($version = "1.0", $encoding = "UTF-8", $className = null)
{
$element = $this->expand();
$document = new DomDocument($version, $encoding);
$node = $document->importNode($element, true);
$document->appendChild($node);
return simplexml_import_dom($node, $className);
}
/**
* Expand current node to DomDocument
*
* @param string $version
* @param string $encoding
* @return DomDocument
*/
public function expandDomDocument($version = "1.0", $encoding = "UTF-8")
{
$element = $this->expand();
$document = new DomDocument($version, $encoding);
$node = $document->importNode($element, true);
$document->appendChild($node);
return $document;
}
}
?>
As japos mentioned. Take care how you use isEmptyElement. After you are done looping through the attributes: isEmptyElement will be false. You can use moveToElement() to move the cursor back to the element and then you can use isEmptyElement like normal again.
Note that when:
A) <tag></tag>
$xmlRdr->isEmptyElement => false
$xmlRdr->hasValue => true
$xmlRdr->value => ''
$xmlRdr->hasAttributes => false
B) <tag />
$xmlRdr->isEmptyElement => true
$xmlRdr->hasValue => false
$xmlRdr->value => ''
$xmlRdr->hasAttributes => false
C) <tag attribute="value"></tag>
$xmlRdr->isEmptyElement => false
$xmlRdr->hasValue => false
$xmlRdr->value => ''
$xmlRdr->hasAttributes => true
D) <tag attribute="value" />
$xmlRdr->isEmptyElement => true
$xmlRdr->hasValue => false
$xmlRdr->value => ''
$xmlRdr->hasAttributes => true
Please discard my previous note; I pressed 'Add Note' too quickly
About (non-)self-closing tags:
A) <tag></tag>
$xmlRdr->isEmptyElement => false
$xmlRdr->hasValue => false
$xmlRdr->value => ''
$xmlRdr->hasAttributes => false
B) <tag />
$xmlRdr->isEmptyElement => true
$xmlRdr->hasValue => false
$xmlRdr->value => ''
$xmlRdr->hasAttributes => false
C) <tag attribute="value"></tag>
$xmlRdr->isEmptyElement => false
$xmlRdr->hasValue => false
$xmlRdr->value => ''
$xmlRdr->hasAttributes => true
D) <tag attribute="value" />
$xmlRdr->isEmptyElement => true
$xmlRdr->hasValue => false
$xmlRdr->value => ''
$xmlRdr->hasAttributes => true
... and always use the '===' operator when testing properties