XML DOM - Traversing the Node Tree

Traversing means visiting the nodes of the node tree one after another.

Traversing the node tree

You often need to loop through an XML document, for example when you want to extract the value of each element.

This process is called 'traversing the node tree'.

The following example loops through all child nodes of <book> and displays their names and values:

Example

<!DOCTYPE html>
<html>
<body>
<p id="demo"></p>
<script>
var x, i, xmlDoc, parser;
var txt = "";
var text = "<book>" +
"<title>雅舍谈吃</title>" +
"<author>梁实秋</author>" +
"<year>2013</year>" +
"</book>";
// Parse the XML string into a DOM document
parser = new DOMParser();
xmlDoc = parser.parseFromString(text, "text/xml");
// documentElement always represents the root element
x = xmlDoc.documentElement.childNodes;
// Append "name: value" for each child node of <book>
for (i = 0; i < x.length; i++) {
    txt += x[i].nodeName + ": " + x[i].childNodes[0].nodeValue + "<br>";
}
document.getElementById("demo").innerHTML = txt;
</script>
</body>
</html>


Example Explanation:

  1. Load the XML string into xmlDoc
  2. Get the child nodes of the root element
  3. Output the name of each child node, as well as the node value of its text node
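The loop in the example reads x[i].childNodes[0].nodeValue, which assumes that every child node is an element whose first child is a text node. As a rough sketch of a more defensive variant, the hypothetical helper describeChildren below (not part of the original example) skips anything that is not an element and reads textContent instead. The browser-only usage at the end is guarded so that the block also runs where DOMParser is not defined.

```javascript
// Hypothetical helper: returns "name: value" strings for the element
// children of `parent`, skipping text nodes, comments, and so on.
function describeChildren(parent) {
    var ELEMENT_NODE = 1; // numeric value of Node.ELEMENT_NODE
    var lines = [];
    var nodes = parent.childNodes;
    for (var i = 0; i < nodes.length; i++) {
        var node = nodes[i];
        if (node.nodeType !== ELEMENT_NODE) {
            continue; // not an element, e.g. whitespace between tags
        }
        // textContent is "" for an empty element, so nothing throws here
        lines.push(node.nodeName + ": " + node.textContent);
    }
    return lines;
}

// Browser-only usage; skipped where DOMParser is not defined.
if (typeof DOMParser !== "undefined") {
    var doc = new DOMParser().parseFromString(
        "<book>\n  <title>雅舍谈吃</title>\n  <year>2013</year>\n  <note/>\n</book>",
        "text/xml");
    console.log(describeChildren(doc.documentElement).join("\n"));
}
```

Note that textContent concatenates the text of all descendants, so for elements that contain nested elements it returns more than childNodes[0].nodeValue would.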

Differences in Browser DOM Parsing

All modern browsers support the W3C DOM specification.

However, browsers differ in one respect: the way they handle whitespace and new lines.

DOM - Whitespace and New Lines

XML usually contains newline characters or whitespace between nodes. This is often the case when editing documents with simple editors like Notepad.

The following example (edited in Notepad) contains a CR/LF (line break) at the end of each line and two spaces before each child node:

<book>
  <title>雅舍谈吃</title>
  <author>梁实秋</author>
  <press>江苏文艺出版社</press>
  <year>2013</year>
  <price>35</price>
  <ISBN>9787539962771</ISBN>
</book>

Internet Explorer 9 and earlier versions do not consider whitespace or new lines as text nodes, while other browsers do so.

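The following example outputs the number of child nodes of the root element (books.xml). For a file laid out like the one above, with 6 child elements, IE9 and earlier versions will output 6 child nodes, while IE10 and later versions as well as other browsers will output 13 child nodes: the 6 elements plus the 7 whitespace text nodes around them.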

Example

function myFunction(xml) {
    // xml is assumed to be a completed XMLHttpRequest for books.xml
    var xmlDoc = xml.responseXML;
    var x = xmlDoc.documentElement.childNodes;
    document.getElementById("demo").innerHTML =
        "Number of child nodes: " + x.length;
}

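Counting only element nodes gives the same result whether or not the parser keeps whitespace text nodes. The sketch below shows one way to do that; countElementChildren is a hypothetical helper name, and the sample XML string is made up for illustration. The browser-only part is guarded so that the block also runs where DOMParser is not defined.

```javascript
// Hypothetical helper: counts only the element children of `parent`.
function countElementChildren(parent) {
    var count = 0;
    var nodes = parent.childNodes;
    for (var i = 0; i < nodes.length; i++) {
        if (nodes[i].nodeType === 1) { // 1 = element node
            count++;
        }
    }
    return count;
}

// Browser-only usage; skipped where DOMParser is not defined.
if (typeof DOMParser !== "undefined") {
    var xml = "<book>\n  <title>雅舍谈吃</title>\n  <year>2013</year>\n</book>";
    var root = new DOMParser().parseFromString(xml, "text/xml").documentElement;
    console.log("All child nodes: " + root.childNodes.length);
    console.log("Element children: " + countElementChildren(root));
}
```

In current browsers the built-in childElementCount property of an element reports the same number, so the loop is mainly useful where that property cannot be relied on.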

PCDATA - Parsed Character Data

An XML parser usually parses all text in an XML document.

When parsing XML elements, the text between XML tags will also be parsed:

<message>This text will also be parsed</message>

The parser performs this operation because XML elements can contain other elements, as shown in this example, where the <name> element contains two other elements (first and last):

<name><first>Bill</first><last>Gates</last></name>

The parser will break it down into the following subelements:

<name>
  <first>Bill</first>
  <last>Gates</last>
</name>

The term 'Parsed Character Data' (PCDATA) is used to describe text data that will be parsed by the XML parser.

CDATA - Unparsed Character Data

The term CDATA is used to describe text data that should not be parsed by the XML parser.

Characters such as "<" and "&" are illegal in XML elements.

"<" will generate an error because the parser interprets it as the start of a new element.

"&" will generate an error because the parser interprets it as the start of a character entity.

Some text (such as JavaScript code) contains a large number of "<" or "&" characters. To avoid errors, you can define the script code as CDATA.
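When only a few such characters occur, escaping them with XML's predefined entity references is an alternative to CDATA. The following is a minimal sketch of that approach; escapeXmlText is a hypothetical helper name, and it covers only text between tags, not attribute values.

```javascript
// Hypothetical helper: replaces the characters that are special in XML
// text content with the predefined entity references.
function escapeXmlText(text) {
    return String(text)
        .replace(/&/g, "&amp;") // must run first, or it would re-escape the others
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

console.log("<message>" + escapeXmlText("if a < b && a < 0") + "</message>");
// prints: <message>if a &lt; b &amp;&amp; a &lt; 0</message>
```

For longer passages such as script code, this quickly becomes hard to read, which is where a CDATA section is the more practical choice.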

All content within a CDATA section is ignored by the parser, that is, it is treated as plain text rather than as markup.

A CDATA section starts with "<![CDATA[" and ends with "]]>":

<script>
<![CDATA[
function matchwo(a,b) {
    if (a < b && a < 0) {
        return 1;
    } else {
        return 0;
    }
}
]]>
</script>

In the above example, the parser will ignore all content within the CDATA section.
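When text is wrapped in a CDATA section programmatically, the terminator sequence needs care, because the text itself may contain "]]>". The sketch below shows one common workaround: the text is split at each "]]>" and continued in a second, adjacent CDATA section. The helper name toCdata is hypothetical.

```javascript
// Hypothetical helper: wraps `text` in a CDATA section. Every "]]>" in the
// text is split across two adjacent sections, so that no single section
// contains the terminator.
function toCdata(text) {
    return "<![CDATA[" +
        String(text).split("]]>").join("]]]]><![CDATA[>") +
        "]]>";
}

console.log(toCdata("if (a < b && a < 0) { return 1; }"));
// prints: <![CDATA[if (a < b && a < 0) { return 1; }]]>

console.log(toCdata("x = a[b[0]]>1"));
// prints: <![CDATA[x = a[b[0]]]]><![CDATA[>1]]>
```

A parser reads the two adjacent sections in the second output as the text "x = a[b[0]]" followed by ">1", which together give back the original string.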

Points to note about CDATA sections:

A CDATA section cannot contain the string "]]>". Nested CDATA sections are not allowed.

The "]]>" that marks the end of the CDATA section cannot contain spaces or line breaks.