DTD - XML Building Blocks

The main building blocks of both XML and HTML documents are elements marked up with tags, such as <body>...</body>.

XML document building blocks

All XML documents (as well as HTML documents) are composed of the following simple building blocks:

  • Elements
  • Attributes
  • Entities
  • PCDATA
  • CDATA

Below is a brief description of each building block.

Elements

Elements are the main building blocks of both XML and HTML documents.

Examples of HTML elements are "body" and "table". Examples of XML elements are "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br", and "img".

Example:

<body>body text in between</body>
<message>some message in between</message>
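As a rough illustration of how a parser sees an element, the sketch below reads the "message" element above with Python's standard xml.etree.ElementTree module. The choice of Python is an assumption made for this example only, and the "note", "to", and "separator" element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# An element that contains only text.
message = ET.fromstring("<message>some message in between</message>")
print(message.tag)   # the element name
print(message.text)  # the text between the start and end tags

# A hypothetical element that contains other elements, one of them empty.
note = ET.fromstring("<note><to>Tove</to><separator /></note>")
child_names = [child.tag for child in note]

# An empty element has no text content, so ElementTree reports None.
empty_text = note.find("separator").text
```

Here the three cases from the text appear side by side: an element with text, an element with child elements, and an empty element.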

Attributes

Attributes provide additional information about an element.

Attributes are always placed inside the start tag of an element. They always come in name/value pairs, as in the following "img" element:

<img src="computer.gif" />

The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty, it is closed with a "/".
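The same breakdown can be checked programmatically. This is a minimal sketch, again assuming Python's standard xml.etree.ElementTree module, that parses the "img" element and reads back its name and its attribute.

```python
import xml.etree.ElementTree as ET

# Parse the empty element from the example above.
img = ET.fromstring('<img src="computer.gif" />')

element_name = img.tag            # name of the element
src_value = img.get("src")        # value of the "src" attribute
all_attributes = dict(img.attrib) # every name/value pair on the element

print(element_name, src_value, all_attributes)
```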

Entities

Entities are variables used to define shortcuts to common text. Entity references are references to entities.

Most readers are familiar with this HTML entity reference: "&nbsp;". This "non-breaking space" entity is used in HTML to insert an extra space in a document.

When the document is parsed by the XML parser, the entities will be expanded.
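To make the expansion step concrete, the sketch below declares one entity in an internal DTD and lets a parser expand it. The entity name "writer" and its replacement text are hypothetical, and Python's standard xml.etree.ElementTree module is assumed as the parser.

```python
import xml.etree.ElementTree as ET

# A document whose internal DTD declares one entity, "writer".
# Both the entity name and its replacement text are made up for this example.
document = """<!DOCTYPE note [
  <!ENTITY writer "Donald Duck">
]>
<note>Written by &writer;.</note>"""

note = ET.fromstring(document)

# The parser has already replaced &writer; with the declared text.
expanded_text = note.text
print(expanded_text)
```

Only parse documents from trusted sources this way, since entity expansion in untrusted input can be abused to exhaust memory.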

The following entities are predefined in XML:

Entity reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
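The five predefined entities need no declaration. As a small sketch, assuming Python's standard xml.etree.ElementTree module, parsing an element that contains all five references yields the plain characters:

```python
import xml.etree.ElementTree as ET

# An element whose text uses each of the five predefined entity references.
source = "<message>&lt; &gt; &amp; &quot; &apos;</message>"
message = ET.fromstring(source)

# After parsing, every reference has been replaced by its character.
expanded = message.text
print(expanded)
```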

PCDATA

PCDATA means parsed character data.

You can imagine character data as the text between the start and end tags of an XML element.

PCDATA is text that will be parsed by a parser. The parser examines the text for entities and markup.

The tags in the text will be treated as markup, and the entities will be expanded.

However, parsed character data should not contain any &, <, or > characters; these need to be replaced with the &amp;, &lt;, and &gt; entities respectively.
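A short sketch of that replacement rule follows. It assumes Python's standard library: xml.sax.saxutils.escape performs the substitution and xml.etree.ElementTree parses the result. The sample text is made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text containing all three special characters.
raw_text = "if a < b && b > c"

# Placing the raw text directly inside an element is not well-formed XML.
try:
    ET.fromstring("<message>" + raw_text + "</message>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False

# escape() replaces &, <, and > with &amp;, &lt;, and &gt;.
escaped = escape(raw_text)

# The escaped form parses, and the parser expands it back to the original.
round_trip = ET.fromstring("<message>" + escaped + "</message>").text
print(escaped)
print(round_trip)
```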

CDATA

CDATA means character data.

CDATA is text that will not be parsed by a parser. Tags inside the text will not be treated as markup, and entities will not be expanded.
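The difference between the two kinds of text can be seen by parsing the same content both ways. The sketch below is an illustration under two assumptions: it uses Python's standard xml.etree.ElementTree module, and it uses a CDATA section in the document body as the closest parser-level example of unparsed character data. The element content is made up.

```python
import xml.etree.ElementTree as ET

# The same content, once as parsed character data and once inside a CDATA section.
parsed = ET.fromstring("<message><b>bold</b> &amp; more</message>")
cdata = ET.fromstring("<message><![CDATA[<b>bold</b> &amp; more]]></message>")

# Parsed text: <b> becomes a child element and &amp; is expanded to &.
parsed_children = [child.tag for child in parsed]
parsed_tail = parsed[0].tail  # text that follows the <b> element

# CDATA text: no child elements are created and &amp; is left as written.
cdata_children = [child.tag for child in cdata]
cdata_text = cdata.text

print(parsed_children, repr(parsed_tail))
print(cdata_children, repr(cdata_text))
```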