Using a Document Type Definition (DTD)
Structured Data
Each element in an XML document has a relationship with other elements, which
defines the structure of the data. Structuring data ensures that data is found
in the correct place, and adds context to the document. This results in self-describing,
organised information, separating content from style. Explicit rules state
where a specific part of the document structure may exist. Structured data
is easily processed by search engines, as they're able to index only the relevant
elements.
The explicit rules that state where elements may exist are defined in a Document
Type Definition (DTD). The DTD provides a formal definition of the document
structure and elements that may be used. An XML document is said to be valid
if it is associated with a DTD and its content conforms to the constraints
expressed in that DTD. The document type declaration, which contains or
refers to the DTD, is part of the prolog and must be placed before the root
element of the document.
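A minimal sketch, assuming Python's standard-library expat bindings (`xml.parsers.expat`), can show this ordering on the user.xml document used in this section: the parser reports the DOCTYPE and each element declaration before it reaches the root start tag. The function name `parse_events` and the event labels are illustrative only. Expat is a non-validating parser: it reads the declarations but does not check the document against them.

```python
import xml.parsers.expat

# The user.xml document with its internal DTD subset.
# The string must start directly with "<?xml", with no leading newline.
USER_XML = """<?xml version="1.0" ?>
<!DOCTYPE user [
<!ELEMENT user (name,email)>
<!ELEMENT name (forename, surname)>
<!ELEMENT forename (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
<user>
<name>
<forename>Gez</forename>
<surname>Lemon</surname>
</name>
<email>me@mystudio.com</email>
</user>
"""

def parse_events(document):
    """Record, in order, the DOCTYPE, element declarations and start tags."""
    events = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        events.append(("doctype", name))

    def on_element_decl(name, model):
        events.append(("declare", name))

    def on_start(name, attributes):
        events.append(("start", name))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return events

for event in parse_events(USER_XML):
    print(event)
```

Every ("declare", …) event is printed before ("start", "user"), because the declarations live in the prolog.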
The following is the content of an XML document called user.xml.
user.xml
<?xml version="1.0" ?>
<!DOCTYPE user [
<!ELEMENT user (name,email)>
<!ELEMENT name (forename, surname)>
<!ELEMENT forename (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
<user>
  <name>
    <forename>Gez</forename>
    <surname>Lemon</surname>
  </name>
  <email>me@mystudio.com</email>
</user>
The above example is an XML document whose document type declaration names
"user" as the root (top-level) element. Inside the document type declaration
are the element declarations, which determine which elements may appear, in
what context, and how often. The "user" element must contain the child
elements "name" and "email", in that order; because neither is followed by an
occurrence indicator ("?", "*", or "+"), each must appear exactly once.
The "name" element is declared as having the child elements "forename" and
"surname", in that order.
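Python's standard library does not include a DTD-validating parser, so the following is a hand-written sketch rather than a real validator. It copies the content models above into a hypothetical `CONTENT_MODELS` table and uses `xml.etree.ElementTree` to check that each element has exactly the declared children, once each, in the declared order. It only handles the plain sequences and `(#PCDATA)` used in this example, not choices or occurrence indicators.

```python
import xml.etree.ElementTree as ET

PCDATA = "#PCDATA"

# Hypothetical hand-written copy of the user DTD's content models.
# A tuple means "exactly these child elements, once each, in this order";
# PCDATA means "text only, no child elements".
CONTENT_MODELS = {
    "user": ("name", "email"),
    "name": ("forename", "surname"),
    "forename": PCDATA,
    "surname": PCDATA,
    "email": PCDATA,
}

def check(elem):
    """Return a list of problems under elem; an empty list means it conforms."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return [f"<{elem.tag}> is not declared"]
    children = list(elem)
    if model == PCDATA:
        return [f"<{elem.tag}> may contain text only"] if children else []
    problems = []
    # Element-only content: nothing but whitespace may sit between children.
    stray_text = (elem.text or "").strip() or any((c.tail or "").strip() for c in children)
    if stray_text:
        problems.append(f"<{elem.tag}> contains text outside its child elements")
    tags = tuple(child.tag for child in children)
    if tags != model:
        problems.append(f"<{elem.tag}> has children {tags}, expected {model}")
    for child in children:
        problems.extend(check(child))
    return problems

good = ET.fromstring(
    "<user><name><forename>Gez</forename><surname>Lemon</surname></name>"
    "<email>me@mystudio.com</email></user>")
swapped = ET.fromstring(
    "<user><email>me@mystudio.com</email>"
    "<name><forename>Gez</forename><surname>Lemon</surname></name></user>")

print(check(good))     # []
print(check(swapped))  # ["<user> has children ('email', 'name'), expected ('name', 'email')"]
```

Swapping "name" and "email" is enough to break conformance, because the declaration `(name,email)` fixes the order as well as the membership.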
Character Data (CDATA)
An XML document consists of markup and character data, where the character
data is the text of the document. The markup provides information about the
character data, and is distinguished from it by special characters: tags are
enclosed in angle brackets ("<" and ">"), and entity references begin with an
ampersand ("&") and end with a semicolon (";"). Text inside a CDATA section,
which begins with "<![CDATA[" and ends with "]]>", is not parsed for markup by
the XML parser, so it may contain "<" and "&" without escaping. The following
example uses a CDATA section to wrap JavaScript in an XHTML document.
<script type="text/javascript">
<![CDATA[
function someFunction()
{
// Function definition
}
]]>
</script>
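The effect of a CDATA section can be checked with a short Python sketch using the standard-library `xml.etree.ElementTree` parser. The script text is a made-up example; the point is that "<" and "&" survive unchanged inside CDATA, while the same text outside a CDATA section makes the document not well-formed.

```python
import xml.etree.ElementTree as ET

# A hypothetical script body containing "<" and "&", which normally start markup.
SCRIPT = "if (a < b && b > 0) { someFunction(); }"

# Inside a CDATA section the parser does not look for markup,
# so the text comes back exactly as written.
inside = ET.fromstring("<script><![CDATA[" + SCRIPT + "]]></script>")
print(inside.text == SCRIPT)  # True

# Without the CDATA section, the same text is parsed as markup and rejected.
try:
    ET.fromstring("<script>" + SCRIPT + "</script>")
except ET.ParseError as err:
    print("not well-formed:", err)
```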
Parsed Character Data (#PCDATA)
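The "forename", "surname", and "email" elements
are declared as containing Parsed Character Data (#PCDATA): text with no child
elements. Unlike the content of a CDATA section, PCDATA is parsed, meaning the
XML parser scans it for markup and expands entity references. PCDATA therefore
may not contain a literal "<" or "&"; these must be written as the entity
references "&lt;" and "&amp;".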
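When text destined for a #PCDATA element contains these characters, it must be escaped before it is written into the document. A minimal Python sketch, using the standard-library `xml.sax.saxutils.escape` function and a hypothetical value, shows the round trip: the characters are written as entity references, and the parser turns them back into the original text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Gez & <Co>"        # hypothetical value containing markup characters
escaped = escape(raw)      # "&" -> "&amp;", "<" -> "&lt;", ">" -> "&gt;"
print(escaped)             # Gez &amp; &lt;Co&gt;

# The parser expands the entity references while parsing the #PCDATA content.
forename = ET.fromstring("<forename>" + escaped + "</forename>")
print(forename.text == raw)  # True
```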
The DTD can be stored in an external file. In this case, the DOCTYPE declaration
contains the name of the external file.
user.dtd
<!ELEMENT user (name,email)>
<!ELEMENT name (forename, surname)>
<!ELEMENT forename (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
The XML document then specifies the location of the external DTD with the SYSTEM keyword.
user.xml
<?xml version="1.0" ?>
<!DOCTYPE user SYSTEM "user.dtd">
<user>
  <name>
    <forename>Gez</forename>
    <surname>Lemon</surname>
  </name>
  <email>me@mystudio.com</email>
</user>
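A program reading user.xml can find out which external DTD it refers to from the document type declaration. The sketch below, assuming Python's standard-library expat bindings, records the root element name and the system identifier; the helper `doctype_info` is hypothetical. Expat is not asked to fetch or validate against user.dtd here; it only reports the identifier.

```python
import xml.parsers.expat

# The user.xml document that points at an external DTD.
USER_XML = """<?xml version="1.0" ?>
<!DOCTYPE user SYSTEM "user.dtd">
<user>
<name>
<forename>Gez</forename>
<surname>Lemon</surname>
</name>
<email>me@mystudio.com</email>
</user>
"""

def doctype_info(document):
    """Return (root name, system identifier, has internal subset) from the DOCTYPE."""
    info = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["root"] = name
        info["system_id"] = system_id
        info["internal"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return info["root"], info["system_id"], info["internal"]

print(doctype_info(USER_XML))  # ('user', 'user.dtd', False)
```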