DTDs

XML, Part 3: Document Type Definitions

Version 1.2.1 [1]

Hour 3: Defining XML document types: DTDs

Since XML is a meta-language, we use it for defining new languages. Each such language ought to have some syntax of its own beyond just being well-formed XML (otherwise, how would MathML be different from MusicML, for example?). So, unless we're being very informal (and probably only temporarily, while experimenting), we ought to define what counts as a valid document in the language. We can do this with a schema.

A schema specifies the structure of a valid XML document, much like a database table schema specifies the structure of the table.

There are at least three kinds of schemas:

In this "hour", we focus on DTDs, which are the earliest form, inherited from SGML. While DTDs are less powerful than the other two kinds, they are useful to know because:

Schemas provide more than just documentation, however: they allow documents to be rigorously validated.

DTD Declarations

A DTD specifies the types of elements and attributes that may occur in a document, and how they are structured.

Element Declarations
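
As a minimal sketch of what element declarations look like, the fragment below declares a small vocabulary. The library/book vocabulary and every element name in it are hypothetical, chosen only for illustration. It shows a sequence (comma), a choice (vertical bar), the occurrence indicators ?, * and +, mixed content, and an empty element. Note that a nested group needs its own parentheses.

```xml
<!-- Hypothetical vocabulary for illustration only. -->
<!ELEMENT library   (book+)>                          <!-- one or more book children -->
<!ELEMENT book      (title, author+, (isbn | issn)?)> <!-- sequence; the optional choice is its own group -->
<!ELEMENT title     (#PCDATA)>                        <!-- character data only -->
<!ELEMENT author    (#PCDATA)>
<!ELEMENT isbn      (#PCDATA)>
<!ELEMENT issn      (#PCDATA)>
<!ELEMENT note      (#PCDATA | em)*>                  <!-- mixed content: text interleaved with em -->
<!ELEMENT em        (#PCDATA)>
<!ELEMENT pagebreak EMPTY>                            <!-- no content at all -->
```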

Attribute Declarations
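
A short sketch of an attribute-list declaration follows, again using a hypothetical book element whose attribute names are invented for the example. Each line gives an attribute name, its type, and its default: #REQUIRED, #IMPLIED (optional), or a literal default value.

```xml
<!-- Hypothetical element and attribute names, for illustration only. -->
<!ELEMENT book  (title)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST book
    id      ID                      #REQUIRED
    format  (hardcover | paperback) "paperback"
    lang    CDATA                   #IMPLIED
    sequel  IDREF                   #IMPLIED>
```

Here id must be unique within the document, format is restricted to the two listed values and defaults to paperback, lang accepts any string, and sequel, if present, must match the id of some element in the same document.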

Design Issues: Elements or Attributes?

When designing a document schema, it is not always obvious whether to treat something as an element or as an attribute. Indeed, in many cases, one way is not necessarily right and the other way wrong.

But here are some suggestions.

Key differences:

The author suggests that constraining the data as much as possible is desirable (why?) and consequently recommends using attributes whenever possible. But remember that this advantage holds only if we use DTDs; it does not carry over to XSDs.

My recommendation would be: usually, if it's atomic data, use an attribute; if it's not, you must use an element. (Well, you could force it to be an attribute, but then you'd be giving up much of the value of using XML.)
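
To make the trade-off concrete, here is a small sketch with hypothetical names (book, format, author, given, family). In a DTD, an attribute can be limited to an enumerated set of values, while element content can only be declared as text. Data with internal structure has no attribute equivalent.

```xml
<!-- Attribute: the DTD can restrict the value to a fixed list. -->
<!ATTLIST book format (hardcover | paperback) #REQUIRED>

<!-- Element: the DTD can only say "text goes here"; any string is valid. -->
<!ELEMENT format (#PCDATA)>

<!-- Non-atomic data has parts of its own, so it needs elements. -->
<!ELEMENT author (given, family)>
<!ELEMENT given  (#PCDATA)>
<!ELEMENT family (#PCDATA)>
```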

Attaching a DTD to a Document

A document should usually say what schema it's intended to conform to. For DTDs, this is done with the !DOCTYPE declaration, just below the XML declaration (pp. 51, 52, 53).
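
A minimal sketch of the external form is shown below. The file name library.dtd and the library vocabulary are assumptions made for the example. The name after !DOCTYPE must match the document's root element, and the SYSTEM identifier says where to find the DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE library SYSTEM "library.dtd">
<library>
  <book id="b1">
    <title>A Sample Title</title>
    <author>A. Writer</author>
  </book>
</library>
```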

Validating a Document with an Attached DTD

If a DTD is "attached" to the document FILE by a !DOCTYPE declaration, then

$ xmllint --noout --valid FILE

will check that FILE is well-formed XML and also validate it against the DTD.
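
For trying the command out, here is a small self-contained document that carries its DTD in the internal subset, between the square brackets of the !DOCTYPE declaration. The vocabulary is hypothetical. Saved under any file name, it should pass the check above.

```xml
<?xml version="1.0"?>
<!DOCTYPE library [
  <!ELEMENT library (book+)>
  <!ELEMENT book    (title, author+)>
  <!ELEMENT title   (#PCDATA)>
  <!ELEMENT author  (#PCDATA)>
  <!ATTLIST book id ID #REQUIRED>
]>
<library>
  <book id="b1">
    <title>A Sample Title</title>
    <author>A. Writer</author>
  </book>
</library>
```

Deleting the author element, or the id attribute, leaves the file well-formed but no longer valid, so the same command should then report a validation error.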

Discussion

Exercises

Page 69, #1 and 2.

If there is more time, consider some ways of representing the following database tables in XML, using flat and nested designs. (Nested: product within distributor, or vice-versa; flat: using the attribute types ID, IDREF, and IDREFS.)

Product table:

PID  PName
P1   Rowboat
P2   Sailboat

Distributor table:

DID  DName
D1   Dale's Boating
D2   Susan's Sails

Product-Distributor table:

PID  DID  Quantity
P1   D1   2
P1   D2   7
P2   D1   5
P2   D2   4

  1. Version history:
    • Version 1.2.1, 2012 Mar 16. Noted required parentheses for grouping.
    • Version 1.2, 2011 Apr 8. Fixed some formatting issues, including making visible some code that was "swallowed" by XML syntax.
    • Version 1.1, 2011 Apr 6. Separated DTD section from another file.
    • Version 1.0, 2011 Apr 5.