Version 1.2.11
Hour 3: Defining XML document types: DTDs
Since XML is a meta-language, we use it for defining new languages. Our language ought to have some syntax besides just being well-formed XML (otherwise, how would MathML be different from MusicML, for example?). So, unless we're being very informal (and probably only temporarily, experimenting), we ought to define what is a valid document in this language. We can do this with a schema.
A schema specifies the structure of a valid XML document, much like a database table schema specifies the structure of the table.
There are at least three kinds of schemas:
In this "hour", we focus on DTDs, which are the earliest form, inherited from SGML. While DTDs are less powerful than the other two kinds, they are useful to know because:
Schemas provide more than just documentation, however: they allow documents to be rigorously validated.
A DTD specifies the types of elements and attributes that may occur in a document, and how they are structured.
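As a rough illustration of what such a specification can look like, here is a small document that carries its DTD in an internal subset. All element and attribute names (order, shirt, note, em, size, color, quantity) are made up for this sketch and are not taken from the textbook.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!-- An order holds one or more shirts, then an optional note. -->
  <!ELEMENT order (shirt+, note?)>

  <!-- A shirt has no content; everything about it is in attributes. -->
  <!ELEMENT shirt EMPTY>
  <!ATTLIST shirt
    size     (small | medium | large) #REQUIRED
    color    CDATA                    #IMPLIED
    quantity CDATA                    "1">

  <!-- Mixed content: text interleaved with em elements. -->
  <!ELEMENT note (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
]>
<order>
  <shirt size="medium" color="blue"/>
  <shirt size="large" quantity="3"/>
  <note>Please deliver <em>before</em> Friday.</note>
</order>
```

The first shirt omits quantity, so a validating parser supplies the default "1"; the second omits color, which is allowed because that attribute is #IMPLIED.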
Element declarations have the form:

<!ELEMENT ElementName ElementType>

where ElementType is one of:

- EMPTY
- Element content, built from child element names and these operators:
  - ( ) grouping
  - , sequence
  - | or
  - ? zero or one
  - * zero or more
  - + one or more

  If you combine , and |, then you must use parentheses for grouping, because there is no implicit order of precedence between them.
- #PCDATA, which means parsed character data. (CDATA means unparsed character data; more on this later.)
- Mixed content: #PCDATA must come first in a choice group and the group must be "zero or more", like this: (#PCDATA | somethingelse)*
- ANY (anything at all)

Attribute declarations have the form:

<!ATTLIST ElementName AttributeName AttributeType AttributeDefault ... >

AttributeType can be, among others:

- CDATA (unparsed character data)
- an enumeration of allowed values, e.g., (small | medium | large)

AttributeDefault is one of:

- #REQUIRED (required)
- #IMPLIED (optional, i.e., not required; a strange choice of word)
- #FIXED value, where the value must be quoted (e.g., #FIXED "10")
- a quoted default value (e.g., "10")

When designing a document schema, it is not always obvious whether to treat something as an element or as an attribute. Indeed, in many cases, one way is not necessarily right and the other way wrong.
But here are some suggestions.
Key differences:
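- Attribute values are plain character data (CDATA, not #PCDATA).

The author suggests that constraining the data as much as possible is desirable (why?) and consequently recommends using attributes whenever possible. But remember that this advantage holds only with DTDs, where an attribute can be restricted to, say, an enumerated list of values while element text is always just #PCDATA. It does not hold with XSDs, which can constrain both.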
My recommendation would be: usually, if it's atomic data, use an attribute; if it's not, you must use an element. (Well, you could force it to be an attribute, but then you'd be giving up much of the value of using XML.)
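To make the trade-off concrete, here is a sketch of the same atomic data written both ways, followed by a case where only an element will do. The names (catalog, boat, name, length, description, em) and the values are hypothetical.

```xml
<?xml version="1.0"?>
<catalog>
  <!-- Atomic data as attributes: compact, one value per name. -->
  <boat name="Rowboat" length="12"/>

  <!-- The same data as child elements: more verbose, equally legal. -->
  <boat>
    <name>Sailboat</name>
    <length>18</length>
  </boat>

  <!-- Non-atomic data: text with markup inside it cannot live in an
       attribute value, so it has to be an element. -->
  <boat name="Canoe">
    <description>Light enough for <em>one</em> person to carry.</description>
  </boat>
</catalog>
```

This document is only well-formed; it has no DTD, and a real DTD would normally settle on one of the first two styles rather than allow both.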
A document should usually say what schema it's intended to conform to. For DTDs, this is done with the !DOCTYPE declaration, just below the XML declaration (pp. 51, 52, 53).
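A minimal sketch of a document that points at an external DTD follows. The file name order.dtd and the element names are assumptions made for illustration; the name after !DOCTYPE has to match the document's root element.

```xml
<?xml version="1.0"?>
<!-- "order" names the root element; "order.dtd" is a hypothetical file
     expected to sit next to this document. -->
<!DOCTYPE order SYSTEM "order.dtd">
<order>
  <shirt size="medium"/>
</order>
```

The declarations can also be written directly inside the !DOCTYPE, between square brackets, which is convenient for small experiments because the document is then self-contained.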
If a DTD is attached to the document FILE by a !DOCTYPE declaration, then
$ xmllint --noout --valid FILE
will check that FILE is well-formed XML and that it is valid against the DTD.
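To see the difference between the two checks, it can help to try a file that is well-formed but not valid. The following sketch (hypothetical element and attribute names) should pass a plain well-formedness check but be reported as invalid when the --valid option is given, since the size value is outside the declared enumeration.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (shirt+)>
  <!ELEMENT shirt EMPTY>
  <!ATTLIST shirt size (small | medium | large) #REQUIRED>
]>
<order>
  <!-- Well-formed, but "huge" is not one of the allowed values. -->
  <shirt size="huge"/>
</order>
```

Changing "huge" to "large" should make the document valid.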
Page 69, #1 and 2.
If there is more time, consider some ways of representing the following database tables in XML, using flat and nested designs. (Nested: product within distributor, or vice-versa; flat: using the attribute types ID, IDREF, and IDREFS.)
Product table:
| PID | PName |
|---|---|
| P1 | Rowboat |
| P2 | Sailboat |
Distributor table:
| DID | DName |
|---|---|
| D1 | Dale's Boating |
| D2 | Susan's Sails |
Product-Distributor table:
| PID | DID | Quantity |
|---|---|---|
| P1 | D1 | 2 |
| P1 | D2 | 7 |
| P2 | D1 | 5 |
| P2 | D2 | 4 |
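One possible flat design, offered as a sketch rather than as the expected answer, is shown below. The element and attribute names (inventory, product, distributor, stock, pid, did, quantity) are invented here; only the data comes from the tables above.

```xml
<?xml version="1.0"?>
<!DOCTYPE inventory [
  <!ELEMENT inventory (product+, distributor+, stock*)>

  <!-- Each product and distributor carries a document-wide unique ID. -->
  <!ELEMENT product (#PCDATA)>
  <!ATTLIST product pid ID #REQUIRED>
  <!ELEMENT distributor (#PCDATA)>
  <!ATTLIST distributor did ID #REQUIRED>

  <!-- One stock element per row of the Product-Distributor table. -->
  <!ELEMENT stock EMPTY>
  <!ATTLIST stock
    product     IDREF #REQUIRED
    distributor IDREF #REQUIRED
    quantity    CDATA #REQUIRED>
]>
<inventory>
  <product pid="P1">Rowboat</product>
  <product pid="P2">Sailboat</product>
  <distributor did="D1">Dale's Boating</distributor>
  <distributor did="D2">Susan's Sails</distributor>
  <stock product="P1" distributor="D1" quantity="2"/>
  <stock product="P1" distributor="D2" quantity="7"/>
  <stock product="P2" distributor="D1" quantity="5"/>
  <stock product="P2" distributor="D2" quantity="4"/>
</inventory>
```

Note that a DTD only checks that each IDREF matches some ID in the document; it cannot require that the product attribute points at a product rather than a distributor. A nested design would instead place the stock information inside each product (or each distributor), at the cost of repeating the other side's details or still referring to them by IDREF.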