RELAX NG

Version 1.2.1 [1]

Overview

RELAX NG (“REgular LAnguage for XML, New Generation”) is a schema language that is more expressive than DTDs, yet less complex than XSD. It specializes in structural validity, and can express some things XSD cannot. It doesn’t try to do as much as XSD, though, and XSD can express some things that RELAX NG cannot do (at least not easily).

Also, RELAX NG has both an XML syntax and a “compact” syntax. This introduction to RELAX NG will use mostly the compact syntax, with occasional exhibitions of XML syntax.

Example 1: Elements

Suppose we have these rules for zoos: A zoo has one or more sections, each section with zero or more animals. An animal has a common name and food, and may also have a scientific name.

Here is a sample zoo document:

<zoo>
  <section>
    <section-name>American Desert</section-name>
    <animal>
      <common-name>Sidewinder rattlesnake</common-name>
      <scientific-name>Crotalus cerastes</scientific-name>
      <food>rodents, lizards</food>
    </animal>
    <animal>
      <common-name>Roadrunner</common-name>
      <scientific-name>Geococcyx californianus</scientific-name>
      <food>insects, lizards, snakes, fruits, seeds</food>
    </animal>
    <animal>
      <common-name>Kangaroo rat</common-name>
      <food>grass seeds, leaves, stems, fruit</food>
    </animal>
  </section>
  <section>
    <section-name>African Jungle</section-name>
  </section>
</zoo>

Here is a RELAX NG schema (in compact syntax) which expresses these rules (and a bit more):

element zoo {
  element section {
    element section-name { text },
    element animal {
      element common-name { text },
      element scientific-name { text }?,
      element food { text }
    }*
  }+
}

The containment structure directly mirrors that of the zoo XML document. As with DTDs, “?” means optional (0 or 1), “*” means 0 or more, and “+” means one or more. The “,” between elements means sequence (which is why we have “a bit more” structure required here). Text content is represented by text instead of #PCDATA.

— Now, how would you express this in XML syntax?

— Regardless of how you answered that, here is the RELAX NG XML syntax for the zoo schema:

<element name="zoo" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="section">
      <element name="section-name">
        <text/>
      </element>
      <zeroOrMore>
        <element name="animal">
          <element name="common-name">
            <text/>
          </element>
          <optional>
            <element name="scientific-name">
              <text/>
            </element>
          </optional>
          <element name="food">
            <text/>
          </element>
        </element>
      </zeroOrMore>
    </element>
  </oneOrMore>
</element>

Well — you knew it was going to be long-winded, didn’t you?

Example 2: Attributes

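Attributes are described much like elements, except that the keyword attribute replaces element and, because attributes are unordered, the order in which they are declared does not constrain the order in which they appear.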

So if we want to make common-name and scientific-name attributes, instead of elements, like this:

    <animal common-name="Sidewinder rattlesnake"
            scientific-name="Crotalus cerastes">
      <food>rodents, lizards</food>
    </animal>

or

    <animal scientific-name="Crotalus cerastes"
            common-name="Sidewinder rattlesnake">
      <food>rodents, lizards</food>
    </animal>

we could declare the animal element like this:

    element animal {
      attribute common-name { text },
      attribute scientific-name { text }?,
      element food { text }
    }*

or in the XML syntax:

      <zeroOrMore>
        <element name="animal">
          <attribute name="common-name">
            <text/>
          </attribute>
          <optional>
            <attribute name="scientific-name">
              <text/>
            </attribute>
          </optional>
          <element name="food">
            <text/>
          </element>
        </element>
      </zeroOrMore>

Comments

Comments (in compact syntax) begin with # and continue to the end of a line.

Emptiness

An empty element may be declared empty; however, this is required only if the element has no attributes (and no children):

element hr { empty } # HTML horizontal rule element
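
For contrast, here is a minimal sketch of an element that has attributes but no children. The img element and its src and alt attributes are assumptions chosen for illustration. Because the attribute patterns already describe the element's entire content, empty is not written:

```rnc
# Hypothetical element: attributes only, no child elements.
# The attribute patterns are the whole content, so "empty" is omitted.
element img {
  attribute src { text },
  attribute alt { text }?
}
```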

Choice

As in DTDs, a “|” means “or”, and we can use it to express enumerations:

element traffic-light {
  attribute color { "red" | "yellow" | "green" }
}

But we can use “|” in other contexts as well, to allow an animal element to have either a common-name or scientific-name, as a choice of elements:

    element animal {
      (element common-name { text } |
       element scientific-name { text }),
      element food { text }
    }*

or as a choice of attributes:

    element animal {
      (attribute common-name { text } |
       attribute scientific-name { text }),
      element food { text }
    }*

or even as a choice between an element and an attribute:

    element animal {
      (element common-name { text } |
       attribute scientific-name { text }),
      element food { text }
    }*

There is no implicit order of precedence between “|” (choice) and “,” (sequence), so if you use these in combination, you must use parentheses for grouping, as shown above. (This is true also for DTDs.)

The XML syntax uses a <choice> element, so the traffic light schema becomes

<element name="traffic-light">
  <attribute name="color">
    <choice>
      <value>red</value>
      <value>yellow</value>
      <value>green</value>
    </choice>
  </attribute>
</element>

The XML syntax for a choice of elements is as follows (for an attribute alternative, write <attribute> in place of <element>):

<choice>
  <element name="common-name">
    <text/>
  </element>
  <element name="scientific-name">
    <text/>
  </element>
</choice>

Interleaving

The “&” connects child elements that are allowed to occur in any order. So,

    element animal {
      (element common-name { text } &
       element scientific-name { text }),
      element food { text }
    }*

allows us to have both

    <animal>
      <common-name>Roadrunner</common-name>
      <scientific-name>Geococcyx californianus</scientific-name>
      <food>insects, lizards, snakes, fruits, seeds</food>
    </animal>

and

    <animal>
      <scientific-name>Geococcyx californianus</scientific-name>
      <common-name>Roadrunner</common-name>
      <food>insects, lizards, snakes, fruits, seeds</food>
    </animal>

“&” isn’t needed for attributes, because attributes always may occur in any order.
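
Interleaving also combines with the occurrence operators. The following is only a sketch: allowing several food elements per animal is an assumption made for illustration, not one of the zoo rules given earlier. It lets the three kinds of children appear in any order, with the food elements free to be scattered among the names:

```rnc
# Sketch: all children of animal are interleaved.
# Repeating food ("+") is a hypothetical variation on the zoo rules.
element animal {
  element common-name { text } &
  element scientific-name { text }? &
  element food { text }+
}*
```

Under this pattern an animal could list a food, then its common name, then two more foods.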

Data Types and Constraints

RELAX NG of itself has no language for specifying data types, such as integer, floating point, or date. However, it is able to “borrow” datatypes from other languages, typically from XSD. (Actual options available depend on the RELAX NG implementation.)

element person {
  attribute name { xsd:string { minLength = "2" maxLength = "40" } },
  attribute married { xsd:boolean },
  element age { xsd:integer { minInclusive = "0" } }
}
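
A few more borrowed XSD datatypes, as a sketch only: the event element and its attribute and child names are hypothetical, and whether each type and facet is accepted depends on the validator's datatype library.

```rnc
# Hypothetical element illustrating other XSD datatypes and a facet.
element event {
  attribute date { xsd:date },                                   # e.g. 2013-03-30
  attribute code { xsd:string { pattern = "[A-Z]{3}-[0-9]{4}" } },  # e.g. ZOO-0042
  element attendance { xsd:nonNegativeInteger }
}
```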

Unlike XSD, RELAX NG has no easy way to specify a minimum or maximum number of occurrences of an element. If you want element a to have between 2 and 4 element b children, you can use optionality:

# b is given text content here only for illustration
element a {
  element b { text },
  element b { text },
  element b { text }?,
  element b { text }?
}

but this approach becomes awkward if you want to say between 50 and 100!

Grammars

Like XSD, RELAX NG is able to define named types, using a grammar.

See the references for details.
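
As a rough sketch of what a grammar looks like, here is the zoo schema from Example 1 rewritten with named patterns. The pattern names zoo, section, and animal are chosen here for illustration; start names the pattern that the document element must match.

```rnc
# Sketch: the Example 1 zoo schema expressed as a grammar.
# Pattern names on the left of "=" are illustrative choices.
start = zoo

zoo = element zoo { section+ }

section =
  element section {
    element section-name { text },
    animal*
  }

animal =
  element animal {
    element common-name { text },
    element scientific-name { text }?,
    element food { text }
  }
```

Each named pattern can be referenced wherever a pattern is allowed, so a definition such as animal need be written only once even if several elements contain animals.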

Comparison of Schema Languages

DTD, XSD, and RELAX NG are three languages for defining XML document schemas. Which is best? For what purpose?

It should be pretty clear that RELAX NG can express much of what DTDs can express.

The compact syntax of RELAX NG is clearer and more succinct than either DTD or XSD syntax, and even the XML syntax of RELAX NG is simpler than that for XSD.

One thing DTDs can do, but (as far as I know) neither RELAX NG nor XSD can do, is define entities (like &copy; for the copyright symbol in HTML).

RELAX NG makes it easiest to express unordered combinations of elements that may occur more than once. [2]

Both DTD and XSD allow the schema to specify default and fixed values; RELAX NG does not. [3]

XSD makes it easier to express minimum and maximum number of occurrences for an element.

There is not one “right” schema definition language for all problems, and there probably cannot be. Select the one that is best for the task.

Validating with Jing or Xmllint

RELAX NG itself defines no way for an XML document to specify which schema it is supposed to conform to, so the schema is named on the validator's command line.

Using jing on merlin (use -c for compact syntax, omit for XML syntax):

$ /home/info/bin/jing -c RNGFile XMLFile

Using jing elsewhere:

$ java -jar /PATH/TO/jing.jar -c RNGFile XMLFile

Using xmllint (only with XML syntax):

$ xmllint --relaxng SCHEMAFILE XMLFILES

References

Software


  1. Version history:
    • Version 1.2.1, 2013 Mar 30. Added “Jing or” to the title of the section “Validating with Jing or Xmllint”.
    • Version 1.2, 2012 Mar 23. Corrected cardinalities in zoo schema: * and + for the animal and section elements.
    • Version 1.1, 2012 Mar 22. Correction on xsd:all.
    • Version 1.0, 2012 Mar 16. Initial version.
  2. David Mertz points out here that XSD cannot do this. Although xsd:all allows for unordered child elements, each child element type can occur just 0 or 1 time.

  3. James Clark, cited here, argues that such “infoset augmentation” is undesirable.