11–3: HTML
Hypertext Markup Language

Reading

Notes

Hypertext is text that can link to (that is, refer to) another document, or link one part of a document to another part. The World Wide Web (WWW, or just "web") is an interlinked collection of hypertext documents. The fundamental technologies of the web are markup languages for authoring hypertext documents (HTML, XML, XHTML); a protocol for serving hypertext documents (HTTP), although HTTP can serve much more than hypertext (images, JavaScript, PDF, and so on); HTTP servers and clients (commonly called web servers and web browsers, although other kinds of HTTP clients are possible); and various arrangements for a client to interact with processes on the server side (forms, CGI, etc.).
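
To make the idea of a link concrete, here is a minimal sketch that pulls the link targets out of a small page using Python's standard html.parser module. The page text, the example.org URL, and the names LinkCollector and extract_links are made up for illustration; one link points to another document and one points to another part of the same document.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> elements in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lower-cased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical page: one link to another document, one to a part of this one.
PAGE = """
<html><body>
<p>See the <a href="http://example.org/other.html">other document</a>
or jump to the <a href="#notes">notes</a> below.</p>
<h2 id="notes">Notes</h2>
</body></html>
"""


def extract_links(text):
    """Return the href values found in text, in document order."""
    collector = LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print(extract_links(PAGE))
```

A target that begins with # refers to a location inside the current document; any other target names a separate document.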

The WWW is not the same as the Internet: it is just one part of the Internet. We have been studying Internet applications all along: sockets, RPC, etc., all are tools for building Internet applications. We are now studying, more specifically, web applications.

An intranet is a private network based on Internet protocols (IP, TCP, UDP, etc.). Whatever tools we have for building Internet applications, therefore, can also be used to build intranet applications.

Markup Languages

Markup languages use markers called tags to annotate the text of a document, either to specify its structure and content (descriptive markup), or to specify its presentation, i.e., layout and appearance (procedural markup), or both.

HTML: HyperText Markup Language

The original HTML is a combined descriptive and procedural markup language. If you do not know HTML, or need a refresher, read the tutorial by Raggett before continuing.

With the experience gained from early versions of HTML, Tim Berners-Lee, the inventor of the WWW, decided that it is better to separate content from presentation. This led to the development of XML for content description and CSS for presentation style.

XML: eXtensible Markup Language

XML is a meta-language for creating markup languages. Its syntax resembles that of HTML, but the language designer is free to create new tags. A major goal of XML is to provide a basis for the semantic web, an evolution of the WWW in which documents label their content in such a way that it can be easily processed by programs.

Languages based on XML are called XML applications. Examples of XML applications include MathML (for describing mathematical formulas), SVG (Scalable Vector Graphics), and, yes, XML-RPC.
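
As a small illustration of "the designer is free to create new tags," the sketch below invents a tiny recipe vocabulary and reads it with Python's standard xml.etree.ElementTree module. The tag and attribute names (recipe, ingredient, quantity, unit, step) are hypothetical, not part of any standard XML application.

```python
import xml.etree.ElementTree as ET

# A made-up XML vocabulary; every tag and attribute name here is an assumption.
RECIPE = """
<recipe name="pancakes">
  <ingredient quantity="2" unit="cup">flour</ingredient>
  <ingredient quantity="3" unit="each">egg</ingredient>
  <step>Mix everything.</step>
  <step>Fry in a hot pan.</step>
</recipe>
"""


def ingredients(xml_text):
    """Return (name, quantity, unit) for each <ingredient> child of the root."""
    root = ET.fromstring(xml_text)
    return [
        (item.text, float(item.get("quantity")), item.get("unit"))
        for item in root.findall("ingredient")
    ]


if __name__ == "__main__":
    for name, quantity, unit in ingredients(RECIPE):
        print(name, quantity, unit)
```

Because the tags describe content rather than appearance, a program can pick out exactly the data it needs without guessing at layout.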

Stylistic markup is excluded from XML; instead, the XML document may link to a separate CSS style sheet.

CSS: Cascading Style Sheets

CSS is a language for writing style rules. For example, one can write a CSS rule that says the background color of a table element should be red, or that a p (paragraph) element should use a certain font family and size. An XML document may include a link to a CSS file, which provides guidelines for its presentation.

XHTML

XHTML is a reformulation of HTML 4 based on XML. Its goals are to separate style from content (the style is provided by CSS), and to regularize the syntax of the markup so that it can be processed by programs more easily and reliably.

Principal differences from HTML:

  1. In XHTML, as in XML, all tags come in pairs: an element consists of a pair of tags and the content (if any) sandwiched between. For example, a paragraph (p) element must be like this in XHTML:

    <p>I saw a rabbit.</p>
    

    In (pre-X) HTML, the closing tag could be omitted:

    <p>I saw a rabbit.
    

    For empty document elements (elements with no content between the tags), there is a shortcut: for example, we may write <br/> instead of <br></br>.

  2. Elements must be properly nested in XHTML: if element A contains the beginning tag of element B, then it must also contain the ending tag of element B. In HTML you might get away with something like this:

    <p>Mary had a little lamb;
      its fleece was <em>white as snow.</p>
    <p>And everywhere</em> that Mary went, 
      the lamb was sure to go.</p>
    

    But in XHTML we would have to write:

    <p>Mary had a little lamb;
      its fleece was <em>white as snow.</em></p>
    <p><em>And everywhere</em> that Mary went, 
      the lamb was sure to go.</p>
    
  3. XML is case-sensitive, and in XHTML all tags are lower case: <html>, not <HTML>.
  4. All tag attributes must be enclosed in quotation marks. Attributes are extra information that reside in the tag, rather than between tags. For example, we must write

    <img src="mypicture.png" />
    

    rather than

    <img src=mypicture.png>
    
  5. HTML elements and attributes that expressed style rather than structure have been phased out. Thus in strict XHTML there are no such elements as font and center, and attributes such as bgcolor and align are gone; these are now the concern of CSS. The elements i (italic letters) and b (boldface) survive, but the tags em (emphasis) and strong (strong emphasis) are preferred because they describe meaning rather than appearance; they are usually rendered the same way as i and b.
  6. In most elements, the name attribute has been replaced by the id attribute. Any element with an id attribute can now be the target of a link into the document: instead of

    <p>The hunter told a <a href="#fox-story">story about a fox</a>.</p>
    ...
    
    <p><a name="fox-story">The clever, brown fox wagged its tail.</a></p>
    

    we can now write simply

    <p>The hunter told a <a href="#fox-story">story about a fox</a>.</p>
    ...
    
    <p id="fox-story">The clever, brown fox wagged its tail.</p>
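
Several of the syntax rules above (paired tags, proper nesting, case sensitivity, quoted attributes) are rules of XML well-formedness, so any XML parser can check them. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the EXAMPLES table are made up for illustration. Note that it checks well-formedness only: it does not validate against the XHTML DTD, so it will not notice, say, a font element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if fragment parses as a single well-formed XML element."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Snippets modeled on the examples in the list above.
EXAMPLES = {
    "closed paragraph": "<p>I saw a rabbit.</p>",
    "unclosed paragraph": "<p>I saw a rabbit.",
    "empty-element shortcut": "<p>line one<br/>line two</p>",
    "overlapping elements": (
        "<div><p>its fleece was <em>white as snow.</p>"
        "<p>And everywhere</em> that Mary went</p></div>"
    ),
    "mismatched case": "<P>I saw a rabbit.</p>",
    "quoted attribute": '<img src="mypicture.png"/>',
    "unquoted attribute": "<img src=mypicture.png/>",
}

if __name__ == "__main__":
    for label, fragment in EXAMPLES.items():
        verdict = "well-formed" if is_well_formed(fragment) else "rejected"
        print(f"{label}: {verdict}")
```

An XML parser stops at the first violation instead of guessing what the author meant, which is what makes XHTML easier for programs to process reliably.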
    

In addition, XHTML documents should begin with an XML declaration, a DOCTYPE declaration, and an XML namespace (xmlns) attribute in the html tag, such as the following:

<?xml version="1.0" encoding="utf-8"?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

instead of just

<html>
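
One practical consequence of the xmlns attribute is that a program reading the document sees every element name qualified by the namespace. The sketch below, which assumes Python's standard xml.etree.ElementTree module, parses a tiny XHTML document with the preamble shown above; the head and body content and the function name describe are invented for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix

# Hypothetical document; only the preamble comes from the notes above.
DOC = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Rabbit</title></head>
  <body><p>I saw a rabbit.</p></body>
</html>
"""


def describe(doc):
    """Return the root tag, language, and title of an XHTML document."""
    root = ET.fromstring(doc)
    # Element names must be written as {namespace}localname when searching.
    title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
    return {
        "root_tag": root.tag,
        "lang": root.get(f"{{{XML_NS}}}lang"),
        "title": title.text,
    }


if __name__ == "__main__":
    print(describe(DOC))
```

The parser does not fetch the DTD named in the DOCTYPE; it simply records the namespace, so the root element is reported as {http://www.w3.org/1999/xhtml}html rather than plain html.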

HTML 5

The World Wide Web Consortium (W3C) developed HTML 5 as the successor of both HTML 4 and XHTML. Perhaps surprisingly, HTML 5 allows but does not require XML syntax; its parsing rules are designed to be forgiving of syntax errors.

HTML 5 requires no XML declaration, and only a very simple DOCTYPE declaration:

<!DOCTYPE html>
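
To get a feel for forgiving parsing, the sketch below feeds deliberately sloppy markup (unclosed p elements, mixed-case tags, an unquoted attribute) to Python's standard html.parser module. This module is a lenient tokenizer, not a full implementation of the HTML 5 parsing algorithm, so treat it as an approximation of what a browser does; the names TagLogger, scan, and SLOPPY are made up for the example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record the doctype and the tags seen, without enforcing XML rules."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.start_tags = []
        self.end_tags = []

    def handle_decl(self, decl):
        # Called with the text inside <! ... >, e.g. "DOCTYPE html".
        self.doctype = decl

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)  # tag names are lower-cased by the parser

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


# Markup that an XML parser would reject on several counts.
SLOPPY = (
    "<!DOCTYPE html>"
    "<P>I saw a rabbit."
    "<p>It saw <EM>me</em>.<br>"
    "<img src=mypicture.png>"
)


def scan(text):
    """Tokenize text leniently and return the populated TagLogger."""
    logger = TagLogger()
    logger.feed(text)
    logger.close()
    return logger


if __name__ == "__main__":
    log = scan(SLOPPY)
    print(log.doctype, log.start_tags, log.end_tags)
```

No exception is raised: the tokenizer reports five start tags and a single end tag and leaves it to later stages (in a browser, the tree builder) to decide where the unclosed elements end.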