Examining an HTML DTD

HTML is an application of SGML, so it allows the creation of structured documents. Unfortunately, a recent scan of the Web shows that most Web pages are not valid HTML documents. Browser implementations are partly to blame for this laxity: they parse documents leniently and often try to guess the Web author's intent rather than reject invalid documents.

With the introduction of Dynamic HTML and CSS, structure takes on greater importance. Pages that are properly structured will interact better and be more reliable across multiple browsers. Scripts will run much more predictably because there is no ambiguity in the document's description. The event architecture exposed by Dynamic HTML also relies heavily on the document's structure.

Understanding how to create a proper HTML document requires the ability to read a DTD (document type definition). The DTD defines the set of valid elements, identifies which elements can be properly contained by other elements, and specifies the valid attributes for each element. This section introduces you to the basics of reading and understanding a DTD; it is not intended to teach you how to author and create custom DTDs. Explaining all aspects of an SGML DTD would require an entire book, and many such books are available.

Defining an Element

An element in the DTD is defined using the ELEMENT keyword. The element's definition specifies whether the element contains anything and whether the begin and end tags are optional or required. The following code demonstrates a prototype for defining an element:

<!ELEMENT elementName beginTag endTag contentModel>

The beginTag and endTag placeholders can be either a hyphen (-) or an O. A hyphen indicates that the tag is required, and an O indicates that the tag is optional. The contentModel placeholder can be EMPTY, which indicates that the element cannot contain anything, or it can be a specification of the valid contents for the element. The following code defines a Body element, in which the begin and end tags are optional:

<!ELEMENT BODY O O %body.content  
   -- Begin and end tags are optional, containing body.content. -->

While there are many elements in HTML that support optional begin and end tags, it is still good practice to always explicitly provide them. Doing so helps make the document much more readable and reusable, especially to those who do not understand the intricacies of HTML. When these delimiters are not supplied, the browser will infer their location based on the contents.
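
As a rough illustration of that inference, consider the following minimal document, which omits every optional tag. The title and paragraph text are invented for this sketch, and it assumes that the HTML and Head elements also have optional begin and end tags and that the P element has an optional end tag, as they do in the W3C HTML DTDs:

<TITLE>Sample Page</TITLE>
<P>Welcome to my page.

A conforming parser should treat this document as if it had been written with every tag supplied:

<HTML>
<HEAD>
<TITLE>Sample Page</TITLE>
</HEAD>
<BODY>
<P>Welcome to my page.</P>
</BODY>
</HTML>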

The preceding Body element definition specifies that the element can contain %body.content. The % in this specifier indicates that the contents are defined through a macro (called an entity in SGML). The <!ENTITY % body.content…> definition specifies the elements that can be contained within a Body element. Such macros are useful because they allow content models to be reused by multiple elements, making the DTD more compact and easier to use. Content models can also be defined directly in line. For example, the following code defines the Map element, which can contain only Area elements.

<!ELEMENT MAP - - (AREA)*>

The set of valid elements in the content model is specified using a simple regular expression language. The * qualifier following the (AREA) group indicates that any number of Area elements, including none, can be contained within a Map element.
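
For illustration, here is a Map element that satisfies this content model. The attribute names and values are not part of the definition above; they are assumed from common HTML usage, and the file names are hypothetical. The sketch also assumes that Area is declared EMPTY and therefore takes no end tag:

<MAP NAME="navbar">
<AREA SHAPE="rect" COORDS="0,0,50,20" HREF="home.htm">
<AREA SHAPE="rect" COORDS="50,0,100,20" HREF="news.htm">
</MAP>

Under this definition, a Map element that contains anything other than Area elements, such as a paragraph, would be invalid.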

Defining Attributes

Attributes are defined in a manner similar to elements. Attribute lists are defined using the !ATTLIST keyword. The attributes for the Body element are defined as follows:

<!ATTLIST BODY
   %attrs;                  -- id, class, style, lang, dir, events --
   %focus; 
   background %URL #IMPLIED -- texture tile for document background --
   topmargin  CDATA #IMPLIED
   leftmargin CDATA #IMPLIED
   %body-color-attrs;         -- bgcolor, text, link, vlink, alink --
   onLoad   %script  #IMPLIED -- intrinsic event --
   onUnload %script  #IMPLIED -- intrinsic event --
   >

The first name following the !ATTLIST keyword identifies the element that the attributes are associated with; the attribute list follows. Each entry in the list is either a macro that expands to another list of attributes or an attribute definition consisting of a name, a data type, and a default that indicates whether the attribute is required (#REQUIRED) or optional (#IMPLIED). A macro can be used to associate a group of attributes with the element, as %attrs does, or to specify a data type, as %URL and %script do.
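
As a sketch of how these definitions translate into markup, the following Body begin tag uses several of the attributes listed above. The attribute values, including the image file name and the init function, are hypothetical. The ID attribute comes from the %attrs macro and BGCOLOR from the %body-color-attrs macro, as the comments in the list indicate:

<BODY ID="main" BACKGROUND="tile.gif" TOPMARGIN="0" LEFTMARGIN="0"
   BGCOLOR="#FFFFFF" onLoad="init()">

Each attribute defined directly in the list is marked #IMPLIED, so any of them can be omitted.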

Defining an Entity

An entity is a macro that can be reused elsewhere in the DTD. The attrs entity used by the Body element is shown below along with the style entity. Notice that the attrs entity points to additional entities: style, i18n (internationalization), and events.

<!ENTITY % attrs "%style %i18n %events">
<!ENTITY % style
  "id     ID      #IMPLIED  -- document-wide unique id --
   class  CDATA   #IMPLIED  -- comma list of class values --
   style  CDATA   #IMPLIED  -- associated style info --
   title  CDATA   #IMPLIED  -- advisory text --"
   >
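
To make the macro substitution concrete, here is a small sketch that uses a hypothetical Note element, which is not part of any HTML DTD. Suppose its attribute list consists of nothing but a reference to the style entity:

<!ATTLIST NOTE %style; >

After the entity reference is replaced by the entity's text, the declaration reads as if it had been written out in full:

<!ATTLIST NOTE
   id     ID      #IMPLIED  -- document-wide unique id --
   class  CDATA   #IMPLIED  -- comma list of class values --
   style  CDATA   #IMPLIED  -- associated style info --
   title  CDATA   #IMPLIED  -- advisory text --
   >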

The body.content entity is also defined using other entities:

<!ENTITY % body.content "(%heading | %text | %block | ADDRESS)*">

This definition indicates that the body can contain any number of the elements specified by the %heading, %text, and %block entities and any number of Address elements.
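
As an example, the following body satisfies this content model. It is a sketch that assumes, as in the W3C HTML DTDs, that H1 is among the elements named by %heading, that P is among those named by %block, and that %text covers character data and inline elements such as EM. The content itself is invented:

<BODY>
<H1>Welcome</H1>
<P>This paragraph is a block element.</P>
Loose text with <EM>inline</EM> markup is also allowed.
<ADDRESS>webmaster@example.com</ADDRESS>
</BODY>

Because the alternatives are joined with | and the group is followed by *, these kinds of content can appear in any order and any number of times, or not at all.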

One of the most complex elements in HTML is the Table element. Here is the definition for the Table element:

<!ELEMENT table     - - 
   (caption?, (col*|colgroup*), thead?, tfoot?, tbody+)>
<!ELEMENT caption   - - (%text;)+>
<!ELEMENT thead     - O (tr+)>
<!ELEMENT tfoot     - O (tr+)>
<!ELEMENT tbody     O O (tr+)>
<!ELEMENT colgroup  - O (col*)>
<!ELEMENT col       - O EMPTY>
<!ELEMENT tr        - O (th|td)+>
<!ELEMENT (th|td)   - O %body.content>

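The table's contents can begin with a single optional Caption element, followed by either any number of Col elements or any number of ColGroup elements (the two cannot be mixed at this level), followed by an optional THead element and an optional TFoot element, followed by one or more TBody elements. The ? qualifier marks an element as optional, the + qualifier requires one or more occurrences, and the | delimiter separates alternatives. The comma delimiter defines the ordering of the elements. Therefore, the Caption element, if supplied, must be the first element contained within the table.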
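
The following table is a sketch of markup that follows this ordering, with every optional tag written out. The caption and cell data are made up for the example. The Col elements have no end tags because Col is declared EMPTY, and the TFoot element precedes the TBody element, as the content model requires:

<TABLE>
<CAPTION>Quarterly Sales</CAPTION>
<COL>
<COL>
<THEAD>
<TR><TH>Quarter</TH><TH>Sales</TH></TR>
</THEAD>
<TFOOT>
<TR><TD>Total</TD><TD>300</TD></TR>
</TFOOT>
<TBODY>
<TR><TD>Q1</TD><TD>100</TD></TR>
<TR><TD>Q2</TD><TD>200</TD></TR>
</TBODY>
</TABLE>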

It may seem odd that the table does not allow a TR element to exist immediately below the table. This does not mean that almost all tables on the Web are invalid. The TBody element is defined as having an optional begin tag and an optional end tag. Therefore, a TR outside of a THead or TFoot implicitly falls into the TBody. This relationship is further maintained in the object model, where the TBody element is always synthesized. A synthesized element in the object model represents an element that implicitly belongs to all documents, regardless of whether it is explicitly defined. For example, all documents are considered to have HTML, Head, and Body elements exposed in the object model. Synthesized elements in the object model are discussed in greater detail in Chapter 7, "Document Element Collections."
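
To illustrate the implied TBody element, consider this minimal table, whose cell contents are invented for the example:

<TABLE>
<TR><TD>One</TD><TD>Two</TD></TR>
</TABLE>

Under the definitions shown earlier, a parser should treat it as if the TBody tags had been supplied:

<TABLE>
<TBODY>
<TR><TD>One</TD><TD>Two</TD></TR>
</TBODY>
</TABLE>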

This concludes your brief introduction to the world of SGML DTDs. You should now be able to read an HTML DTD and create valid HTML documents. For more information about HTML and to obtain valid DTDs for all versions of HTML, see the W3C Web site (www.w3.org). To see the DTD used in Internet Explorer 4.0, see the Microsoft Web site (www.microsoft.com).
