5.1 An introduction to XML and XSLT

5.1.1 What is XML?

Extensible Markup Language (XML) is similar to XHTML and is used to develop documents or pages for the World Wide Web. Both share the same structure, and the main components of each language are elements. Unlike XHTML, which is designed for formatting and displaying a document, the main purpose of an XML page (or document) is to describe the structure of the data and the relationships among its elements (XML elements). Because of that, XML is more abstract and more general, and it is more powerful in the sense that it is extensible: XML has no predefined elements, so you create your own elements and attributes whenever you develop an XML page. More interestingly, an attribute can always be re-expressed, in a more general way, as a child element.

A well-defined (or well-formed) XML document conforms to the following XML rules:

  • Should begin with an XML declaration, e.g., <?xml version="1.0" encoding="iso-8859-1"?> (optional in XML 1.0, but recommended).

  • Must have one unique root element, e.g., <root>.

  • All start tags must match end tags, e.g., <contents></contents>.

  • XML tags are case sensitive.

  • All elements must be closed.

  • All elements must be properly nested.

  • All attribute values must be quoted, e.g., <contents from="www.pwt-ex.com">.

  • XML entities must be used for special characters, e.g., &lt;, &gt;, &amp;. (&nbsp; is defined by the XHTML DTDs, not by XML itself.)
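
Several of the rules above can be checked mechanically. The following is a minimal sketch, not part of the original examples, that assumes Python's standard xml.etree.ElementTree parser as the checker. The helper name is_well_formed and the sample strings are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, one per rule from the list above.
samples = {
    "matching tags": "<root><contents></contents></root>",
    "case mismatch": "<root><Contents></contents></root>",
    "unclosed element": "<root><contents></root>",
    "bad nesting": "<root><a><b></a></b></root>",
    "unquoted attribute": "<root><contents from=www.pwt-ex.com /></root>",
    "two root elements": "<root/><root/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; each of the others breaks one rule and is rejected by the parser.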

These rules are the same as those in XHTML mentioned in Chapter 1 since XHTML documents are technically XML pages. Consider the following example:



Example: ex05-01.xml - My First XML Page

1: <?xml version="1.0" encoding="iso-8859-1"?>
2: <message>
3:   <contents>My First XML Page</contents>
4:   <from>www.pwt-ex.com</from>
5: </message>

This is a simple XML page. The first line specifies the document as XML version 1.0 with the Latin-1 (Western European) character set. Together with a single root element and properly matched tags, this is all that a well-formed XML page needs. Some of you may notice that line 1 is the same as in all XHTML pages in this book. The reason is simple: all XHTML pages conform to the XML standard and are therefore well-formed XML documents.

The rest of the page defines a message element <message>. This element contains two child elements, <contents> and <from>, and the values of these two elements are also defined. Unlike XHTML rendered in a browser, where runs of white space are collapsed, an XML parser passes all white space in element content on to the application. As a result, this page describes a relationship among the elements <contents>, <from>, and <message>. Lines 2-5 can also be formulated as



<message>
  <contents from="www.pwt-ex.com">My First XML Page</contents>
</message>

In this case, the <from> element of ex05-01.xml is expressed as a from attribute of <contents>. You can rewrite both contents and from as attributes of <message> if you prefer. Also, all names of XML elements are user defined.
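
To see that the two layouts carry the same information, here is a short sketch that assumes Python's standard xml.etree.ElementTree module. The function read_message is a hypothetical helper, not something defined elsewhere in this book.

```python
import xml.etree.ElementTree as ET

# The sender written as a child element (as in ex05-01.xml).
as_elements = """<message>
  <contents>My First XML Page</contents>
  <from>www.pwt-ex.com</from>
</message>"""

# The sender written as an attribute of <contents>.
as_attribute = """<message>
  <contents from="www.pwt-ex.com">My First XML Page</contents>
</message>"""

def read_message(text):
    """Return (contents, sender) from either layout."""
    root = ET.fromstring(text)
    contents = root.find("contents")
    sender = contents.get("from")          # attribute form
    if sender is None:
        sender = root.findtext("from")     # child-element form
    return contents.text, sender

print(read_message(as_elements))
print(read_message(as_attribute))
```

Both calls return the same pair of strings, which is the sense in which an attribute can be reformulated as a child element.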

Since there is no predefined element in XML, we cannot expect an ordinary browser such as IE, NS, or Opera to display the page with formatting properties. For example, if you request the document ex05-01.xml by using

http://www.pwt-ex.com/book/chap05a/ex05-01.xml

you will see Fig. 5.1 on your screen.

Figure 5.1. ex05-01.xml on IE6.+



You may ask: if XML contains no formatting property or element, how can we display the page on the Web? Where are all those font-family, font-size, colors, images, and tables in XHTML?

The answer lies in the beauty of XML. Since the structure of XML is abstract but technically simple, XML pages can be transformed into other software or device languages relatively easily. In fact, XML has been used to define the Wireless Markup Language (WML), whose pages are displayed on mobile phones. For our Web environment and Web applications, we will consider how to transform XML pages into XHTML; one of the popular choices is to use Extensible Stylesheet Language Transformations (XSLT).

5.1.2 What is XSLT and how does it work?

One of the best ways to explain XSLT and to show you how it works is by example. Before looking at one, recall how style sheets are used with XHTML.

XHTML uses predefined elements, which can be displayed directly on a browser. CSS for XHTML can be considered a more structured way to organize formatting and layout, so that reusability and structure are enhanced.

For XML, the style sheet language is called XSL (Extensible Stylesheet Language). Since there are no predefined elements in XML, traditional XHTML elements such as <div>, <p>, and <table> have no built-in meaning. The role of XSL is more abstract and far more powerful than that of CSS. On the whole, XSL is a language or mechanism to describe how XML documents should be displayed.

One of the most important parts of XSL is XSL Transformations (XSLT), which can be used to transform an XML page into other formats. Consider the following example:



Example: ex05-02.xml - My First XML Page With XSLT

1: <?xml version="1.0"?>
2: <?xml-stylesheet type="text/xsl" href="ex05-02.xsl" ?>
3: <message>
4:   <contents>My First XML Page With XSLT</contents>
5:   <from>www.pwt-ex.com</from>
6: </message>
7:

If you compare this page with ex05-01.xml, you will find that the main difference is in line 2. This line



<?xml-stylesheet type="text/xsl" href="ex05-02.xsl" ?>

defines a transformation method for the page. The transformation is based on XSL, identified by the type text/xsl. The detailed XSL transformation and specifications are defined in the file ex05-02.xsl. The coding of this file is listed below:



Example: ex05-02.xsl - The XSLT Transformation File For ex05-02.xml

 1: <?xml version="1.0" encoding="iso-8859-1"?>
 2: <xsl:stylesheet version="1.0"
 3:   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 4:
 5: <xsl:template match="/">
 6:
 7:  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 8:  <head><title>My First XSLT Transform</title></head>
 9:  <body style="background:#000088;color:#ffff00;font-family:arial;
10:              font-size:24pt;font-weight:bold;text-align:center">
11:  <div style="text-align:center"><br /><br />
12:    <xsl:value-of select="message/contents" /> <br /><br />
13:    <xsl:value-of select="message/from" /> <br />
14:   </div>
15: </body>
16: </html>
17:
18: </xsl:template>
19: </xsl:stylesheet>

Line 1 is the header for the XML page. This means that XSLT files are basically XML pages. Lines 2-3 define the header for XSLT. Any XML document that references this file will be transformed by this style sheet.

Line 5 defines an XSL template. The attribute match="/" matches the root of the document, so this template is used to transform the entire XML page. The actual template is defined in lines 7-16 and, in fact, is an XHTML document. The XSLT, in effect, will transform the calling document into an XHTML document, which can therefore be displayed directly on a browser.

Inside the XHTML template, the interesting part is the XSL element (line 12)



<xsl:value-of select="message/contents" />

This element gets a value from the original XML page. In this case, the value is the string under the root element <message> and its first child element <contents>. The value is "My First XML Page With XSLT", which is located at line 4 of ex05-02.xml.
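
The effect of the two <xsl:value-of> elements can be imitated in a few lines. This is only a rough sketch of what the template does for this particular document, not an XSLT processor. It assumes Python's standard xml.etree.ElementTree module, and the wrapper element named document-node is a hypothetical stand-in for the "/" that the template matches.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

source = b"""<?xml version="1.0"?>
<message>
  <contents>My First XML Page With XSLT</contents>
  <from>www.pwt-ex.com</from>
</message>"""

# Hypothetical stand-in for the document root "/" matched by the template:
# a wrapper element whose only child is the real root element <message>.
document = ET.Element("document-node")
document.append(ET.fromstring(source))

def value_of(path):
    """Approximate <xsl:value-of select="path" /> for simple text elements."""
    return escape(document.findtext(path, default=""))

# Only the <div> part of the template (lines 11-14) is reproduced here.
xhtml = (
    '<div style="text-align:center"><br /><br />\n'
    f'  {value_of("message/contents")} <br /><br />\n'
    f'  {value_of("message/from")} <br />\n'
    '</div>'
)
print(xhtml)
```

The paths message/contents and message/from are the same select values used in ex05-02.xsl. A real XSLT processor returns the full string value of the first matching node, whereas findtext returns only the text directly inside the element, which is sufficient for this page.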

When you request this XML page with

http://www.pwt-ex.com/book/chap05a/ex05-02.xml

you will see the transform in action as illustrated in Fig. 5.2. Before we continue to discuss XML and XSLT, let's consider some Document Type Definitions and schema used in XML.

Figure 5.2. XML with XSLT



5.1.3 The Document Type Definition (DTD) used in XML

The purpose of a DTD is to define the legal building blocks of a markup language (ML). DTDs originate from the Standard Generalized Markup Language (SGML: ISO 8879) and are used by markup languages such as HTML, XHTML, and XML.

An XML document with correct XML syntax is said to be "Well-Formed." The document is characterized as "Valid" if it also conforms to the rules of a DTD. Usually, you declare your DTD within the DOCTYPE definition. The general DOCTYPE syntax is defined by



<!DOCTYPE root-element [element-declaration]>

This is the internal declaration of DOCTYPE. All elements used in the document should be declared inside the brackets. Consider the following example:



Example: ex05-03.xml - An XML Page With DTD

 1: <?xml version="1.0" encoding="iso-8859-1"?>
 2: <?xml-stylesheet type="text/xsl" href="ex05-02.xsl" ?>
 3: <!DOCTYPE
 4:   message
 5:   [
 6:     <!ELEMENT message (contents,from)>
 7:     <!ELEMENT contents (#PCDATA)>
 8:     <!ELEMENT from (#PCDATA)>
 9:   ]
10: >
11: <message>
12:   <contents>My First XML Page</contents>
13:   <from>www.pwt-ex.com</from>
14: </message>

Line 3 begins the DOCTYPE declaration, and line 4 declares the root element of the page. Inside the element-declaration brackets (lines 6-8), three elements are defined. The first element is called <message> and contains two child elements, <contents> and <from>. The content type of the <contents> and <from> elements is Parsed Character Data (PCDATA). Some frequently used building blocks or components of DTD are:

• Elements

The main components of both XML and XHTML.

• Tags

Used to mark up elements.

• Attributes

Provide extra information about elements.

• Entities

Variables to define special text or characters. Some commonly used entities are:

Entity      Character
&lt;        <
&gt;        >
&amp;       &
&quot;      "
&apos;      '
&nbsp;      Non-breaking space (defined by the XHTML DTDs, not predefined in XML)


• PCDATA

The text found between the start tag and the end tag of an XML element. Tags inside PCDATA are treated as markup, and entities are expanded.

• CDATA

Text that will not be parsed by a parser.
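
The difference between entities, PCDATA, and CDATA can be observed with a parser. The sketch below assumes Python's standard xml.sax.saxutils.escape function and the xml.etree.ElementTree parser. The sample strings and the helper parses are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if a < b & b > c then "ok"'

# escape() handles &, < and > by default; the quote entities are passed in.
encoded = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(encoded)

# PCDATA: the parser expands the entities back into characters.
pcdata = ET.fromstring(f"<contents>{encoded}</contents>").text

# CDATA: the same kind of characters can be written literally.
cdata = ET.fromstring("<contents><![CDATA[a < b & c]]></contents>").text

def parses(text):
    """Return True if text is accepted by the parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# &nbsp; is not predefined in XML, so it fails without a DTD that declares it;
# the numeric character reference &#160; always works.
nbsp_named = parses("<contents>a&nbsp;b</contents>")
nbsp_numeric = parses("<contents>a&#160;b</contents>")
print(pcdata == raw, cdata, nbsp_named, nbsp_numeric)
```

After parsing, pcdata is identical to the original string, which is why entities are the safe way to place characters such as < and & inside element content.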


The DTD defined in this way is called the internal DOCTYPE declaration due to the fact that all elements are declared inside the page. Alternatively, you can define the DTD as an external file. The general syntax to declare an external DOCTYPE is



<!DOCTYPE root-element SYSTEM "filename">

For example, you can define all elements in lines 6-8 of ex05-03.xml in a file called ex05-05.dtd. This file can be used by many XML pages with element <message> and subelements <from> and <contents>, including ex05-04.xml below:



Example: ex05-04.xml - An XML Page Using DTDs

1: <?xml version="1.0" encoding="iso-8859-1"?>
2: <?xml-stylesheet type="text/xsl" href="ex05-02.xsl" ?>
3: <!DOCTYPE message SYSTEM "ex05-05.dtd">
4: <message>
5:   <contents>My First XML Page</contents>
6:   <from>www.pwt-ex.com</from>
7: </message>

By using a DTD, your XML pages can have a format that you define and that describes each element of the document. Organizations on the Web such as the W3C offer DTD validators that verify your page and confirm that it is valid. The W3C also offers logos for validated pages (see www.w3.org). With a shared DTD, other groups or people can interchange data with you.



Example: ex05-05.dtd - External DTDs For XML Pages

1:  <!ELEMENT message (contents,from)>
2:  <!ELEMENT contents (#PCDATA)>
3:  <!ELEMENT from (#PCDATA)>
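
What these three declarations require of a document can be spelled out by hand. Python's standard library does not validate against a DTD, so the following is only a hand-written sketch of the same content model, using xml.etree.ElementTree for parsing. The names CONTENT_MODEL and conforms are hypothetical, and the check covers the order of child elements only.

```python
import xml.etree.ElementTree as ET

# Hand-written mirror of ex05-05.dtd: the required child sequence per element.
# An empty tuple stands for "(#PCDATA)", i.e. text only, no child elements.
CONTENT_MODEL = {
    "message": ("contents", "from"),
    "contents": (),
    "from": (),
}

def conforms(element):
    """Check that element and its descendants follow CONTENT_MODEL."""
    expected = CONTENT_MODEL.get(element.tag)
    if expected is None:
        return False                      # element is not declared
    children = tuple(child.tag for child in element)
    if children != expected:
        return False                      # wrong children or wrong order
    return all(conforms(child) for child in element)

valid = ET.fromstring(
    "<message><contents>My First XML Page</contents>"
    "<from>www.pwt-ex.com</from></message>")
swapped = ET.fromstring(
    "<message><from>www.pwt-ex.com</from>"
    "<contents>My First XML Page</contents></message>")

print(conforms(valid), conforms(swapped))
```

The second document is well-formed but would not be valid, because (contents,from) fixes the order of the two child elements.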

While a DTD provides definitions for elements and attributes in XML pages, developers have recognized the following drawbacks:

  • DTD is not written like XML.

  • There is little or no support for data types.

To address these shortcomings, W3C created the so-called "XML Schema Definition (XSD)."

5.1.4 The XML Schema Definition (XSD)

XML Schema was developed by the W3C, building on early proposals from vendors including Microsoft, to provide data type definitions for XML pages. It became an official W3C recommendation in May 2001. Similar to a DTD, an XSD is used to define the legal building blocks of an XML page. The main difference is that an XSD is written in XML and supports data types. Since we mainly use DTDs in this chapter, only a brief XSD discussion is provided in this section.

One of the simplest ways to understand the idea of XSD is by example. Consider the following:



Example: ex05-06.xml - An XML Page Using XSD

1: <?xml version="1.0" encoding="iso-8859-1"?>
2: <?xml-stylesheet type="text/xsl" href="ex05-02.xsl" ?>
3: <message
4:    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
5:    xsi:noNamespaceSchemaLocation="ex05-07.xsd">
6:
7: <contents>My First XML Page</contents>
8: <from>www.pwt-ex.com</from>
9: </message>

From this simple page, you can see that there is no DTD defined. Instead, the XSD reference is embedded in the start tag of the message element <message>. The schema for this page is declared in an external file called ex05-07.xsd. This file can be used by any XML page with a structure of <message>, <from>, and <contents> elements, including ex05-06.xml. Consider the following:



Example: ex05-07.xsd - XSDs For XML Pages With <from> and <contents>

 1: <?xml version="1.0" encoding="iso-8859-1"?>
 2: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 3: <xs:element name="message">
 4:   <xs:complexType>
 5:     <xs:sequence>
 6:       <xs:element name="contents" type="xs:string" />
 7:       <xs:element name="from" type="xs:string" />
 8:     </xs:sequence>
 9:   </xs:complexType>
10: </xs:element>
11: </xs:schema>

Line 3 specifies the element message as the root element of the page. This message element <message> is declared as complexType since it contains other elements. Inside the <message> element there is a sequence of elements defined. They are, obviously, the <contents> and <from>. Consider the definition of the <contents> element:



<xs:element name="contents" type="xs:string" />

In XSD, elements can be defined with a name and a type. The type is used to declare the data type of the element. In this case, the data type of the contents element is a string. Some frequently used XSD data types are:



xs:string       xs:decimal
xs:integer      xs:boolean
xs:date         xs:time

These data types provide a rich set of definitions for data used on XML pages. As we mentioned at the beginning of this section, only a brief discussion on XSD is provided. For more detailed information, the relevant XML Schema pages from www.w3.org are recommended.
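
Because an XSD is itself an XML page, it can be read with an ordinary XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module and lists the elements declared in ex05-07.xsd together with their types. The CONVERTERS table is a hypothetical and only approximate mapping from the six data types above to Python types; it is not a schema validator.

```python
import xml.etree.ElementTree as ET
from datetime import date, time
from decimal import Decimal

XS = "{http://www.w3.org/2001/XMLSchema}"

schema = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="message">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="contents" type="xs:string" />
      <xs:element name="from" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
</xs:schema>"""

# Map each declared element name to its type attribute.
# <message> has no type attribute because its complexType is declared inline.
declared = {
    el.get("name"): el.get("type")
    for el in ET.fromstring(schema).iter(XS + "element")
}
print(declared)

# Hypothetical, approximate converters for the data types listed above.
CONVERTERS = {
    "xs:string": str,
    "xs:decimal": Decimal,
    "xs:integer": int,
    "xs:boolean": lambda s: {"true": True, "1": True,
                             "false": False, "0": False}[s],
    "xs:date": date.fromisoformat,
    "xs:time": time.fromisoformat,
}

print(CONVERTERS[declared["contents"]]("My First XML Page"))
print(CONVERTERS["xs:date"]("2001-05-02"))
```

The Python converters accept a slightly different set of strings than the XSD types do (for example, time zone suffixes on xs:date are not handled), so this is an illustration of typed content rather than a substitute for a real schema processor.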

Now, let's start to use XSLT to convert XML pages into XHTML.
