Enterprise Integration & Modeling: Metadatabase Research Home

Studio II.3 Introduction to XML

Authors: David Levermore (leverd@alum.rpi.edu), Donglin Chen

Revised: September 2, 2003

Overview

The number and types of Internet platforms are growing each day: wireless phones, personal digital assistants (Palm, Pocket PC) and Internet appliances (3Com Audrey) can all display web pages. However, each platform requires a different markup language to format its pages, for example Wireless Markup Language (WML) or Hypertext Markup Language (HTML), as well as different protocols to deliver them, such as the Wireless Application Protocol (WAP) and the Hypertext Transfer Protocol (HTTP). Extensible Markup Language (XML) is the most robust of these markup languages because it was designed so that an Internet document can be written once and presented on any device without modifying the document itself; a stylesheet defines its presentation instead.

XML, a specification for creating markup languages, is a subset of the more complicated Standard Generalized Markup Language (SGML) used by book and journal publishers. Rather than restricting authors to a fixed set of tags such as HTML's <B> and <HEAD>, XML lets them define their own <tag>-delimited document formats. The World Wide Web Consortium (W3C) gives a more detailed explanation of XML on its website, http://www.w3.org/XML/1999/XML-in-10-points. HTML was designed to display data and focuses on how data looks; XML was designed to describe data and focuses on what data is.

The advantages of using XML as opposed to HTML in your web projects are many:

  • Single data source - one XML document combined with a different stylesheet for each platform reduces the rework that platform-specific markup languages require.
  • Easy data extraction - search engines and shopping agents can understand the content of an XML document more readily than they can discern the meaning of a regular HTML document.
  • Easy-to-read structure - XML has a clear and simple syntax that humans can easily understand [1].
  • Flexible and open standard - the W3C continually updates the standard and accepts input from developers [1].
  • Growing support every day in Internet browsers and other Internet devices.

XML has uses well beyond web development. Microsoft continues to integrate it into its Office products to standardize information sharing within and between applications; Sun's new StarOffice suite natively saves its documents in XML-based file formats, while Adobe uses Scalable Vector Graphics (SVG), an XML-based graphics format, in a number of its products. "SVG is an emerging, completely open standard that was developed by the World Wide Web Consortium (W3C) and numerous industry players, including Adobe Systems, IBM, Netscape, Sun, Corel, Hewlett-Packard, and others [11]".

The following tutorial introduces XML using the previously introduced MusicWeb E-commerce Application as a case study. The application uses XML to exchange information with suppliers: when a supplier provides parts, such as CD cases and paper, the order is first initiated by exchanging XML documents that describe the products to be supplied. Likewise, when an artist provides new music, the details are supplied in XML format. In both cases the XML document must conform to MusicWeb's internal processes, so all suppliers must use a template that the company provides. We describe this process in more detail later.

XML

Even though XML is a flexible and open standard, some rules still govern how XML documents are written. There are two categories of XML documents: well-formed documents and valid documents.

  • A well-formed XML document conforms to the XML syntax rules: for example, every start tag must have a matching end tag, elements must nest properly, the document must have exactly one root element, and attribute values must be quoted.
  • A valid XML document is a well-formed XML document that also conforms to the rules of a Document Type Definition (DTD). DTDs are explained later.
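Any XML parser enforces the well-formedness rules, so a quick way to check a document is simply to parse it. The minimal sketch below uses Python's standard xml.etree.ElementTree module as a neutral illustration (the MusicWeb examples later in this studio use PHP); check_well_formed and the sample strings are hypothetical names chosen for this example. ElementTree only checks well-formedness; it does not validate a document against a DTD.

```python
import xml.etree.ElementTree as ET

def check_well_formed(text):
    """Hypothetical helper: return (True, "") if text parses as XML,
    otherwise (False, the parser's error message).

    Only well-formedness is checked; no DTD validation is performed.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""

samples = {
    "well-formed": "<student><name>Donglin Chen</name></student>",
    "unclosed tag": "<student><name>Donglin Chen</student>",
    "case mismatch": "<student><name>Donglin Chen</Name></student>",
    "two roots": "<student/><student/>",
}

for label, text in samples.items():
    ok, message = check_well_formed(text)
    print(label, "->", "OK" if ok else "error: " + message)
```

The case-mismatch sample fails because XML tag names are case sensitive, so </Name> does not close <name>.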

XML documents have a syntactic structure similar to that of HTML documents: content is surrounded by markup tags, an opening tag and a closing tag. Table 1 compares XML markup with HTML markup. In the XML document we have qualified the text "Database Systems" with the <class> tag, and as the web-page developers we have defined <class> to mean a course offering. If we publish this definition, anyone who reads the XML document can associate Database Systems with a course offering.

XML                                    HTML
<class>Database Systems</class>        <h2>Database Systems</h2>

Table 1 - XML versus HTML

If, however, we read the HTML document, we would have to understand the context of the entire document, including any other text on the page, before we could understand the meaning of the text between the <h2> tags.

An XML-aware search engine can use the semantics of the tags and should therefore return more relevant results. The relevance of the results still depends on the engine itself, but because the tag describes the content (the tag acts as metadata), the engine has more to work with. A traditional search engine, by contrast, has to rely on keywords and read entire documents to see whether the relevant words are present.
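The difference is easy to see in code. In this hypothetical sketch (Python's standard library, with a made-up course listing), a program finds every course offering simply by asking for <class> elements; given only the <h2> headings of an HTML version, it would have no way of telling a course title from any other heading.

```python
import xml.etree.ElementTree as ET

# Hypothetical page content marked up with descriptive tags.
xml_page = """<courses>
  <class>Database Systems</class>
  <class>Data Mining</class>
  <note>Registration opens in August.</note>
</courses>"""

def course_offerings(xml_text):
    """Return the text of every <class> element, wherever it appears."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("class")]

print(course_offerings(xml_page))  # ['Database Systems', 'Data Mining']
```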

Before proceeding to other examples, let us look at the development of an XML document in more detail. Figure 1 shows an unformatted XML file. Copy the text in Figure 1 into Notepad or any other text editor and save the file as studio.xml, or download studio.xml.

<?xml version="1.0"?>

<studio>
<student>
<name>Donglin Chen</name>
<year>2000</year>
<department>DSES</department>

<classes>
<class>Data Mining</class>
<professor>Mohammed J. Zaki</professor>
</classes>
</student>

<student>
<name>Kerstin Wei</name>
<year>2000</year>
<department>IT</department>

<classes>
<class>Computer Science II</class>
<professor>David R. Musser</professor>
</classes>
</student>

<student>
<name>Dave Petroff</name>
<year>1999</year>
<department>MBA</department>

<classes>
<class>Quality control and reliability</class>
<professor>Sunderesh S Heragu </professor>
</classes>
<classes>

<class>Marketing Research</class>
<professor>Jeffrey F Durgee</professor>
</classes>
</student>
</studio>

Figure 1 - studio.xml

Things to note:

  • XML tags are case sensitive, unlike HTML where <b> and <B> are the same thing.
  • The first line in the document - the XML declaration - defines the XML version of the document. In this case the document conforms to the 1.0 specification of XML.
  • The next line opens the root element of the document, <studio>. <studio> contains <student> child elements, each of which in turn contains child elements such as <name> and <year>.

Open studio.xml in an XML-compliant browser such as Internet Explorer 4.0 or higher. Note that the browser displays the XML document as a tree. The root can be expanded to reveal its children, and each element below the root can in turn be expanded or collapsed to show or hide its own children.
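The tree the browser draws is the same parent/child structure any XML parser builds. As an illustration only, the sketch below parses an abridged copy of Figure 1 (one student) with Python's xml.etree.ElementTree and produces one indented line per element, roughly as the expanded nodes appear in the browser; tree_lines is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Abridged copy of studio.xml (Figure 1): one student only.
STUDIO_XML = """<?xml version="1.0"?>
<studio>
<student>
<name>Donglin Chen</name>
<year>2000</year>
<department>DSES</department>
<classes>
<class>Data Mining</class>
<professor>Mohammed J. Zaki</professor>
</classes>
</student>
</studio>"""

def tree_lines(element, depth=0):
    """Return one line per element, indented under its parent, with its text."""
    text = (element.text or "").strip()
    line = "  " * depth + "<" + element.tag + ">"
    if text:
        line += " " + text
    lines = [line]
    for child in element:  # children in document order
        lines.extend(tree_lines(child, depth + 1))
    return lines

root = ET.fromstring(STUDIO_XML)
print("\n".join(tree_lines(root)))
```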

As shown, this XML document is not very useful for presentation, but stylesheets allow us to format it for any Internet platform with only minor changes to the XML document itself.

Introducing Style into XML

Cascading Style Sheets (CSS) and the Extensible Stylesheet Language (XSL) are two ways to format XML documents for use in web browsers or other Internet appliances.

CSS

Figure 2 shows a CSS file. Copy the text in Figure 2 into Notepad or any other text editor and save the file as studio.css, or download studio.css.

/* XML elements are displayed inline unless a rule sets display: block */
studio { display: block; background-color: #ffffff; width: 100%; }
student { display: block; margin-bottom: 30pt; margin-left: 0; }
name { display: block; color: #FF0000; font-size: 20pt; }
year, department { color: #0000FF; margin-left: 20pt; font-size: 14pt; }
class, professor { display: block; color: #000000; margin-left: 20pt; }

Figure 2 - studio.css

To display the XML file in a browser using CSS, you need to add one more line of code to studio.xml. Open studio.xml in Notepad and add

<?xml-stylesheet type="text/css" href="studio.css"?>  

directly below the XML declaration:

<?xml version="1.0"?>
  • Now save the file as studio_css.xml (make sure not to overwrite the original studio.xml; we will need it again later).
  • Put studio.css and studio_css.xml in the same directory, XML.
  • Open studio_css.xml in Internet Explorer by double-clicking it or dragging it onto the browser window.
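Adding the stylesheet line by hand works for one file; if you need styled copies of many documents, the edit is easy to script. The sketch below is a hypothetical Python helper, not part of the original studio: it assumes the document begins with an XML declaration and inserts the xml-stylesheet processing instruction directly after it, which is where the steps above place it.

```python
def add_stylesheet(xml_text, href, mime_type):
    """Hypothetical helper: return xml_text with an xml-stylesheet
    processing instruction inserted right after its XML declaration.

    Assumes the text starts with a declaration such as <?xml version="1.0"?>.
    """
    if not xml_text.startswith("<?xml "):
        raise ValueError("document does not start with an XML declaration")
    declaration_end = xml_text.index("?>") + len("?>")
    instruction = f'\n<?xml-stylesheet type="{mime_type}" href="{href}"?>'
    return xml_text[:declaration_end] + instruction + xml_text[declaration_end:]


original = '<?xml version="1.0"?>\n<studio>\n</studio>'
print(add_stylesheet(original, "studio.css", "text/css"))
# <?xml version="1.0"?>
# <?xml-stylesheet type="text/css" href="studio.css"?>
# <studio>
# </studio>
```

To produce studio_css.xml, read the text of studio.xml, pass it to add_stylesheet with "studio.css" and "text/css", and write the result to studio_css.xml; the original file is left untouched. Passing "studio.xsl" and "text/xsl" gives the XSL variant.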

Figure 3 - studio_css.xml


Figure 3 shows a nicely formatted document with no sign of XML markup! Experiment with studio.css to see whether you can improve on the layout shown in Figure 3.

XSL

XSL is far more sophisticated than CSS; one way to use it is to transform XML into HTML before the browser displays it. Copy the XSL file in Figure 4 into Notepad or any other text editor and save it as studio.xsl, or download studio.xsl.

<?xml version="1.0"?>
<!-- XSLT 1.0 namespace; the older http://www.w3.org/TR/WD-xsl draft namespace is recognized only by IE5-era MSXML -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- match the document root and emit one HTML page -->
<xsl:template match="/">
<html>
<body>
<!-- first table: one row per student -->
<table border="2" bgcolor="#66FFFF">
<tr>
<th>Name</th>
<th>Year</th>
<th>Department</th>
</tr>
<xsl:for-each select="studio/student">
<tr>
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="year"/></td>
<td><xsl:value-of select="department"/></td>
</tr>
</xsl:for-each>
</table>
<br/>
<!-- second table: one row per classes element, so a student with
     several classes gets several rows -->
<table border="2" bgcolor="#66FFFF">
<tr>
<th>Name</th>
<th>Class</th>
<th>Professor</th>
</tr>
<xsl:for-each select="studio/student/classes">
<tr>
<td><xsl:value-of select="../name"/></td>
<td><xsl:value-of select="class"/></td>
<td><xsl:value-of select="professor"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

Figure 4 - studio.xsl

To display the XML file using XSL, you again need to add one line of code to studio.xml. Open the original studio.xml in Notepad and add the following line

<?xml-stylesheet type="text/xsl" href="studio.xsl"?>

directly below the XML declaration:

<?xml version="1.0"?>

Figure 5 - studio_xsl.xml


  • Now save the file as studio_xsl.xml.
  • Put studio.xsl and studio_xsl.xml in the same directory.
  • Open studio_xsl.xml in Internet Explorer by double-clicking it or dragging it onto the browser window.

Figure 5 is an improvement over Figure 3: XSL gives us much greater control over layout than CSS.
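To make clear what the xsl:for-each and xsl:value-of instructions in Figure 4 select, here is a rough equivalent in Python using ElementTree's limited path syntax. It illustrates the selection logic only and is not a replacement for the stylesheet; the function names are hypothetical. Keep in mind that in XSLT 1.0, xsl:value-of outputs only the first node its expression selects, so a table that takes classes/class once per student shows just one class for a student such as Dave Petroff, who has two; looping over each classes element, as class_rows does, gives one row per class.

```python
import xml.etree.ElementTree as ET

# Abridged copy of Figure 1: one student who takes two classes.
STUDIO_XML = """<studio>
<student>
<name>Dave Petroff</name>
<year>1999</year>
<department>MBA</department>
<classes>
<class>Quality control and reliability</class>
<professor>Sunderesh S Heragu</professor>
</classes>
<classes>
<class>Marketing Research</class>
<professor>Jeffrey F Durgee</professor>
</classes>
</student>
</studio>"""

def student_rows(root):
    """First table: one row per student (select="studio/student")."""
    # root is already the <studio> element, so paths are relative to it.
    return [(s.findtext("name"), s.findtext("year"), s.findtext("department"))
            for s in root.findall("student")]

def class_rows(root):
    """Second table: one row per <classes> element, naming its student."""
    rows = []
    for student in root.findall("student"):
        for block in student.findall("classes"):
            rows.append((student.findtext("name"),
                         block.findtext("class"),
                         block.findtext("professor")))
    return rows

root = ET.fromstring(STUDIO_XML)
for row in student_rows(root) + class_rows(root):
    print(" | ".join(row))
```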

The Document Type Definition

The Document Type Definition (DTD) is the vehicle for sharing XML tag information: it defines all the tags used in an XML document or a series of documents, along with the layout and expected content of those documents. Anyone can look through an XML document and collect the tags it uses, but it is easier to consult a central repository, the DTD. A brief DTD that can be used for the XML document in the example above is given below (more information can be found at DevEdge Online, developer.netscape.com). To reference a DTD from an XML document, include the following statement:

<!DOCTYPE studio SYSTEM "studio.dtd">

This line goes directly below the XML declaration, before the root element. The DTD itself contains declarations such as:

<!ELEMENT studio      (student+)>
<!ELEMENT student     (name, year, department, classes+)>
<!ELEMENT classes     (class, professor)>
<!ELEMENT name        (#PCDATA)>
<!ELEMENT year        (#PCDATA)>
<!ELEMENT department  (#PCDATA)>
<!ELEMENT class       (#PCDATA)>
<!ELEMENT professor   (#PCDATA)>

Figure 6 - studio.dtd

The DTD describes the following:

  • A studio has one or more students
  • A student has a name, year, department and one or more classes
  • Classes have a class and a professor
  • Name, year, department, class and professor are just text

This DTD allows any web developer to integrate content from this website into his or her own website. To get a student's name, we simply look for the name tag; to find the courses a student takes, we look for the class tag.
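Python's standard-library parsers check well-formedness but do not validate against a DTD; a validating parser is needed for that. To show what validation means, the hypothetical sketch below hand-codes the content models of Figure 6 as regular expressions over each element's sequence of child names. It writes studio's model as one or more students, because studio.xml contains three, treats class as plain text like professor, and ignores attributes and stray text, so treat it as an illustration rather than a validator.

```python
import re
import xml.etree.ElementTree as ET

# Content models from studio.dtd as regular expressions over the sequence of
# child element names, each name followed by a space.
# None marks (#PCDATA) elements: text allowed, child elements not.
CONTENT_MODELS = {
    "studio": r"(student )+",
    "student": r"name year department (classes )+",
    "classes": r"class professor ",
    "name": None,
    "year": None,
    "department": None,
    "class": None,
    "professor": None,
}

def dtd_errors(element):
    """Hypothetical checker: list content-model violations in element's subtree."""
    if element.tag not in CONTENT_MODELS:
        return [f"<{element.tag}> is not declared"]
    errors = []
    model = CONTENT_MODELS[element.tag]
    children = "".join(child.tag + " " for child in element)
    if model is None and children:
        errors.append(f"<{element.tag}> should contain text only")
    elif model is not None and not re.fullmatch(model, children):
        errors.append(f"<{element.tag}> has children '{children.strip()}'")
    for child in element:
        errors.extend(dtd_errors(child))
    return errors

valid_doc = """<studio><student><name>Kerstin Wei</name><year>2000</year>
<department>IT</department><classes><class>Computer Science II</class>
<professor>David R. Musser</professor></classes></student></studio>"""

missing_department = """<studio><student><name>Kerstin Wei</name><year>2000</year>
<classes><class>Computer Science II</class>
<professor>David R. Musser</professor></classes></student></studio>"""

print(dtd_errors(ET.fromstring(valid_doc)))           # []
print(dtd_errors(ET.fromstring(missing_department)))  # ["<student> has children 'name year classes'"]
```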

MusicWeb E-commerce Application - Information Exchange

MusicWeb is an RPI student-run e-commerce website that provides two services. The company acts as a manager, agent and recording company for musicians seeking representation in the cutthroat entertainment business; musicians interact with the company electronically and transfer all music data through the company's website. Suppliers also exchange inventory information with MusicWeb via XML.

We could provide a client-server implementation that enters this information directly into our databases, but the effort required of its users would be considerable whenever large quantities of information must be transferred. We would also have to handle user authentication and application security. With an XML implementation, any artist or supplier can send MusicWeb the appropriate material, in more detail than a client-server data-entry solution would typically capture.

Figure 7 describes the process of retrieving XML documents and processing them for inclusion in the MusicWeb database.

Figure 7 - MusicWeb Information Exchange

PHP has built-in support for parsing XML documents with an event-based parser in the style of the Simple API for XML (SAX) [9, 10]. We use this SAX-style parser to parse the XML documents retrieved from our suppliers and artists. Figure 3 in Studio II.2 describes the information that we capture about our suppliers, artists, songs and song files. The XML templates for parts supply and music are shown in Figures 8 and 9.
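SAX-style parsing is event driven: the parser does not build a tree, it calls your functions as it meets each opening tag, closing tag and run of text, which is how startElement, endElement and characterData are used in Figure 10. For comparison, the sketch below registers the same three kinds of handlers with Python's xml.parsers.expat module; the supply data is a made-up sample in the shape of Figure 8. One difference from PHP: Python's expat reports element names exactly as written, whereas PHP's xml_parser_create() upper-cases them by default.

```python
import xml.parsers.expat

# Hypothetical sample in the shape of Figure 8 (one supplier, one part).
SUPPLY_XML = """<?xml version="1.0"?>
<supply>
<supplier>
<name>Example Paper Co.</name>
<part>
<part_ID>P-1</part_ID>
<part_name>CD case</part_name>
<part_quantity>500</part_quantity>
</part>
</supplier>
</supply>"""

fields = []          # (element name, text) pairs, in document order
current_tag = None   # plays the role of $currentTag in Figure 10

def start_element(name, attrs):
    """Called at every opening tag: remember which element we are inside."""
    global current_tag
    current_tag = name

def end_element(name):
    """Called at every closing tag: text that follows belongs to no field."""
    global current_tag
    current_tag = None

def character_data(data):
    """Called with runs of text: skip the whitespace between tags."""
    if current_tag is not None and data.strip():
        fields.append((current_tag, data))

parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True  # deliver each run of text in a single call
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = character_data
parser.Parse(SUPPLY_XML, True)

for name, text in fields:
    print(f"{name}: {text}")
```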

NOTE: PHP and Apache are required to run the script below. See Studio II.1 to learn how to configure a web development environment.

<?xml version="1.0"?>
<supply>

<supplier>
<supplierID>Supplier ID</supplierID>
<name>Supplier Name</name>
<address>Supplier Address</address>

<city>Supplier City</city>
<state>Supplier State</state>
<zip>Supplier Zip</zip>
<telephone>Supplier Telephone</telephone>

<fax>Supplier Fax</fax>
<email>Supplier Email</email>
<website>Supplier Website</website>
<part>
<part_ID>Part ID</part_ID>
<part_name>Part Name</part_name>
<part_description>Part Description</part_description>
<part_quantity>Quantity</part_quantity>
</part>
<part>
<part_ID>Part ID</part_ID>
<part_name>Part Name</part_name>
<part_description>Part Description</part_description>
<part_quantity>Quantity</part_quantity>
</part>
</supplier>
</supply>

Figure 8 - supply.xml

<?xml version="1.0"?>

<music>
<artist>
<artistID>Artist ID</artistID>
<address>Artist Address</address>
<city>Artist City</city>

<state>Artist State</state>
<zip>Artist Zip</zip>
<telephone>Artist Telephone</telephone>
<fax>Artist Fax</fax>

<email>Artist Email</email>
<contractID>Contract ID</contractID>
<rating>Artist Rating</rating>
<song>

<songID>Song ID</songID>
<songTitle>Song Title</songTitle>
<songLength>Song Length</songLength>
<categoryID>Category ID</categoryID>

<originalArtist>Original Artist</originalArtist>
<songFile>
<songFileName>Song File Name</songFileName>
</songFile>
</song>

</artist>
</music>

Figure 9 - music.xml

Note that any number of parts can be added to the supply.xml file and any number of songs can be added to the music.xml file.
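Because the number of <part> (or <song>) elements varies, code that reads these templates should loop over whatever is present rather than expect a fixed count. As an alternative to the SAX approach, the sketch below loads a supply document into a tree with Python's ElementTree and returns the supplier fields plus one dictionary per part; read_supply is a hypothetical helper, and it assumes one <supplier> per file as in Figure 8. A tree-based parser keeps the whole document in memory, which is convenient for files of this size; SAX-style parsing avoids that cost for very large files.

```python
import xml.etree.ElementTree as ET

def read_supply(xml_text):
    """Hypothetical helper: return (supplier fields, list of part dicts)."""
    supplier = ET.fromstring(xml_text).find("supplier")
    info = {child.tag: (child.text or "").strip()
            for child in supplier if child.tag != "part"}
    parts = [{child.tag: (child.text or "").strip() for child in part}
             for part in supplier.findall("part")]
    return info, parts

# Abridged copy of the Figure 8 template, keeping both <part> entries.
TEMPLATE = """<?xml version="1.0"?>
<supply>
<supplier>
<supplierID>Supplier ID</supplierID>
<name>Supplier Name</name>
<part>
<part_ID>Part ID</part_ID>
<part_name>Part Name</part_name>
<part_description>Part Description</part_description>
<part_quantity>Quantity</part_quantity>
</part>
<part>
<part_ID>Part ID</part_ID>
<part_name>Part Name</part_name>
<part_description>Part Description</part_description>
<part_quantity>Quantity</part_quantity>
</part>
</supplier>
</supply>"""

info, parts = read_supply(TEMPLATE)
print(info)        # {'supplierID': 'Supplier ID', 'name': 'Supplier Name'}
print(len(parts))  # 2
```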

Once these files are sent to MusicWeb, we process them. The script in Figure 10 processes the supply.xml file shown in Figure 8. Figure 9 shows another XML file, which describes the data obtained from an artist; processing this file is left as an exercise in the Deliverables at the end of the tutorial.

The PHP script in Figure 10 is commented for clarity. Save it as supply.php in the XML directory used before (or, for the sake of organization, in a new directory below the XML directory) of the MusicWeb site, and create the files described in Figures 8 and 9. The supply.xml file we use differs from Figure 8 only in its content. Use any XML-enabled browser (Internet Explorer 5 or later, or Netscape 6) to view the XML file.

<html>
<head>
<title>Supply</title>
</head>
<body>
<?php
// data file
$file = "supply.xml";

// Name of the element currently being read. characterData() uses it to decide
// whether a piece of text belongs to a field that should be displayed.
$currentTag = "";

// Called at every opening tag. PHP's XML parser upper-cases element names by
// default (XML_OPTION_CASE_FOLDING), which is why the cases are upper case.
function startElement($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
    switch ($name) {
        case "SUPPLY":
            echo "<table border=1 cellspacing=1 cellpadding=2 width=100%>";
            break;
        case "NAME":
            echo "<tr><td>Supplier Name:</td><td>";
            break;
        case "ADDRESS":
            echo "<tr><td>Supplier Address:</td><td>";
            break;
        case "CITY":
            // city, state and zip share one row: "City, State Zip"
            echo "<tr><td>&nbsp;</td><td>";
            break;
        case "STATE":
            echo ", ";
            break;
        case "ZIP":
            echo "&nbsp;";
            break;
        case "TELEPHONE":
            echo "<tr><td>Supplier Tel. #:</td><td>";
            break;
        case "FAX":
            echo "<tr><td>Supplier Fax #:</td><td>";
            break;
        case "EMAIL":
            echo "<tr><td>Supplier Email:</td><td>";
            break;
        case "WEBSITE":
            echo "<tr><td>Supplier Website:</td><td>";
            break;
        case "PART":
            echo "<tr>";
            break;
        case "PART_ID":
            echo "<td colspan=2>Part ID: (";
            break;
        case "PART_NAME":
            echo " ";
            break;
        case "PART_DESCRIPTION":
            echo "Description: ";
            break;
        case "PART_QUANTITY":
            echo "QTY: ";
            break;
    }
}

// Called at every closing tag; closes the HTML opened in startElement().
function endElement($parser, $name) {
    global $currentTag;
    switch ($name) {
        case "SUPPLY":
            echo "</table>";
            break;
        case "NAME":
        case "ADDRESS":
        case "ZIP":
        case "TELEPHONE":
        case "FAX":
        case "EMAIL":
        case "WEBSITE":
            echo "</td></tr>";
            break;
        case "STATE":
            echo " ";
            break;
        case "PART_ID":
            echo ")";
            break;
        case "PART_NAME":
        case "PART_DESCRIPTION":
            echo "<br>";
            break;
        case "PART_QUANTITY":
            echo "</td>";
            break;
        case "PART":
            echo "</tr>";
            break;
    }
    // text that follows a closing tag (whitespace between elements) is not shown
    $currentTag = "";
}

// Called with the text between tags (possibly in several pieces).
function characterData($parser, $data) {
    global $currentTag;
    $fields = array("NAME", "ADDRESS", "CITY", "STATE", "ZIP", "TELEPHONE",
                    "FAX", "EMAIL", "WEBSITE", "PART_ID", "PART_NAME",
                    "PART_DESCRIPTION", "PART_QUANTITY");
    if (in_array($currentTag, $fields, true)) {
        // escape characters such as < and & so they display as text
        echo htmlspecialchars($data);
    }
}

// initialize parser
$xml_parser = xml_parser_create();
// set callback functions
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
// open XML file
if (!($fp = fopen($file, "r"))) {
    die("Cannot locate XML data file: $file");
}
// read and parse the data in 4 KB chunks
while ($data = fread($fp, 4096)) {
    // error handler
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
// clean up
fclose($fp);
xml_parser_free($xml_parser);
?>
<p>
<form name="form1" method="post" action="">
<input type="submit" name="Submit2" value="Approve Supplier">
<input type="submit" name="Submit" value="Trash">
<input type="submit" name="Submit" value="Save For Review">
</form>
</body>
</html>

Figure 10 - script view of supply.php - Copied with permission. Copyright 2002 Melonfire.

This script processes one file at a time; a simple loop could process a series of XML documents stored in a directory, but this task is left to the reader.
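One possible shape for that loop, sketched in Python rather than PHP and with hypothetical names: process every .xml file in a directory, hand each parsed document to a callback, and set aside any file that is not well-formed instead of stopping the whole batch. The same structure carries over to PHP with glob() and a fresh parser per file.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def process_directory(directory, handle):
    """Hypothetical helper: call handle(path, root) for every *.xml file.

    Files are visited in name order. Files that are not well-formed are
    returned as (file name, error message) pairs instead of aborting the run.
    """
    rejected = []
    for path in sorted(Path(directory).glob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError as err:
            rejected.append((path.name, str(err)))
            continue
        handle(path, root)
    return rejected

# Demonstration with a throwaway directory holding one good and one bad file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.xml").write_text("<supply><supplier/></supply>")
    Path(tmp, "b.xml").write_text("<supply><supplier></supply>")  # mismatched tag
    seen = []
    rejected = process_directory(tmp, lambda path, root: seen.append(path.name))

print("processed:", seen)
print("rejected:", [name for name, _ in rejected])
```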

Figure 11 shows the output of the supply.php script in Internet Explorer. The script extracts all the necessary data and presents it in a readable format. It would also be straightforward to write a PHP script that approves, deletes or saves for later review the supplier and parts information, as the form buttons at the bottom of the page suggest.
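As a rough idea of what "approve" might do, the sketch below inserts a supplier's parts into a database table inside a single transaction, so either all parts are stored or none are. It uses Python's built-in sqlite3 module with an in-memory database; the part table, its columns and the sample values are hypothetical and are not MusicWeb's actual schema (Figure 3 in Studio II.2 describes the data MusicWeb captures). The part dictionaries are keyed by the tag names in Figure 8. In a PHP version, the same step could run when the Approve Supplier button is submitted.

```python
import sqlite3

def approve_parts(conn, supplier_id, parts):
    """Insert approved parts into a hypothetical 'part' table in one transaction."""
    with conn:  # commits if every insert succeeds, rolls back otherwise
        conn.executemany(
            "INSERT INTO part (part_id, supplier_id, name, description, quantity) "
            "VALUES (?, ?, ?, ?, ?)",
            [(p["part_ID"], supplier_id, p["part_name"], p["part_description"],
              int(p["part_quantity"])) for p in parts],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_id TEXT PRIMARY KEY, supplier_id TEXT, "
             "name TEXT, description TEXT, quantity INTEGER)")

# Hypothetical parts, shaped like the <part> elements of Figure 8.
parts = [
    {"part_ID": "P-1", "part_name": "CD case",
     "part_description": "Jewel case", "part_quantity": "500"},
    {"part_ID": "P-2", "part_name": "Paper",
     "part_description": "Insert stock", "part_quantity": "2000"},
]
approve_parts(conn, "S-1", parts)
print(conn.execute("SELECT part_id, quantity FROM part ORDER BY part_id").fetchall())
# [('P-1', 500), ('P-2', 2000)]
```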

Figure 11 - Supply.php as viewed in Internet Explorer


Conclusion

We have introduced creating XML documents, presenting them with CSS and XSL, and parsing an XML document with PHP as the first step in transferring its data to a SQL database. XML is a relatively new field that is experiencing tremendous growth, so other resources should be consulted to gain a full understanding of its uses. The references below provide a starting point, but there is no limit to the resources that can be found on the Internet.


Deliverables

Part I

  1. Adapt the approval process shown in Figure 10 to an artist using the XML file described in Figure 9. This should include the song data and song file location information, but make any modifications you desire. Submit the PHP script that automates this process as well as the associated XML files. Indicate what improvements you have made to the process, if any.
  2. Identify two other XML-based technologies that could improve on (or replace) the processing methods presented above, and discuss the changes they would make. We have presented CSS and XSL, but there are far more exciting technologies available!

Part II (Optional)

  1. Investigate alternative methods of parsing XML files using PHP.
  2. Write a PHP script that enables the Approve/Trash/Save buttons for the form in supply.php or in the script you created for Part I, item 1.

References

  1. Ray, Eric T., Learning XML, O'Reilly & Associates, CA, 2001.
  2. A Quick Introduction to XML, Sun, <http://www.java.sun.com/xml/docs/tutorial/overview/1_xml.html>
  3. The Java API for XML Parsing (JAXP) Tutorial, Sun, <http://www.java.sun.com/xml/docs/tutorial/index.html>
  4. XML tutorial, IBM, <http://www-4.ibm.com/software/developer/education/xmlintro/xmlintro.html>
  5. What is XML?, GeoCities, <http://www.geocities.com/SiliconValley/Peaks/5957/xml.html>
  6. Live XML Demo, Oracle, <http://technet.oracle.com/tech/xml/demo/demo1.htm>
  7. The XML Cover Pages, OASIS, <http://www.oasis-open.org/cover/sgml-xml.html>
  8. Designing for the Future with XML, HotWired, <http://hotwired.lycos.com/webmonkey/98/19/index0a.html?tw=authoring>
  9. Using PHP with XML (Part 1), Dev Shed, <http://www.devshed.com/Server_Side/XML/XMLwithPHP/XMLwithPHP1>
  10. Using PHP with XML (Part 2), Dev Shed, <http://www.devshed.com/Server_Side/XML/XMLwithPHP/XMLwithPHP2>
  11. Adobe Illustrator 10.0, Adobe, <http://www.adobe.com>
 

viu.eng.rpi.edu is hosted by Professor Cheng Hsu.
Rensselaer Polytechnic Institute
Department of Industrial and Systems Engineering (formerly Decision Sciences & Engineering Systems)
110 8th St., Center for Industrial Innovation, Room 5123, Troy, NY 12180-3590

Copyright © 1997-2016. MetaWorld, Nothing on this site may be commercially used without written consent.
