Thanks to Heather Poag for the Catch Up logo.

Officially Inaugurating the Catsup Week
"These are the days of miracle and wonder..."


Lecture Notes Twenty-Eight: Content Syndication

Catsup Days are finally here.

frilled.cs.indiana.edu%webster inaugurate
in-au-gu-rate \in-'o-g(y)e-,ra^-t\ vt  -rat-ed; -rat-ing
[L inauguratus, pp. of inaugurare, lit., to practice augury, fr. in-
     + augurare to augur; fr. the rites connected with augury]
(1606)
1: to induct into an office with suitable ceremonies
2a: to dedicate ceremoniously: observe formally the beginning of
2b: to bring about the beginning of
syn see BEGIN 
-- in-au-gu-ra-tor \-,ra^-t-er\ n 


frilled.cs.indiana.edu%

So we bring about the beginning of this final week with suitable ceremonies and celebration.

The main administrative announcements for today and the rest of the week are as follows:

  1. For the benefit of Mr. Kite, there will be a show tonight on trampoline. The Hendersons will all be there, late of Pablo Fanque's Fair - what a scene over men and horses, hoops and garters, lastly through a hogshead of real fire, in this way Mr. K. will challenge the world.

  2. The celebrated Mr. K. performs his feat on Saturday at Bishopsgate. The Hendersons will dance and sing as Mr. Kite flies through the ring, don't be late! Messrs. K and H. assure the public their production will be 2nd to none. And of course Henry the Horse dances the waltz.

  3. The band begins at ten to six when Mr. K. performs his tricks without a sound. And Mr. H. will demonstrate ten somersets he'll undertake on solid ground. Having been some days in preparation a splendid time is guaranteed for all. And tonight Mr. Kite is topping the bill.

The title of tonight's show is

CONTENT SYNDICATION

Couldn't we really simply stop here?

No, we really need to get started: it's ten to six (when Mr. K performs his tricks). We can't be late.

Let's start with a portion of a file: interface.html

<html>
  <head>
    <title>FooBar Public Library: Add Books Interface</title>
    <style>
       <!-- 
         body { font-family: Arial }
         h1   { color: #000080 }
       -->
    </style> 
  </head>

  <body link="#FFFF00" vlink="#FFFF00" alink="#FFFF00">
    <table border="0" width="100%" cellpadding="0" cellspacing="0"> 
      <tr> 

        <td width="15%" bgcolor="#000080" valign="top" align="center"> </td> 

        <td width="*" valign="top" align="center"> 

          <h1 align="center"> The Foobar Public Library </h1>
          <h3 align="center"><i>- Add Books -</i></h3> 

           <form method="POST" action="/cgi-bin/foobar/addBook.pl">

             <p> 
             <input type="submit" value="Add This Book" name="AddBook">
             <input type="reset" value="Reset Form" name="reset"> 
             <input type="button" value="Cancel" name="cancel"> 
             </p> 

          </form> 

        </td> 

      </tr> 
    </table> 
  </body> 
</html>
This is the entry point, located at
/u/username/apache/apache_1.3.20/htdocs/interface.html
Let's fill in the rest and finish it.

<html>
  <head>

    <title>FooBar Public Library: Add Books Interface</title>

    <style>
       <!-- 
         body { font-family: Arial }
         h1   { color: #000080 }
       -->
    </style> 

  </head>
  <body link="#FFFF00" vlink="#FFFF00" alink="#FFFF00">

    <table border="0" width="100%" cellpadding="0" cellspacing="0"> 

      <tr> 

        <td width="15%" bgcolor="#000080" valign="top" align="center"> 
          <b><i><font color="#FFFFFF" size="4">Options</font></i></b>
          <p><b><font color="#FFFFFF"><a href="mainMenu.html">Main Menu</a></font></b></p>
          <p><b><font color="#FFFFFF"><a href="/cgi-bin/foobar/catalog.pl">Catalog</a></font></b></p>
          <p><b><i><font color="#FFFF00">Add Books</font></i></b></p> 
          <p><b><font color="#FFFFFF"><a href="logout.html">Log Out</a></font></b></p> 
        </td> 

        <td width="*" valign="top" align="center"> 

           <h1 align="center"> The Foobar Public Library </h1>

           <h3 align="center"><i>- Add Books -</i></h3> 

           <form method="POST" action="/cgi-bin/foobar/addBook.pl">

             <table border="0" cellpadding="5" width="100%">
               <tr> 
                 <td width="100%" valign="top" align="center" colspan="2"> 
                   Title&nbsp;<input type="text" name="title" size="20">
                   <hr width="85%" />
                 </td> 
               </tr><tr> 
                 <td width="50%" valign="top" align="right"> 
                   Author&nbsp;<input type="text" name="author" size="20">
                 </td> <td width="50%" valign="top" align="left">
                   Subject&nbsp;<select size="1" name="subject">
                     <option>Fiction</option>
                     <option>Biography</option>
                     <option>Science</option>
                     <option>Industry</option>
                     <option>Computers</option>
                   </select>
                 </td> 
               </tr><tr> 
                 <td width="50%" valign="top" align="right">
                   Publisher&nbsp;<input type="text" name="publisher" size="20">
                 </td><td width="50%" valign="top" align="left">
                   ISBN&nbsp;<input type="text" name="isbn" size="20">
                 </td> 
               </tr><tr> 
                 <td width="50%" valign="top" align="right">
                   Price&nbsp;<input type="text" name="price" size="20">
                 </td><td width="50%" valign="top" align="left">
                   Pages&nbsp;<input type="text" name="numPages" size="20">
                 </td>
               </tr><tr> 
                 <td width="100%" valign="top" align="center" colspan="2">
                   Description&nbsp;<textarea rows="3" name="description" cols="45"></textarea>
                 </td> 
               </tr> 
             </table>

             <p> 
             <input type="submit" value="Add This Book" name="AddBook">  
             <input type="reset" value="Reset Form" name="reset">   
             <input type="button" value="Cancel" name="cancel"> 
             </p> 

           </form> 
 
        </td> 

      </tr> 

    </table> 

  </body> 
</html>
The form posts to an addBook.pl Perl script, so we had better provide one, quickly.

#!/usr/bin/perl

use CGI; 

$query = new CGI; 

print $query->header, 
      $query->start_html,
      "Hello, how are you?<p>";

$title = $query->param('title');
$author = $query->param('author') ;
$subject = $query->param('subject');
$publisher = $query->param('publisher');
$isbn = $query->param('isbn');
$price = $query->param('price');
$pages = $query->param('numPages');  # the form field is named numPages, not pages
$description = $query->param('description');

print qq{ 

  You seem to be entering the following book: 

  <dl>

  <dt>Title</dt> <dd> $title<p></dd> 
  <dt>Author</dt> <dd> $author<p></dd> 
  <dt>Subject</dt> <dd> $subject<p></dd> 
  <dt>Publisher</dt> <dd> $publisher<p></dd> 
  <dt>ISBN</dt> <dd> $isbn<p></dd> 
  <dt>Price</dt> <dd> $price<p></dd> 
  <dt>Pages</dt> <dd> $pages<p></dd> 
  <dt>Description</dt> <dd> $description <p></dd> 

  </dl>

}; 

print $query->end_html;
This was very easy. Let's now store the results in a file.

Let's decide on these things first:

I will place the file in
/u/username/foobar/data/dataFile.txt
The format will be very simple: one book per line, with its fields in this order:
title:author:subject:publisher:isbn:price:numPages:description
Let's assume (for the purpose of this exercise) that the colon (:) is a character we own exclusively. We have bought this character and nobody else can use it. Since we are guaranteed to be the only ones using it, we can safely use it as a delimiter. Total fabrication, but let's agree to it.
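To make the layout concrete, here is a small Java sketch (Java being what Company B speaks later on). `BookRecord`, `join`, and `split` are our own hypothetical names, not part of the FooBar code. It joins the eight values with colons and splits a line back into fields. Note the `-1` limit on `split`: without it, Java drops trailing empty fields, so a book with an empty description would come back with only seven fields. Perl's `split` also drops trailing empty fields by default.

```java
// Hypothetical helper mirroring the dataFile.txt layout:
// title:author:subject:publisher:isbn:price:numPages:description
class BookRecord {
    static final String[] FIELDS = {
        "title", "author", "subject", "publisher",
        "isbn", "price", "numPages", "description"
    };
    static final int ISBN = 4; // zero-based index of the key field

    // Join the eight field values into one colon-delimited line.
    static String join(String[] values) {
        if (values.length != FIELDS.length) {
            throw new IllegalArgumentException("expected " + FIELDS.length + " fields");
        }
        return String.join(":", values);
    }

    // Split a line back into its fields; the -1 limit keeps trailing empty fields.
    static String[] split(String line) {
        return line.split(":", -1);
    }

    public static void main(String[] args) {
        String line = join(new String[] {
            "The Armchair Universe - An Exploration of Computer Worlds",
            "A. K. Dewdney", "Science", "W. H. Freeman and Company, New York",
            "0-7167-1939-8", "19.90", "330", ""
        });
        System.out.println(line);
        String[] fields = split(line);
        System.out.println(fields.length + " fields, key = " + fields[ISBN]);
    }
}
```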

Here's an example of a real book:

Title
The Armchair Universe - An Exploration of Computer Worlds

Author
A. K. Dewdney

Subject
Science

Publisher
W. H. Freeman and Company, New York

ISBN
0-7167-1939-8

Price
19.90

Number of Pages
330

Description
This is the first collection of A.K.Dewdney's popular "Computer Recreations" columns, drawn from Scientific American magazine between 1984 and 1987. Inspired by Martin Gardner's classic "Mathematical Games" column, which entertained millions of readers for more than 30 years, "Computer Recreations" has quickly become one of the most widely read and anticipated columns in Scientific American. The computer recreations described here range from purely entertaining brainteasers to more practical computer applications of scientific thought. And with Dewdney's lucid programming directions to follow, you can actually sit at your computer and try your hand at them all. Available in paperback and hardcover. Cover image shows Julia set bounding three basins of attraction on a Riemann sphere.

The ISBN will be the key.

Which brings us to the last question.

The real model behind this file is a table in an RDBMS like MySQL.

By the way, as I hope you remember, readParse taught us how to make characters our own anyway (by escaping them, the way URL encoding does), so the story about needing a guarantee for the colon is really a non-issue.
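A minimal sketch of that trick in Java, assuming we apply URL encoding to each field before joining (this is our choice of escape, not something the FooBar scripts do): `URLEncoder` turns every colon into `%3A`, so no encoded field can contain a bare colon, and the delimiter is ours again. The class name `ColonSafe` is hypothetical; the `Charset` overloads used here need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: escape each field so a ':' inside the data
// cannot be confused with the ':' that separates fields.
class ColonSafe {
    static String encodeRecord(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(':');
            // ':' becomes %3A, '%' becomes %25, ' ' becomes '+'
            sb.append(URLEncoder.encode(fields[i], StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    static String[] decodeRecord(String line) {
        String[] parts = line.split(":", -1); // keep trailing empty fields
        for (int i = 0; i < parts.length; i++) {
            parts[i] = URLDecoder.decode(parts[i], StandardCharsets.UTF_8);
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = encodeRecord("Subtitle: with a colon", "0-7167-1939-8");
        System.out.println(line);
        String[] back = decodeRecord(line);
        System.out.println(back[0] + " | " + back[1]);
    }
}
```

Encoding `"Subtitle: with a colon"` yields `Subtitle%3A+with+a+colon`; spaces become `+`, which `URLDecoder` turns back into spaces.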

So let's get started.

#!/usr/bin/perl

use CGI; 

$query = new CGI; 

print $query->header, 
      $query->start_html,
      "Hello, and how are you doing? <p> ";

$title = $query->param('title');
$author = $query->param('author') ;
$subject = $query->param('subject');
$publisher = $query->param('publisher');
$isbn = $query->param('isbn');
$price = $query->param('price');
$numPages = $query->param('numPages');
$description = $query->param('description');

print qq{ 

  You seem to be entering the following book: 

  <dl>

  <dt>Title</dt> <dd> $title<p></dd> 
  <dt>Author</dt> <dd> $author<p></dd> 
  <dt>Subject</dt> <dd> $subject<p></dd> 
  <dt>Publisher</dt> <dd> $publisher<p></dd> 
  <dt>ISBN</dt> <dd> $isbn<p></dd> 
  <dt>Price</dt> <dd> $price<p></dd> 
  <dt>Number of Pages</dt> <dd> $numPages<p></dd> 
  <dt>Description</dt> <dd> $description <p></dd> 

  </dl>

}; 

%library = (); 

open (AB, "/u/dgerman/foobar/data/dataFile.txt"); 
@x = <AB>;
close(AB); 

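foreach $line (@x) {
    chomp($line);                # drop the newline; it is added back when writing
    next if $line =~ /^\s*$/;    # skip blank lines
    @line = split(/:/, $line); 
    $key = $line[4];             # the ISBN is the fifth field
    $library{$key} = $line; 
}

#add new book

$library{$isbn} = "$title:$author:$subject:$publisher:$isbn:$price:$numPages:$description"; 
# \s covers \n, \r (chr(13)) and tabs: flatten them so the record stays on one line
$library{$isbn} =~ s/\s/ /g; 
open (AB, ">/u/dgerman/foobar/data/dataFile.txt") or die "Cannot write data file: $!"; 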
foreach $key (sort (keys %library)) {
    print AB $library{$key}, "\n"; 
}
close(AB); 

print "The book has been added, thank you. ", $query->end_html; 
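For comparison, here is the same read-modify-write cycle sketched in Java. The class `AddBook` and its method are hypothetical, and the data file is replaced by a list of lines so the logic can be checked on its own. It indexes the existing records by ISBN (the fifth field), flattens every whitespace character in the new record to a space so it stays on one line, and returns the records sorted by key, much as `sort(keys %library)` does for keys like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical Java rendering of addBook.pl's storage logic.
class AddBook {
    static final int ISBN = 4; // zero-based position of the key field

    static List<String> addBook(List<String> existingLines, String[] newFields) {
        // TreeMap keeps its keys sorted, like sort(keys %library) in Perl
        Map<String, String> library = new TreeMap<>();
        for (String line : existingLines) {
            if (line.trim().isEmpty()) continue;      // ignore blank lines
            String[] f = line.split(":", -1);
            if (f.length > ISBN) library.put(f[ISBN], line);
        }
        // Join the new record and turn every whitespace character
        // (newline, carriage return, tab) into a plain space.
        String record = String.join(":", newFields).replaceAll("\\s", " ");
        library.put(newFields[ISBN], record);         // a known ISBN replaces the old entry
        return new ArrayList<>(library.values());
    }

    public static void main(String[] args) {
        List<String> file = List.of("T2:A2:Science:P2:0-7167-1939-8:19.90:330:d2");
        List<String> updated = addBook(file, new String[] {
            "T1", "A1", "Biography", "P1", "0-6700-3037-6", "29.95", "128", "line one\r\nline two"
        });
        updated.forEach(System.out::println);
    }
}
```

Re-adding a book with a known ISBN replaces it, exactly as assigning to `$library{$isbn}` does in the Perl script.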
Let's add two books with this script. Here is its output for the second book:
Hello, and how are you doing?

You seem to be entering the following book:

Title
In Search of Lake Wobegon

Author
Garrison Keillor, Richard Olsenius (Photographer)

Subject
Biography

Publisher
Viking Press

ISBN
0-6700-3037-6

Price
29.95

Number of Pages
128

Description
In the twenty-five years since Garrison Keillor first brought it to life, the rural Minnesota town of Lake Wobegon has become a national treasure. In this lavishly produced photography book, word and image combine to illuminate the real Minnesota town-life, landscapes, and people who inspired its creation. Taking us on a tour of Stearns County, the Minnesota county he deems most "Wobegonic," Keillor meditates on the origins of the place where, as a young writer, he found the inspiration for his fiction and his radio show. As an artful evocation of Keillor's beloved invention, Richard Olsenius's elegantly composed black-and-white photographs of rural Minnesota capture the dignity of his subjects, the beauties of the landscape as well as the enduring values and eccentricities of the communities rooted there.

The book has been added, thank you.
There are a few issues here that we won't even consider: our focus, once again, is content syndication.

And now the story.

Long, interesting, involving story about Best Book Buys comes here and touches the audience.

To summarize, here are the facts:

  1. Company A (FooBar) is storing books. Data Entry is HTML with Perl, as we have seen.
  2. Company B (broker) needs access to data of Company A.
  3. Company B does servlets, mostly. Company A does only Perl.

How do they interface?

Let's review servlets briefly, just so we know what we can count on.

We have this servlet, One:

import javax.servlet.*;
import javax.servlet.http.*; 
import java.io.*; 

public class One extends HttpServlet {
  
  public void doGet(HttpServletRequest one, 
                    HttpServletResponse two) 
              throws ServletException, IOException 
  {

     two.setContentType("text/html"); 

     PrintWriter out = two.getWriter(); 

     out.println("How are you?"); 

  }

} 
We have this servlet, Two:
import javax.servlet.*;
import javax.servlet.http.*; 
import java.io.*; 

public class Two extends HttpServlet {
  
  public void doGet(HttpServletRequest one, 
                    HttpServletResponse two) 
              throws ServletException, IOException 
  {

     String p = one.getParameter("nom");

     two.setContentType("text/html"); 
     PrintWriter out = two.getWriter(); 
     out.println("How are you, " + p + "?"); 

  }

} 
We understand them well; we reviewed them just now.

So now let's look at the third one. (Recall the chat applet, if you will; it could be handy.)

import java.io.*;
import java.net.*;
import javax.servlet.*;
import javax.servlet.http.*; 

public class Three extends HttpServlet {
  
    public void doGet(HttpServletRequest request, 
                      HttpServletResponse response) 
        throws IOException, ServletException {

        response.setContentType("text/html"); 

        PrintWriter out = response.getWriter(); 

        URL getBooksURL = new URL("http",
                                  "burrowww.cs.indiana.edu", 
                                  31090,
                                  "/cgi-bin/foobar/catalog.pl");


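        URLConnection con = getBooksURL.openConnection(); 

        con.setUseCaches(false);   // always fetch a fresh copy of the catalog

        // BufferedReader.readLine() replaces the deprecated DataInputStream.readLine(),
        // which does not convert bytes to characters properly
        BufferedReader data = new BufferedReader(
                                  new InputStreamReader(con.getInputStream())); 

        String line = data.readLine(); 

        while (line != null) {

            out.println(line);     // relay Company A's page line by line
            line = data.readLine(); 

        }

        data.close(); 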

        out.println("<p> This is the end of the servlet"); 

    }

} 

  1. But how difficult can parsing be in this case?

  2. And what if they change the format of their reports?

Right now, Company A produces output for humans to read, not for programs to parse.

How do we become more systematic, so that a computer program can figure it out for us? How do we make it so that changes in the structure of the data can be absorbed by the program that reads the data? And is that even possible? The answer is yes: that's what XML was invented for.

So we switch to XML. First we need a report script that turns the database into XML.
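Whatever shape that report takes, it must escape the characters that are special in XML: a single `&` in a publisher's name would make the whole document ill-formed, and the reader's parser would reject it. Here is a minimal Java sketch of turning one colon-delimited record into a `<book>` element. The class name, and the choice of one element per field named after the form fields, are our assumptions, not FooBar's actual design.

```java
// Hypothetical sketch: one colon-delimited record -> one <book> element.
class RecordToXml {
    static final String[] FIELDS = {
        "title", "author", "subject", "publisher",
        "isbn", "price", "numPages", "description"
    };

    // Escape the characters that would otherwise break the XML markup.
    static String escape(String s) {
        return s.replace("&", "&amp;")   // must come first
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    static String toXml(String line) {
        String[] values = line.split(":", -1);
        StringBuilder xml = new StringBuilder("<book>\n");
        for (int i = 0; i < FIELDS.length; i++) {
            String v = i < values.length ? values[i].trim() : "";
            xml.append("  <").append(FIELDS[i]).append('>')
               .append(escape(v))
               .append("</").append(FIELDS[i]).append(">\n");
        }
        return xml.append("</book>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("Title:Author:Fiction:Smith & Sons:isbn:1.00:10:uses <angle> brackets"));
    }
}
```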

But wait. Let's see some examples before we go too far, so we know what we're talking about.

Important Note:

You will need

/u/dgerman/public/xerces.jar
This contains an archive of classes used for parsing.

The parser we use (what's a parser?) is called Xerces.

Where does the name come from?

The Apache Xerces parser is named after the now-extinct Xerces blue butterfly. The butterfly was wiped out by urban expansion; the last known specimens were taken in 1941 at the Presidio military base in San Francisco. It was named after a king: a French entomologist named it for the Persian King Xerxes, but with the French spelling "Xerces," which was retained.

King Xerxes, son of Darius, ascended to the throne of Persia after his father's death in 486 BC. By 480 BC he had assembled an army of approximately 100,000 to 180,000 men and a fleet of nearly 600 ships, quite a large force by Greek standards, and he decided to invade Greece. The plan was for his massive army to cross the Hellespont, march around the Aegean Sea, and conquer Greece by land.

Crossing the Hellespont proved to be troublesome to Xerxes and his army. They tried to cross the Hellespont with a bridge of boats, but alas, the sea became rough and the bridge broke apart. When King Xerxes heard of this, he was furious, and gave orders that the sea should receive 300 lashes with whips. The sea did calm down and the second attempt to build a bridge was successful.

This, however, happened a long, long time ago.

Place the xerces.jar file in $TOMCAT_HOME/lib.

Restart Tomcat.

I use the following for starting and stopping Tomcat:

setenv startTomcat $TOMCAT_HOME/bin/startup.sh
setenv stopTomcat  $TOMCAT_HOME/bin/shutdown.sh
Also make sure your CLASSPATH variable points to the xerces.jar file.

setenv CLASSPATH $TOMCAT_HOME/lib/xerces.jar:$CLASSPATH
And now let's look at some XML examples.

Script started on Tue Dec 04 09:18:50 2001
burrowww.cs.indiana.edu% pwd
/nfs/paca/home/user1/dgerman/foobar
burrowww.cs.indiana.edu% ls -l
total 2
drwxr-xr-x   2 dgerman  faculty       512 Dec  3 02:25 data
drwxr-xr-x   2 dgerman  faculty       512 Dec  4 09:18 parser
burrowww.cs.indiana.edu% ls data
dataFile.txt
burrowww.cs.indiana.edu% cd parser
burrowww.cs.indiana.edu% ls -l
total 7
-rw-r--r--   1 dgerman  faculty       515 Dec  4 07:25 FirstParser.java
-rw-r--r--   1 dgerman  faculty      3693 Dec  4 09:18 IndentingParser.java
-rw-r--r--   1 dgerman  faculty       681 Dec  4 09:17 customer.xml
-rw-r--r--   1 dgerman  faculty       184 Dec  4 09:17 greeting.xml
burrowww.cs.indiana.edu% cat customer.xml
<?xml version="1.0"?>
<document>
  <customer> 
    <item> One   </item> 
    <item> Two   </item> 
    <item> Three </item> 
    <item> Four  </item> 
  </customer>
  <customer> 
    <item> One   </item> 
  </customer>
  <customer> 
    <item> One   </item> 
    <item> Two   </item> 
    <item> Three </item> 
  </customer>
  <customer> 
    <item> One   </item> 
    <item> Two   </item> 
    <item> Three </item> 
    <item> Four  </item> 
  </customer>
  <customer> 
    <item> One   </item> 
    <item> Two   </item> 
    <item> Three </item> 
    <item> Four  </item> 
  </customer>
  <customer> 
    <item> One   </item> 
    <item> Two   </item> 
  </customer>
</document>
burrowww.cs.indiana.edu% cat greeting.xml
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <greeting>
    Hello from XML 
  </greeting>
  <message>
    Welcome to the wild and woolly world of XML. 
  </message>
</document>
These are the two files; we will discuss them shortly. Now let's compile the two programs.

burrowww.cs.indiana.edu% javac *.java
burrowww.cs.indiana.edu% ls -l
total 13
-rw-r--r--   1 dgerman  faculty      1300 Dec  4 09:19 FirstParser.class
-rw-r--r--   1 dgerman  faculty       515 Dec  4 07:25 FirstParser.java
-rw-r--r--   1 dgerman  faculty      3369 Dec  4 09:19 IndentingParser.class
-rw-r--r--   1 dgerman  faculty      3693 Dec  4 09:18 IndentingParser.java
-rw-r--r--   1 dgerman  faculty       681 Dec  4 09:17 customer.xml
-rw-r--r--   1 dgerman  faculty       184 Dec  4 09:17 greeting.xml
burrowww.cs.indiana.edu% java FirstParser greeting.xml
greeting.xml has 0 <customer> elements. 
burrowww.cs.indiana.edu% java FirstParser customer.xml
customer.xml has 6 <customer> elements. 
burrowww.cs.indiana.edu% java IndentingParser customer.xml
 <?xml version="1.0" encoding="UTF-8"?>
<document>
    <customer>
        <item>
            One
        </item>
        <item>
            Two
        </item>
        <item>
            Three
        </item>
        <item>
            Four
        </item>
    </customer>
    <customer>
        <item>
            One
        </item>
    </customer>
    <customer>
        <item>
            One
        </item>
        <item>
            Two
        </item>
        <item>
            Three
        </item>
    </customer>
    <customer>
        <item>
            One
        </item>
        <item>
            Two
        </item>
        <item>
            Three
        </item>
        <item>
            Four
        </item>
    </customer>
    <customer>
        <item>
            One
        </item>
        <item>
            Two
        </item>
        <item>
            Three
        </item>
        <item>
            Four
        </item>
    </customer>
    <customer>
        <item>
            One
        </item>
        <item>
            Two
        </item>
    </customer>
</document>
The second program is an indenting parser, a pretty-printer. It reads, understands, and displays a carefully formatted version of its input. The emphasis here is on "understands": the parser builds an internal data structure, a model of what it reads, in the form of a tree that follows the document's syntactic structure. The formatted output is only a side effect, as far as we care right now.
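To see the tree at work, here is a much smaller Java sketch of the same recursive walk. It swaps in the JAXP `DocumentBuilder` that ships with the JDK instead of Xerces's `DOMParser`, so it runs without xerces.jar; it handles only elements and non-blank text (no attributes, CDATA, or processing instructions), and the class name `TreeWalk` is ours. Each call prints a node at the current indentation and recurses into the children with four more spaces.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TreeWalk {
    // Recursively append an indented view of the node and its children.
    static void walk(Node node, String indent, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(indent).append('<').append(node.getNodeName()).append(">\n");
            NodeList kids = node.getChildNodes();
            for (int i = 0; i < kids.getLength(); i++) {
                walk(kids.item(i), indent + "    ", out);
            }
            out.append(indent).append("</").append(node.getNodeName()).append(">\n");
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {             // skip whitespace-only text between tags
                out.append(indent).append(text).append('\n');
            }
        }
    }

    // Parse an XML string into a DOM tree, then walk it from the root element.
    static String pretty(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(pretty("<document><customer><item> One </item></customer></document>"));
    }
}
```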

But note one thing: the source (the input) could come from over the network.

burrowww.cs.indiana.edu% java IndentingParser http://burrowww.cs.indiana.edu:31090/cgi-bin/foobar/two
 <?xml version="1.0" encoding="UTF-8"?>
<document>
    <customer>
        <item>
            One
        </item>
        <item>
            Two
        </item>
        <item>
            Three
        </item>
        <item>
            Four
        </item>
    </customer>
    <customer>
        <item>
            One
        </item>
    </customer>
    <customer>
        <item>
            One
        </item>
        <item>
            Two
        </item>
        <item>
            Three
        </item>
    </customer>
    <customer>
        <item>
            One
        </item>
        <item>
            Two
        </item>
        <item>
            Three
        </item>
        <item>
            Four
        </item>
    </customer>
    <customer>
        <item>
            One
        </item>
        <item>
            Two
        </item>
        <item>
            Three
        </item>
        <item>
            Four
        </item>
    </customer>
    <customer>
        <item>
            One
        </item>
        <item>
            Two
        </item>
    </customer>
</document>
Compare this with the script two, listed below: it serves exactly this document over HTTP.

Now let's look at the source code.

burrowww.cs.indiana.edu% cat IndentingParser.java
import org.w3c.dom.*;
import org.apache.xerces.parsers.DOMParser;

public class IndentingParser {
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0; 
    public static void displayDocument(String uri) {
        try {
            DOMParser parser = new DOMParser(); 
            parser.parse(uri); 
            Document document = parser.getDocument(); 
            display(document, " "); 
        } catch (Exception e) {
            e.printStackTrace(System.err); 
        }
    }
    public static void display(Node node, String indent) {
        if (node == null) { return; } 
        int type = node.getNodeType(); 
        switch (type) {
            case Node.DOCUMENT_NODE: {
                displayStrings[numberDisplayLines] = indent; 
                displayStrings[numberDisplayLines] += 
                    "<?xml version=\"1.0\" encoding=\"" + "UTF-8" + "\"?>"; 
                numberDisplayLines++;
                display(((Document)node).getDocumentElement(), ""); 
                break; 
            }
            case Node.ELEMENT_NODE: {
                displayStrings[numberDisplayLines] = indent; 
                displayStrings[numberDisplayLines] += "<"; 
                displayStrings[numberDisplayLines] += node.getNodeName(); 
                int length = (node.getAttributes() != null) ?
                    node.getAttributes().getLength() : 0; 
                Attr attributes[] = new Attr[length];
                for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                    attributes[loopIndex] = 
                        (Attr)node.getAttributes().item(loopIndex);
                }
                for (int loopIndex = 0; loopIndex < attributes.length; loopIndex++) {
                    Attr attribute = attributes[loopIndex];
                    displayStrings[numberDisplayLines] += " "; 
                    displayStrings[numberDisplayLines] += attribute.getNodeName(); 
                    displayStrings[numberDisplayLines] += "=\""; 
                    displayStrings[numberDisplayLines] += attribute.getNodeValue(); 
                    displayStrings[numberDisplayLines] += "\""; 
                }
                displayStrings[numberDisplayLines] += ">"; 
                numberDisplayLines++; 
                NodeList childNodes = node.getChildNodes(); 
                if (childNodes != null) {
                    length = childNodes.getLength(); 
                    indent += "    "; 
                    for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                        display(childNodes.item(loopIndex), indent); 
                    }
                }
                break; 
            }
            case Node.CDATA_SECTION_NODE: {
                displayStrings[numberDisplayLines] = indent; 
                displayStrings[numberDisplayLines] += "<![CDATA["; 
                displayStrings[numberDisplayLines] += node.getNodeValue(); 
                displayStrings[numberDisplayLines] += "]]>"; 
                numberDisplayLines++; 
                break; 
            }
            case Node.TEXT_NODE: {
                displayStrings[numberDisplayLines] = indent; 
                String newText = node.getNodeValue().trim(); 
                if (newText.indexOf("\n") < 0 && newText.length() > 0) {
                    displayStrings[numberDisplayLines] += newText; 
                    numberDisplayLines++; 
                }
                break; 
            }
            case Node.PROCESSING_INSTRUCTION_NODE: {
                displayStrings[numberDisplayLines] = indent; 
                displayStrings[numberDisplayLines] += "<?"; 
                displayStrings[numberDisplayLines] += node.getNodeName(); 
                String text = node.getNodeValue(); 
                if (text != null && text.length() > 0) {
                    displayStrings[numberDisplayLines] += " " + text;  // a space separates target and data
                }
                displayStrings[numberDisplayLines] += "?>"; 
                numberDisplayLines++;
                break;
            }
        } /* switch */ 
        if (type == Node.ELEMENT_NODE) {
            // indent still carries the four extra spaces added for the children; drop them
            displayStrings[numberDisplayLines] = 
                indent.substring(0, indent.length() - 4);
            displayStrings[numberDisplayLines] += "</";
            displayStrings[numberDisplayLines] += node.getNodeName(); 
            displayStrings[numberDisplayLines] += ">"; 
            numberDisplayLines++; 
        }
    }

    public static void main(String[] args) {
        displayDocument(args[0]);
        for (int loopIndex = 0; loopIndex < numberDisplayLines; loopIndex++) {
            System.out.println(displayStrings[loopIndex]);
        }
    }

}
That was the second program first; here now is the first program.

burrowww.cs.indiana.edu% cat FirstParser.java
import org.w3c.dom.*;
import org.apache.xerces.parsers.DOMParser;
public class FirstParser {
    public static void main(String[] args) {
        try {
            DOMParser parser = new DOMParser(); 
            parser.parse(args[0]); 
            Document doc = parser.getDocument(); 
            NodeList nodeList = doc.getElementsByTagName("customer"); 
            System.out.println(args[0] + " has " + 
                               nodeList.getLength() + 
                               " <customer> elements. "); 
        } catch (Exception e) {
            e.printStackTrace(System.err); 
        } 
    } 
}
burrowww.cs.indiana.edu% exit
burrowww.cs.indiana.edu% 
script done on Tue Dec 04 09:20:52 2001
And that's the end of the examples script.
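FirstParser's count can also be sketched with the DOM classes bundled in the JDK (JAXP) rather than Xerces; this is a swapped-in parser, not the code used above, and `CountElements` is a hypothetical name. `getElementsByTagName` searches the whole tree, so it finds the `<customer>` elements however deeply they are nested. To parse a file or URL the way FirstParser does with `args[0]`, `DocumentBuilder` also has a `parse(String uri)` method.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class CountElements {
    // Parse an XML string and count the elements with the given tag name.
    static int count(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<document>"
            + "<customer><item>One</item></customer>"
            + "<customer><item>One</item><item>Two</item></customer>"
            + "</document>";
        System.out.println(count(xml, "customer") + " <customer> elements"); // 2
        System.out.println(count(xml, "item") + " <item> elements");         // 3
    }
}
```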

What you need to see now is the script two:

burrowww.cs.indiana.edu% pwd
/nfs/paca/home/user1/dgerman/apache/apache_1.3.20/cgi-bin/foobar
burrowww.cs.indiana.edu% ls -ld two
-rwx------   1 dgerman  faculty       751 Dec  4 07:23 two
burrowww.cs.indiana.edu% cat two
#!/usr/bin/perl

print "Content-type: text/plain\n\n"; 

print qq{<?xml version="1.0"?>
<document>
  <customer> 
    <item> One   </item> 
    <item> Two   </item> 
    <item> Three </item> 
    <item> Four  </item> 
  </customer>
  <customer> 
    <item> One   </item> 
  </customer>
  <customer> 
    <item> One   </item> 
    <item> Two   </item> 
    <item> Three </item> 
  </customer>
  <customer> 
    <item> One   </item> 
    <item> Two   </item> 
    <item> Three </item> 
    <item> Four  </item> 
  </customer>
  <customer> 
    <item> One   </item> 
    <item> Two   </item> 
    <item> Three </item> 
    <item> Four  </item> 
  </customer>
  <customer> 
    <item> One   </item> 
    <item> Two   </item> 
  </customer>
</document>}; 

burrowww.cs.indiana.edu% 
Let's now return to Company A (FooBar).

After much thought, they produced the following output script.

Script started on Tue Dec 04 09:42:17 2001
burrowww.cs.indiana.edu% pwd
/nfs/paca/home/user1/dgerman/apache/apache_1.3.20/cgi-bin/foobar
burrowww.cs.indiana.edu% ls -ld DOM*.pl
-rwxr-xr-x   1 dgerman  faculty       900 Dec  4 07:47 DOMreport.pl
burrowww.cs.indiana.edu% cat DOMreport.pl
#!/usr/bin/perl

print "Content-type: text/plain\n\n"; 

open (AB, "/u/dgerman/foobar/data/dataFile.txt"); 
@x = <AB>;
close(AB); 

# escape the characters that would make the XML ill-formed (& first)
sub xmlEscape { my $s = shift; $s =~ s/&/&amp;/g; $s =~ s/</&lt;/g; $s =~ s/>/&gt;/g; return $s; }

print "<?xml version=\"1.0\"?>\n"; 

print "\n<document>\n"; 

foreach $line (@x) {

    @line = map { xmlEscape($_) } split(/:/, $line); 

    $title       = $line[0]; 
    $author      = $line[1]; 
    $subject     = $line[2]; 
    $publisher   = $line[3]; 
    $isbn        = $line[4]; 
    $price       = $line[5]; 
    $numPages    = $line[6]; 
    $description = $line[7]; 
    
    print qq{
  <book> 

    <subject>$subject</subject>

    <title>$title</title>
    <author>$author</author>
    <publisher>$publisher</publisher>

    <numPages>$numPages</numPages>

    <saleDetails>
      <isbn>$isbn</isbn>
      <price>$price</price>
    </saleDetails>

    <description>$description</description>

  </book>};  
}

print "</document>"; 
Let's run it, so we can better see what's going on.

burrowww.cs.indiana.edu% ./DOMreport.pl
Content-type: text/plain

<?xml version="1.0"?>

<document>

  <book> 

    <subject>Biography</subject>

    <title>In Search of Lake Wobegon</title>
    <author>Garrison Keillor, Richard Olsenius (Photographer)</author>
    <publisher>Viking Press</publisher>

    <numPages>128</numPages>

    <saleDetails>
      <isbn>0-6700-3037-6</isbn>
      <price>29.95</price>
    </saleDetails>

    <description>In the twenty-five years since Garrison Keillor first brought it to life, the rural                Minnesota town of Lake Wobegon has become a national treasure. In this                lavishly produced photography book, word and image combine to illuminate                the real Minnesota town-life, landscapes, and people who inspired its                creation. Taking us on a tour of Stearns County, the Minnesota county he                deems most "Wobegonic," Keillor meditates on the origins of the place                where, as a young writer, he found the inspiration for his fiction and his radio                show. As an artful evocation of Keillor's beloved invention, Richard                Olsenius's elegantly composed black-and-white photographs of rural                Minnesota capture the dignity of his subjects, the beauties of the landscape                as well as the enduring values and eccentricities of the communities rooted                there. 
</description>

  </book>
  <book> 

    <subject>Science</subject>

    <title>The Armchair Universe - An Exploration of Computer Worlds  </title>
    <author>A. K. Dewdney </author>
    <publisher>W. H. Freeman and Company, New York </publisher>

    <numPages>330</numPages>

    <saleDetails>
      <isbn>0-7167-1939-8</isbn>
      <price>19.90</price>
    </saleDetails>

    <description>This is the first collection of A.K.Dewdney's popular "Computer Recreations"                columns, drawn from Scientific American magazine between 1984 and 1987.                Inspired by Martin Gardner's classic "Mathematical Games" column, which                entertained millions of readers for more than 30 years, "Computer                Recreations" has quickly become one of the most widely read and anticipated                columns in Scientific American. The computer recreations described here                range from purely entertaining brainteasers to more practical computer                applications of scientific thought. And with Dewdney's lucid programming                directions to follow, you can actually sit at your computer and try your hand at                them all. Available in paperback and hardcover. Cover image shows Julia set                bounding three basins of attraction on a Riemann sphere.   
</description>

  </book></document>burrowww.cs.indiana.edu% exit
burrowww.cs.indiana.edu% 
script done on Tue Dec 04 09:42:43 2001

(Sometimes raw data can be in just the right format.)
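The Perl script above interpolates $title, $description and the other fields straight into the markup. That works only while the data contains no XML metacharacters. A title containing an & or a < would make the whole feed malformed. Below is a minimal Java sketch of an escaping step a feed generator could apply to each field. The class and method names (XmlEscape, escape, element) and the sample title are hypothetical and are not part of the course code.

```java
// A minimal sketch (hypothetical names): escape XML metacharacters in a field
// value before wrapping it in an element, so arbitrary data stays well-formed.
public class XmlEscape {

    // Returns s with &, <, >, " and ' replaced by the predefined XML entities.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps an escaped value in a named element, e.g. <title>...</title>.
    static String element(String name, String value) {
        return "<" + name + ">" + escape(value) + "</" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(element("title", "Cats & Dogs <Vol. 1>"));
        // prints <title>Cats &amp; Dogs &lt;Vol. 1&gt;</title>
    }
}
```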

Company B can now easily interact with Company A: all it needs is the URL of Company A's XML report and an XML parser.
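Here is a sketch of the consuming side. It assumes the standard JAXP parser bundled with the JDK instead of the Xerces DOMParser used in the servlets. The class name BookFeedReader is hypothetical. The program extracts the title and price of every <book> in a feed shaped like the DOMreport.pl output. To keep it self-contained, it parses an inline string. DocumentBuilder.parse(String uri) would instead fetch a URL such as http://burrowww.cs.indiana.edu:31090/cgi-bin/foobar/DOMreport.pl directly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical consumer of a book feed: lists "title: price" for each <book>.
public class BookFeedReader {

    static List<String> titlesAndPrices(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList books = doc.getElementsByTagName("book");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            // <price> sits inside <saleDetails>; a descendant search still finds it
            result.add(text(book, "title") + ": " + text(book, "price"));
        }
        return result;
    }

    // Trimmed text of the first descendant element named tag, or "" if absent.
    static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><document><book>"
            + "<title>In Search of Lake Wobegon</title>"
            + "<saleDetails><price>29.95</price></saleDetails>"
            + "</book></document>";
        for (String line : titlesAndPrices(xml)) {
            System.out.println(line);   // prints In Search of Lake Wobegon: 29.95
        }
    }
}
```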

Script started on Tue Dec 04 09:56:51 2001
burrowww.cs.indiana.edu% pwd
/nfs/paca/home/user1/dgerman/apache/jakarta-tomcat-3.2.3/webapps/examples/WEB-INF/classes
burrowww.cs.indiana.edu% echo $myServlets
/u/dgerman/apache/jakarta-tomcat-3.2.3/webapps/examples/WEB-INF/classes
burrowww.cs.indiana.edu% ls -ld Nine.java Ten.java
-rw-r--r--   1 dgerman  faculty       870 Dec  4 07:38 Nine.java
-rw-r--r--   1 dgerman  faculty       746 Dec  4 07:57 Ten.java
burrowww.cs.indiana.edu% cat Nine.java
import java.io.*;
import java.net.*;
import javax.servlet.*;
import javax.servlet.http.*; 

import org.w3c.dom.*;
import org.apache.xerces.parsers.DOMParser;

public class Nine extends HttpServlet {
    
    public void doGet(HttpServletRequest request, 
                      HttpServletResponse response) 
        throws IOException, ServletException {
        
        response.setContentType("text/html"); 
        PrintWriter out = response.getWriter(); 
        
        try {
            String url = "http://burrowww.cs.indiana.edu:31090/cgi-bin/foobar/two";
            DOMParser parser = new DOMParser(); 
            parser.parse(url); 
            Document doc = parser.getDocument(); 
            NodeList nodeList = doc.getElementsByTagName("customer"); 
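            // The response is text/html, so escape the angle brackets;
            // a raw <customer> would be taken by the browser as a tag and not shown.
            out.println(url + " has " + 
                               nodeList.getLength() + 
                               " &lt;customer&gt; elements. "); 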
        } catch (Exception e) {
            e.printStackTrace(System.err); 
        } 
                
    }
    
}
This shows how you can do simple parsing inside a servlet.
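The same counting step can also be tried outside a servlet container. The sketch below assumes the JDK's JAXP parser rather than Xerces. ElementCounter, count and htmlReport are hypothetical names. A servlet like Nine sends text/html, so the tag name in the report has to be written as &lt;customer&gt;. A literal <customer> would be read by the browser as an unknown tag and not displayed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: count elements by name and build an HTML-safe report.
public class ElementCounter {

    // Parses xml and counts elements named tag anywhere in the document.
    static int count(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // Escapes the brackets so a browser shows "<tag>" literally.
    // Assumes tag is a plain element name with no other special characters.
    static String htmlReport(String tag, int n) {
        return n + " &lt;" + tag + "&gt; elements.";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<customers><customer/><customer/></customers>";
        System.out.println(htmlReport("customer", count(xml, "customer")));
        // prints 2 &lt;customer&gt; elements.
    }
}
```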

The next servlet simply uses an external class, IndentingParser, that we have already seen.

burrowww.cs.indiana.edu% cat Ten.java
import java.io.*;
import java.net.*;
import javax.servlet.*;
import javax.servlet.http.*; 

public class Ten extends HttpServlet {
    
    public void doGet(HttpServletRequest request, 
                      HttpServletResponse response) 
        throws IOException, ServletException {
        
        response.setContentType("text/html"); 
        PrintWriter out = response.getWriter(); 

        String url = "http://burrowww.cs.indiana.edu:31090/cgi-bin/foobar/DOMreport.pl"; 
        
        IndentingParser.displayDocument(url);
        for (int loopIndex = 0; loopIndex < IndentingParser.numberDisplayLines; loopIndex++) {
            out.println(IndentingParser.displayStrings[loopIndex]);
        }
                
    }
    
}
Here's the IndentingParser.java class again.

burrowww.cs.indiana.edu% cat IndentingParser.java
import org.w3c.dom.*;
import org.apache.xerces.parsers.DOMParser;

public class IndentingParser {
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0; 
    public static void displayDocument(String uri) {
        try {
            DOMParser parser = new DOMParser(); 
            parser.parse(uri); 
            Document document = parser.getDocument(); 
            display(document, " "); 
        } catch (Exception e) {
            e.printStackTrace(System.err); 
        }
    }
    public static void display(Node node, String indent) {
        if (node == null) { return; } 
        int type = node.getNodeType(); 
        switch (type) {
            case Node.DOCUMENT_NODE: {
                displayStrings[numberDisplayLines] = indent; 
                displayStrings[numberDisplayLines] += 
                    "<?xml version=\"1.0\" encoding=\"" + "UTF-8" + "\"?>"; 
                numberDisplayLines++;
                display(((Document)node).getDocumentElement(), ""); 
                break; 
            }
            case Node.ELEMENT_NODE: {
                displayStrings[numberDisplayLines] = indent; 
                displayStrings[numberDisplayLines] += "<"; 
                displayStrings[numberDisplayLines] += node.getNodeName(); 
                int length = (node.getAttributes() != null) ?
                    node.getAttributes().getLength() : 0; 
                Attr attributes[] = new Attr[length];
                for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                    attributes[loopIndex] = 
                        (Attr)node.getAttributes().item(loopIndex);
                }
                for (int loopIndex = 0; loopIndex < attributes.length; loopIndex++) {
                    Attr attribute = attributes[loopIndex];
                    displayStrings[numberDisplayLines] += " "; 
                    displayStrings[numberDisplayLines] += attribute.getNodeName(); 
                    displayStrings[numberDisplayLines] += "=\""; 
                    displayStrings[numberDisplayLines] += attribute.getNodeValue(); 
                    displayStrings[numberDisplayLines] += "\""; 
                }
                displayStrings[numberDisplayLines] += ">"; 
                numberDisplayLines++; 
                NodeList childNodes = node.getChildNodes(); 
                if (childNodes != null) {
                    length = childNodes.getLength(); 
                    indent += "    "; 
                    for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                        display(childNodes.item(loopIndex), indent); 
                    }
                }
                break; 
            }
            case Node.CDATA_SECTION_NODE: {
                displayStrings[numberDisplayLines] = indent; 
                displayStrings[numberDisplayLines] += "<![CDATA["; 
                displayStrings[numberDisplayLines] += node.getNodeValue(); 
                displayStrings[numberDisplayLines] += "]]>"; 
                numberDisplayLines++; 
                break; 
            }
            case Node.TEXT_NODE: {
                displayStrings[numberDisplayLines] = indent; 
                String newText = node.getNodeValue().trim(); 
                if (newText.indexOf("\n") < 0 && newText.length() > 0) {
                    displayStrings[numberDisplayLines] += newText; 
                    numberDisplayLines++; 
                }
                break; 
            }
            case Node.PROCESSING_INSTRUCTION_NODE: {
                displayStrings[numberDisplayLines] = indent; 
                displayStrings[numberDisplayLines] += "<?"; 
                displayStrings[numberDisplayLines] += node.getNodeName(); 
                String text = node.getNodeValue(); 
                if (text != null && text.length() > 0) {
                    displayStrings[numberDisplayLines] += text; 
                }
                displayStrings[numberDisplayLines] += "?>"; 
                numberDisplayLines++;
                break;
            }
        } /* switch */ 
        if (type == Node.ELEMENT_NODE) {
            displayStrings[numberDisplayLines] = 
                indent.substring(0, indent.length() - 4);
            displayStrings[numberDisplayLines] += "</";
            displayStrings[numberDisplayLines] += node.getNodeName(); 
            displayStrings[numberDisplayLines] += ">"; 
            numberDisplayLines++; 
            indent+= "    "; 
        }
    }

    public static void main(String[] args) {
        displayDocument(args[0]);
        for (int loopIndex = 0; loopIndex < numberDisplayLines; loopIndex++) {
            System.out.println(displayStrings[loopIndex]);
        }
    }

}
burrowww.cs.indiana.edu% ls -ld Indent*.java
-rw-r--r--   1 dgerman  faculty      3693 Dec  4 07:52 IndentingParser.java
burrowww.cs.indiana.edu% javac Nine.java
burrowww.cs.indiana.edu% javac Ten.java
burrowww.cs.indiana.edu% 
script done on Tue Dec 04 09:57:55 2001
And we compile and run them.
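For comparison, here is a sketch of the same depth-first walk that IndentingParser performs, written against the JDK's JAXP parser. IndentingLines, lines and walk are hypothetical names. The main design difference is where the output goes. The lines are collected in a list created on each call rather than in static arrays with a static counter, so nothing is shared between calls and there is no fixed 1000-line limit. Unlike IndentingParser, it does not print the XML declaration or processing instructions, and it prints CDATA contents as ordinary text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reentrant variant of the indenting display.
public class IndentingLines {

    // Parses xml; returns one line per tag or non-blank text node.
    static List<String> lines(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();   // local, so every call starts empty
        walk(doc.getDocumentElement(), "", out);
        return out;
    }

    // Start tag, children indented four more spaces, then the end tag.
    static void walk(Node node, String indent, List<String> out) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                StringBuilder tag = new StringBuilder(indent)
                    .append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    tag.append(' ').append(a.getNodeName())
                       .append("=\"").append(a.getNodeValue()).append('"');
                }
                out.add(tag.append('>').toString());
                NodeList children = node.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    walk(children.item(i), indent + "    ", out);
                }
                out.add(indent + "</" + node.getNodeName() + ">");
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: {
                String text = node.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.add(indent + text);
                }
                break;
            }
            default:
                break;   // comments, processing instructions, etc. are skipped
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document><book><title>The Armchair Universe</title></book></document>";
        for (String line : lines(xml)) {
            System.out.println(line);
        }
    }
}
```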

Try the servlet Nine.

Try Ten. Is it working? Why, or why not?

End of lecture.

Tomorrow in lab you will look at this again, so you can understand it better. The exam will be available on Wednesday night. It will ask you to produce the servlet and JSP implementations of Homeworks 2 and 3 (and 5), that is, the calculator and the portfolio. And that would be it.

What we discussed above becomes your Project Exam. If you have a project and are done with it, turn in a paper (two pages at most) that describes it. Include in the paper all the URLs that I should check on your web sites. If you do not have a project and are waiting for the notes for the default project, read the notes above, understand them, take the lab tomorrow, and answer the questions that will be asked and posted on-line.

Turn in your final and your project paper together, in person, on or before Dec 14 (5-7pm in LH102). There will be a set of lab notes posted tomorrow, and the text of the final will be posted where the lecture notes for Thursday would normally have been posted. I hope you enjoyed these notes, and the class.

Please let me know if you need any help.


Last updated: Dec 3, 2001 by Adrian German for A348/A548