Lecture notes for Wednesday, 7/23

An Experiment

We tried something a little different at the beginning of class. I asked you to answer a question not by looking up the answer, but by creating a program and looking to see what it did. The question was this:

"What is the default directory for files created by a Java servlet running under Tomcat?"

On Monday, you wrote a Java program that created a file, and unless you told it exactly which directory to use, it created the file in the same directory as the Java .class file that called it. Yesterday, you installed Apache Tomcat, which runs Java servlets. So if we were to write a Java servlet that created a file and we didn't tell it which directory to put the file into, it's not at all obvious where that file would end up. Would it be in the base directory for Tomcat ($CATALINA_HOME/)? Or would it be in the same directory as the .class file that contains the servlet we wrote (something like $CATALINA_HOME/webapps/test/WEB-INF/classes/)? Or the same directory as the .jar file that manages the servlets ($CATALINA_HOME/lib/)? Or the directory that contains the Tomcat startup scripts ($CATALINA_HOME/bin/)?

I had a few things in mind when I came up with this task. The first was that the answer had surprised me when I did it myself. More importantly, I think the techniques I used to answer the question are important skills for a programmer to have. If you don't immediately know where to find the answer to a question, often a good way to answer it is to test it yourself. And lastly, I felt like we all needed a bit more practice working with Java servlets, files, and exceptions.

Now any Java servlet that creates a file would be fine for this task. You could've taken one of your old servlet programs and just added a line or two that created a file. Or you could've taken one of your old stand-alone programs that created a file and put that code into a servlet. Or you could just create a super-minimal servlet from scratch that created a file. I'll take the last approach here in the notes.

If you weren't in class, I encourage you to try out this exercise on your own before reading the part of the notes where I explain one way you could answer the question.

I'll create a new web app named FileCreator and put it in the context /filetest. So I'll create the necessary directories. Since directories are an important part of this question, let's take a look at what the directory structure of my home directory is. (Well, I'll show you the parts that are relevant to this experiment. And any files that might be important or familiar.)

~
- apache-tomcat-7.0.54
  - bin
    - bootstrap.jar
    - catalina.sh
    - commons-daemon.jar
    - daemon.sh
    - setclasspath.sh
    - shutdown.sh
    - startup.sh
    - tomcat-juli.jar
  - lib
    - catalina.jar
    - servlet-api.jar
    - tomcat-api.jar
  - logs
  - temp
  - work
  - conf
    - server.xml
    - tomcat-users.xml
  - webapps
    - docs
    - examples
    - host-manager
    - manager
    - ROOT
    - filetest
      - WEB-INF
        - classes
          - FileCreator.class
          - FileCreator.java
        - lib
        - web.xml
- jdk1.8.0_11
  - bin
    - java
    - javac
  - db
  - include
  - jre
  - lib
  - man

Remember that $CATALINA_HOME is just shorthand for the base Tomcat directory. For me, that's ~/apache-tomcat-7.0.54. Here's the web.xml file I created in the directory $CATALINA_HOME/webapps/filetest/WEB-INF/:

<?xml version="1.0" encoding="ISO-8859-1"?>

<web-app>

 <servlet>
  <servlet-name>File Creation Test</servlet-name>
  <servlet-class>FileCreator</servlet-class>
 </servlet>


 <servlet-mapping>
  <servlet-name>File Creation Test</servlet-name>
  <url-pattern>/create</url-pattern>
 </servlet-mapping>


</web-app>

Just a quick review of what this means. The web app's context will be the directory /filetest. Internally, the servlet will be called "File Creation Test". There needs to be a servlet class named FileCreator. And the servlet will be called when someone accesses the following URL:

http://silo.soic.indiana.edu:61303/filetest/create

And here's the content of the test servlet I wrote:

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class FileCreator extends HttpServlet {
  public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {

    PrintWriter filewriter = new PrintWriter("funnyname.txt");
    filewriter.close();

  }
}

I used the PrintWriter constructor to create a file named "funnyname.txt" and open it for output. I gave it an odd name, so it'd be easier to find. I also closed the stream because it's the polite thing to do. After I saved and compiled the class, I went to the manager, reloaded the application in the /filetest context, and then called up the website. Nothing appeared to happen, which is good because we didn't tell the program to display anything.

Now where is the file? I guess we could poke around in the directories until we found it, but there's an easier way. Unix has a command called find which can be used to find files. So I went up to my user directory and tried to find a file with the name "funnyname.txt".

[ewennstr@silo classes]$ cd ~
[ewennstr@silo ~]$ find -name funnyname.txt
./apache-tomcat-7.0.54/bin/funnyname.txt

Lo and behold, the file is in Tomcat's bin directory! Now this is interesting because when I ran a similar program on the previous day, the file ended up in Tomcat's conf directory! And while some of the students in class reported that their file showed up in Tomcat's bin directory, other students reported that it showed up in their home directory, not inside the Tomcat directories at all.

There is a lesson to be learned here beyond the practice we got, and that's to be careful about relying upon default settings, especially if you learned about the default setting from experience (as opposed to reading the manual). It's not always obvious what conditions might change those default settings. I know that if I decide to do any file input or output from a Tomcat-served servlet in the future, I will always specify a complete path.
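One way to see what is going on behind the varying results: Java resolves a relative file name against the current working directory of the running JVM, which is available as the system property user.dir. The sketch below is a stand-alone program rather than a servlet, and the class name WorkingDirDemo and the /tmp/filetest location are made up for illustration. Inside a servlet, the same calls would report the working directory of the Tomcat process, which typically depends on where Tomcat was started from.

```java
import java.io.File;

class WorkingDirDemo {
    // Returns the absolute path Java would use for a relative file name.
    static String resolve(String relativeName) {
        return new File(relativeName).getAbsolutePath();
    }

    public static void main(String[] args) {
        // The JVM's current working directory.
        System.out.println("Working directory: " + System.getProperty("user.dir"));

        // Where a file opened as "funnyname.txt" would actually end up.
        System.out.println("Relative name resolves to: " + resolve("funnyname.txt"));

        // A complete path removes the guesswork (hypothetical location).
        File explicit = new File("/tmp/filetest", "funnyname.txt");
        System.out.println("Explicit path: " + explicit.getAbsolutePath());
    }
}
```

Printing these two values from inside the servlet (to the response or to a log) is a quicker check than hunting for the file afterwards.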

Accessing Websites Via HTTP

So far, we've written several different kinds of programs that generate web pages: CGI/Python scripts, CGI/PHP scripts, and Java servlets. But their interaction with the web has been fairly limited. The user's computer sends an HTTP request to our server (either Apache or Apache Tomcat), the server runs our program (giving it access to the environment variables that were part of the HTTP request), our program creates a web page, and the server sends that web page back to the user's computer. The only way our programs have been able to get information from the internet is by looking at the environment variables that were passed to it as part of the HTTP request. Today, we're going to be more proactive. We'll create a program that makes its own HTTP requests and gets back information from other websites.

Our program will access a website on the internet using the hypertext transfer protocol (otherwise known as HTTP), get the code for the web page line by line, and then output that code directly to our browsers. Since the code is written in HTML, and we are viewing the results in a browser, we'll end up seeing (a stripped-down* version of) the website in our browsers.

Why stripped-down? Well because we're only copying over the contents of the main HTML file. If there are any relative references in the HTML, then those references won't make sense any more, so we'll end up with broken links, missing images, and missing style sheets. So a link that's written as <a href="http://www.google.com/settings"> Privacy </a> will work, but if it's written as <a href="/settings"> Privacy </a>, it won't. And if the style sheet link is broken, it'll be missing everything that describes how the page should look, and all we'll get is the content.
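To make the difference concrete: a browser turns a relative reference into a full address by combining it with the address of the page it is displaying. For our copy, that is the address of our servlet, not the original site. The sketch below uses java.net.URI to do the same combination by hand. It is only an illustration and not part of the servlet in these notes; the class name LinkResolver is made up.

```java
import java.net.URI;

class LinkResolver {
    // Resolves a link found in a page against the address of that page.
    static String resolve(String pageAddress, String link) {
        return URI.create(pageAddress).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.google.com/";

        // An absolute link comes back unchanged.
        System.out.println(resolve(page, "http://www.google.com/settings"));

        // A relative link only makes sense once it is combined with the page address.
        System.out.println(resolve(page, "/settings"));
    }
}
```

Both lines print http://www.google.com/settings. Swap in the address of our own servlet as the page address and the second link points at our server instead, which is why it breaks.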

When it comes to packages, we'll need to have the usual stuff for servlets (javax.servlet) and HTTP servlets (javax.servlet.http), as well as the input/output tools (java.io). And we'll also need tools for using URLs and accessing the internet (java.net).

import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
import java.net.*;

Getting the content of a web page is not much different than getting the content of a local text file. We need to create some sort of input stream that will connect to the website and send the content to us. And just as with files, there are a number of different ways of doing this. We'll use the exact same method as we did for accessing files; we'll create a BufferedReader. The last time, we used a BufferedReader constructor that took as an argument an object of class FileReader. Karteek and I used different FileReader constructors, but once we had the FileReader, we used the same constructor for the BufferedReader. In particular, Karteek made his FileReader by creating an object of class File and then using that to build his FileReader. In the notes, I used a constructor for FileReader that just takes a string that contains the file name.

Today, we'll use a BufferedReader constructor that calls for an object of class InputStreamReader, but to create the InputStreamReader object, we first need to have an object of type InputStream. Fortunately, the URL class has a method called openStream() that returns an InputStream.

So we start by creating a URL object to store the address to the website we'll be copying, and to provide us with access to all the glorious little methods that URL objects have. For this simple test program, we'll just hard-code in a particular URL.

URL address = new URL("http://www.cs.indiana.edu/classes/a202");

We can use this URL object to open an InputStream with the command address.openStream(). Since the only thing we're going to do with this InputStream is use it to create an InputStreamReader, there's no need to give it a name. We can just send the output of address.openStream() straight to the InputStreamReader constructor: new InputStreamReader( address.openStream() ). Of course, the only thing we need the InputStreamReader for is to create a BufferedReader, so we can immediately feed this new InputStreamReader object to the constructor for a BufferedReader, which we will give a name to:

BufferedReader r = new BufferedReader(
  new InputStreamReader( address.openStream() ) 
);
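If the nesting is hard to follow, here is the same chain with each layer given a name. This is a sketch that can be run without a network connection, so a ByteArrayInputStream stands in for address.openStream(). The class name StreamWrappingDemo and the explicit UTF-8 character set are assumptions made for the example; the version in these notes relies on the platform's default character set.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class StreamWrappingDemo {
    // Builds the InputStream -> InputStreamReader -> BufferedReader chain.
    static BufferedReader wrap(InputStream bytes) {
        // Layer 1: turn raw bytes into characters.
        InputStreamReader chars = new InputStreamReader(bytes, StandardCharsets.UTF_8);
        // Layer 2: add buffering and the readLine() method.
        return new BufferedReader(chars);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes of a web page.
        byte[] fakePage = "<html>\n<body>Hello</body>\n</html>\n".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(fakePage);

        BufferedReader r = wrap(in);
        System.out.println(r.readLine()); // first line of the fake page
        r.close(); // closing the outer reader also closes the layers inside it
    }
}
```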

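The rest of the code is fairly self-explanatory. There is one big difference here from how I did it in class, and that's the use of the BufferedReader method .ready(), which returns true when the reader has text waiting that can be read without blocking, and false otherwise. This gives us a natural condition for our while loop. (Strictly speaking, .ready() reports whether data is available right now, not whether the stream has ended, so on a slow connection it could return false before the whole page has arrived. The surest sign that the stream is finished is a null return from .readLine().) Here's the complete code of the servlet: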

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.net.*;

public class WebCopier extends HttpServlet {
  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws IOException, ServletException {
  
    URL address = new URL("http://www.cs.indiana.edu/classes/a202");

    BufferedReader r = new BufferedReader(
      new InputStreamReader(address.openStream())
    );

    PrintWriter w = response.getWriter();

    String inputLine;

    while ( r.ready() ) {
      inputLine = r.readLine();
      w.println(inputLine);
    }

    r.close();

  }
}
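A variation on the copy loop, offered as an alternative rather than as the version used in class: instead of asking .ready() before each read, the loop can simply call .readLine() until it returns null, which happens only at the end of the stream. The sketch below runs on its own, with a StringReader standing in for the web page and a StringWriter standing in for the response; those stand-ins and the names CopyLoopDemo and copyLines are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;

class CopyLoopDemo {
    // Copies every line from r to w and returns the number of lines copied.
    static int copyLines(BufferedReader r, PrintWriter w) throws IOException {
        int count = 0;
        String inputLine;
        // readLine() returns null only when the end of the stream is reached.
        while ((inputLine = r.readLine()) != null) {
            w.println(inputLine);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader r = new BufferedReader(new StringReader("line one\nline two\n"));
        StringWriter captured = new StringWriter();
        PrintWriter w = new PrintWriter(captured);

        int copied = copyLines(r, w);
        w.flush();
        r.close();

        System.out.println("Copied " + copied + " lines");
        System.out.print(captured);
    }
}
```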

Now this works exactly as advertised, but I should point out that it doesn't exactly conform to best practices when it comes to exception handling. In fact, the only reason why Java is even willing to compile this monster is because up in the declaration of the doGet() method, we said that the method throws IOException. Without that, the compiler would've complained about unreported (checked) exceptions when we opened the stream, when we read lines from the stream, and even when we created the URL object. We've just passed along all those potential problems to our server software (Tomcat, in this case). Which means that if something goes wrong, the user will end up seeing a Tomcat error message, and for a simple test program like this, I guess that's okay. But we should get into the habit of catching our exceptions anyway, so let's fix this up a bit.

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.net.*;

public class WebCopier extends HttpServlet {
  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws IOException, ServletException {
    
    BufferedReader r = null;
    PrintWriter w = response.getWriter();
    
    try {
    
      URL address = new URL("http://www.cs.indiana.edu/classes/a202");
      r = new BufferedReader(
        new InputStreamReader(address.openStream())
      );
      String inputLine;
      while ( r.ready() ) {
        inputLine = r.readLine();
        w.println(inputLine);
      }
      
    } catch (IOException ex) {
    
      w.println("<h1>A Problem Was Encountered</h1>");
      w.println("An IOException occurred while trying to access the site at the"
                  + " given URL. Details:");
      w.println(ex);
    
    } finally {

      // r is still null if the URL or the stream could not be opened.
      if (r != null) {
        r.close();
      }

    }
  }
}

There. Much better.

In your homework assignment, you'll be expected to catch your exceptions like this. In fact, since you'll be asking your user to input the URL, you need to do one better. If there's an I/O exception, your program will need to ask the user to re-enter the URL.
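As a small starting point for handling user-supplied addresses, here is a sketch of a check that a piece of text can be turned into a URL object at all. It relies on the fact that the URL constructor throws a MalformedURLException (a kind of IOException) when the text has no recognizable protocol. The names UrlCheckDemo and looksLikeUrl are made up for the example. Note that this only checks the form of the address: a well-formed URL can still point at a site that doesn't exist, so the IOException from opening the stream still needs to be caught separately.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlCheckDemo {
    // Returns true if the text can be turned into a URL object.
    static boolean looksLikeUrl(String text) {
        try {
            new URL(text);
            return true;
        } catch (MalformedURLException ex) {
            // No protocol, or a protocol Java doesn't recognize.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUrl("http://www.cs.indiana.edu/classes/a202"));
        System.out.println(looksLikeUrl("www.cs.indiana.edu/classes/a202"));
    }
}
```

The first call prints true and the second prints false, because the second address is missing the "http://" part.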