Spring Semester 2005


Lab Notes Four: HTTP Experiments
In all that follows please use GenericClient (of Lecture Notes Eight) instead of telnet.

1. HTTP (The Hypertext Transfer Protocol)

The Hypertext Transfer Protocol (HTTP) is the language web clients and servers use to communicate with each other. It is essentially the backbone of the World Wide Web. All HTTP transactions follow the same general format. Each client request and server response has three parts:

  1. the client request or server response line,
  2. a header section, and
  3. the entity body.
The client initiates a transaction as follows:

  1. The client contacts the server at a designated port number (by default, 80). It then sends a document request by specifying an HTTP command called a method, followed by a document address, and an HTTP version number.

    For example:

    GET /index.html HTTP/1.0
    This uses the GET method to request the document index.html using version 1.0 of HTTP.

  2. Next the client sends optional header information to inform the server of its configuration, and the document formats that it will accept. All header information is given line by line, each with a header name and value. The client sends a blank line to end the header.

  3. After sending the request and headers, the client may send additional data. (This data is mostly used by CGI programs that use the POST method).
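
The three parts of a client request can be put together in a few lines of Java. This is only a sketch: the class RequestSketch, the method buildRequest, and the Accept header are made up for illustration and are not part of GenericClient or of any lab code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the lecture notes): assembles the text a client sends.
class RequestSketch {

    // Request line, then one "Name: value" line per header, then a blank line, then the body.
    static String buildRequest(String method, String path,
                               Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(' ').append(path).append(" HTTP/1.0\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        sb.append("\r\n");   // the blank line that ends the header section
        sb.append(body);     // stays empty for a plain GET
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "text/html");   // example header, chosen for illustration
        System.out.print(buildRequest("GET", "/index.html", headers, ""));
    }
}
```

Lines end in carriage return plus line feed (\r\n) here, which is what the protocol specifies; many servers also accept a bare newline, which is why typing the request by hand works.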

The server responds in the following way to the client's request:

  1. The server replies with a status line containing three fields: HTTP version, status code, and description. The HTTP version indicates the version of HTTP that the server is using to respond. The status code is a three-digit number that indicates the result of the client's request. The description following the status code is simply human-readable text that describes the status code. For example:
    HTTP/1.0 200 OK
    This status line indicates that the server uses version 1.0 of HTTP in its response. A status code of 200 means that the client's request was successful, and the requested data will be supplied after the headers.

  2. After the status line, the server sends header information to the client about itself and the requested document. A blank line ends the header.

  3. If the client's request is successful, the requested data is sent. This data may be a copy of a file, or the response from a CGI program. If the client's request could not be fulfilled, the additional data may be a human-readable explanation of why the server could not fulfill the request.
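
A client has to take the status line apart before it can decide what to do with the rest of the response. Here is a minimal sketch of that step, assuming the three fields are separated by single spaces; StatusLineSketch and parse are hypothetical names, not part of the lab code.

```java
// Hypothetical helper: splits a status line into version, status code, and description.
class StatusLineSketch {

    static String[] parse(String statusLine) {
        // Limit of 3 keeps a multi-word description such as "Not Found" in one piece.
        String[] parts = statusLine.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("not a status line: " + statusLine);
        }
        Integer.parseInt(parts[1]);   // throws NumberFormatException if the code is not numeric
        String description = (parts.length == 3) ? parts[2] : "";
        return new String[] { parts[0], parts[1], description };
    }

    public static void main(String[] args) {
        String[] f = parse("HTTP/1.0 200 OK");
        System.out.println("version=" + f[0] + " code=" + f[1] + " description=" + f[2]);
    }
}
```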

In HTTP 1.0, after the server has finished sending the requested data, it disconnects from the client, and the transaction is over, unless a Connection: Keep-Alive header is sent. Beginning with HTTP 1.1, however, the default is for the server to maintain the connection and allow the client to make additional requests. Since many documents embed other documents (inline images, frames, applets, etc.), this saves the overhead of the client having to repeatedly connect to the same server just to draw a single page.

Being a stateless protocol, HTTP does not maintain any information from one transaction to the next, so the next transaction needs to start all over again. The advantage is that an HTTP server can serve a lot more clients in a given period of time, since there is no additional overhead for tracking sessions from one connection to the next. The disadvantage is that more elaborate CGI programs need to use hidden input fields, or external tools such as cookies, to maintain information from one transaction to the next.

Methods: A method is an HTTP command that begins the first line of a client request. The method tells the server the purpose of the client request. Three methods are in common use: GET, HEAD, and POST. Other methods are also defined but are not as widely supported by servers (although they will be used more often in the future, not less). Methods are case-sensitive, so "GET" is different from "get".

The GET method: The GET method is a request for information located at a specific URL on the server. It is the method browsers use most often to retrieve information. The result of a GET request can be generated in many different ways: it can be a file accessible by the server, the output of a program or CGI script, the output from a hardware device, etc.

The entity-body portion of a GET request is always empty. GET is basically used to say "Give me this file". The file or program the client requests is usually identified by its full pathname on the server. The GET method is also used to send input to programs such as CGI scripts through form tags. Since GET requests have empty entity-bodies, the input data is appended to the URL in the GET line of the request. When a <form> tag specifies the method="GET" attribute, key-value pairs representing the input from the form are appended to the URL following a question mark (?). Pairs are separated by an ampersand (&). For example:

GET /cgi-bin/birthday.pl?month=august&date=24 HTTP/1.0

This causes the server to send the birthday.pl CGI program the month and date values specified in a form on the client. The input data at the end of the URL is encoded to CGI specifications. For literal use of special characters, the client uses hexadecimal notation.
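
Here is a sketch of how a client could put such a query string together. It relies on java.net.URLEncoder, whose encode(String, Charset) form needs Java 10 or later; the class QueryStringSketch and the field named q are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: joins form fields into name=value pairs separated by '&'.
class QueryStringSketch {

    static String build(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');   // separator between pairs
            }
            // Encode names and values separately so a literal '&' or '=' inside them is escaped.
            sb.append(URLEncoder.encode(f.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(f.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("month", "august");
        form.put("date", "24");
        System.out.println("GET /cgi-bin/birthday.pl?" + build(form) + " HTTP/1.0");

        Map<String, String> tricky = new LinkedHashMap<>();
        tricky.put("q", "a&b");                // hypothetical field with a special character
        System.out.println(build(tricky));     // the '&' inside the value comes out as %26
    }
}
```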

The POST method: The POST method allows data to be sent to the server in a client request. The data is directed to a data-handling program that the server has access to (e.g., a CGI script). The data sent to the server is in the entity-body section of the client's request. After the server processes the POST request and headers, it passes the entity-body to the program specified by the URL. The encoding scheme most commonly used with POST is URL-encoding, which allows form data to be translated into a list of variables and values for CGI processing.
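
With POST the client must also tell the server how many bytes of entity-body to expect, which is what the Content-Length header is for. A minimal sketch, assuming a URL-encoded (plain ASCII) body; PostRequestSketch and its build method are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a POST request whose Content-Length matches the body.
class PostRequestSketch {

    static String build(String path, String body) {
        // Count bytes, not characters; for a URL-encoded body the two agree.
        int length = body.getBytes(StandardCharsets.US_ASCII).length;
        return "POST " + path + " HTTP/1.0\r\n"
             + "Content-Type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"          // blank line: headers end, entity-body begins
             + body;
    }

    public static void main(String[] args) {
        System.out.print(build("/cgi-bin/eOne", "heLLo"));
    }
}
```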

Other methods: LINK, UNLINK, PUT, DELETE, OPTIONS, TRACE, CONNECT.

Now do Experiment One below:

Place this script in cgi-bin/eOne:

#!/usr/bin/perl

if ($ENV{REQUEST_METHOD} eq 'GET') {
  $in = $ENV{QUERY_STRING}; 
} else { 
  read(STDIN, $in, $ENV{CONTENT_LENGTH}); 
} 

print "Content-type: text/html\n\n($in)\n"; 
Part A:

Connect to your server using telnet from tucotuco this way:

tucotuco.cs.indiana.edu% telnet burrowww 10200
Trying 129.79.245.98...
Connected to burrowww.cs.indiana.edu.
Escape character is '^]'.
GET /cgi-bin/eOne HTTP/1.0

HTTP/1.0 200 OK
Date: Sat, 14 Oct 2000 18:40:42 GMT
Server: Apache/1.3.1 (Unix)
Connection: close
Content-Type: text/html

()
Connection closed by foreign host.
tucotuco.cs.indiana.edu% telnet burrowww 10200
Trying 129.79.245.98...
Connected to burrowww.cs.indiana.edu.
Escape character is '^]'.
GET /cgi-bin/eOne?hello HTTP/1.0

HTTP/1.0 200 OK
Date: Sat, 14 Oct 2000 18:40:59 GMT
Server: Apache/1.3.1 (Unix)
Connection: close
Content-Type: text/html

(hello)
Connection closed by foreign host.
tucotuco.cs.indiana.edu% 
Explain the process and the result.

Part B:

Now connect again but do it this way:

tucotuco.cs.indiana.edu% telnet burrowww 10200
Trying 129.79.245.98...
Connected to burrowww.cs.indiana.edu.
Escape character is '^]'.
POST /cgi-bin/eOne HTTP/1.0

HTTP/1.0 200 OK
Date: Sat, 14 Oct 2000 18:48:18 GMT
Server: Apache/1.3.1 (Unix)
Connection: close
Content-Type: text/html

() 
Connection closed by foreign host.
tucotuco.cs.indiana.edu% telnet burrowww 10200
Trying 129.79.245.98...
Connected to burrowww.cs.indiana.edu.
Escape character is '^]'.
POST /cgi-bin/eOne HTTP/1.0
Content-length: 5

heLLo
HTTP/1.0 200 OK
Date: Sat, 14 Oct 2000 18:48:40 GMT
Server: Apache/1.3.1 (Unix)
Connection: close
Content-Type: text/html

(heLLo)
Connection closed by foreign host.
tucotuco.cs.indiana.edu% telnet burrowww 10200
Trying 129.79.245.98...
Connected to burrowww.cs.indiana.edu.
Escape character is '^]'.
POST /cgi-bin/eOne HTTP/1.0
Content-length: 5

abcdefghij
HTTP/1.0 200 OK
Date: Sat, 14 Oct 2000 18:49:34 GMT
Server: Apache/1.3.1 (Unix)
Connection: close
Content-Type: text/html

(abcde)
Connection closed by foreign host.
tucotuco.cs.indiana.edu% 

Explain the process and the result, and compare with Part A.


2. CGI

CGI allows the web server to communicate with other programs that are running on the server. For example, with CGI, the web server can invoke an external program, while passing user-specific data to the program (such as what host the user is connecting from, or input the user has supplied through an HTML form). The program then processes the data, and the server passes the program's response back to the web browser.

Parameters to a CGI program are transferred either in the URL or in the body text of the request. The method used to pass parameters is determined by the method attribute of the <form> tag. The GET method says to transfer the data within the URL itself. The POST method says to use the body portion of the HTTP request to pass parameters.

The server passes the variable=value pairs to the CGI program. It does this either through Unix environment variables or in standard input (STDIN). If the CGI program is called with the GET method, parameters are expected to be embedded in the URL of the request, and the server transfers them to the program by assigning them to the QUERY_STRING environment variable. The CGI program can then retrieve the parameters from QUERY_STRING as it would read any environment variable (for example, from the %ENV associative array in Perl). If the CGI program is called with the POST method, parameters are expected to be embedded into the body of the request, and the server passes the body text to the program as standard input.
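
The same two input paths can be sketched in Java. This is not how the lab scripts are written (they are Perl); it only mirrors their logic so the GET and POST cases can be compared side by side. CgiInputSketch and readInput are hypothetical names, the values are passed in as arguments rather than read from the real environment, and InputStream.readNBytes(int) needs Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the GET/POST branch of the Perl scripts in these notes.
class CgiInputSketch {

    static String readInput(String method, String queryString,
                            String contentLength, InputStream stdin) {
        if ("GET".equals(method)) {
            // GET: the parameters arrive in QUERY_STRING
            return (queryString == null) ? "" : queryString;
        }
        // Otherwise: read at most CONTENT_LENGTH bytes from standard input
        int length = (contentLength == null || contentLength.isEmpty())
                ? 0 : Integer.parseInt(contentLength.trim());
        try {
            byte[] body = stdin.readNBytes(length);
            return new String(body, StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A real CGI program would use System.getenv("REQUEST_METHOD") and friends, and System.in.
        InputStream body = new ByteArrayInputStream("a=b&c=d".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readInput("POST", null, "7", body));
        System.out.println(readInput("GET", "a=b&c=d", null, new ByteArrayInputStream(new byte[0])));
    }
}
```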

URL Encoding: Before data supplied on a form can be sent to a CGI program, each form element's name (specified by the name attribute) is equated with the value entered by the user to create a key-value pair. Since under the GET method the form information is sent as part of the URL, form information can't include any spaces or other special characters that are not allowed in URLs, and also can't include characters that have other meanings in URLs, like slashes (/). (For the sake of consistency, this constraint also exists when the POST method is being used.)

Therefore, the web browser performs some special encoding on user-supplied information.

The encoding involves replacing spaces and other special characters in the query strings with their hexadecimal equivalents. (Thus, URL encoding is also sometimes called hexadecimal encoding.) CGI scripts have to provide some way to "decode" form data the client has encoded. The best way is to use CGI.pm and let it do the work for you, but in this class, for the sake of knowing what's going on, we wrote ReadParse.
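
To see what the decoding step amounts to, here is a small Java sketch. It is not ReadParse; PercentDecodeSketch and decode are hypothetical names. It assumes single-byte characters only, and it also turns + back into a space, which is the convention browsers follow for form data. Matcher.replaceAll with a function argument needs Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: undoes %XX escapes one character at a time.
class PercentDecodeSketch {

    // A percent sign followed by exactly two hexadecimal digits.
    private static final Pattern ESCAPE = Pattern.compile("%([0-9A-Fa-f]{2})");

    static String decode(String encoded) {
        // Browsers send a space in form data as '+'; undo that before the %XX escapes,
        // so that an encoded plus sign (%2B) survives as a literal '+'.
        String spaced = encoded.replace('+', ' ');
        Matcher m = ESCAPE.matcher(spaced);
        // quoteReplacement keeps a decoded '$' or '\' from being read as a group reference.
        return m.replaceAll(r -> Matcher.quoteReplacement(
                String.valueOf((char) Integer.parseInt(r.group(1), 16))));
    }

    public static void main(String[] args) {
        System.out.println(decode("R%26D"));
        System.out.println(decode("hello+world"));
    }
}
```

The standard library already offers java.net.URLDecoder for this job; writing it by hand is only meant to show what is going on.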

Now do Experiment Two below.

Add eTwo to your cgi-bin:

#!/usr/bin/perl

if ($ENV{REQUEST_METHOD} eq 'GET') {
  $in = $ENV{QUERY_STRING}; 
} else { 
  read(STDIN, $in, $ENV{CONTENT_LENGTH}); 
} 

print "Content-type: text/html\n\n"; 

@in = split(/&/, $in); 

foreach $e (@in) {
    print $e, "\n"; 
} 
Part A:

Connect from tucotuco this way:

frilled.cs.indiana.edu% telnet burrowww 10200
Trying 129.79.245.98...
Connected to burrowww.cs.indiana.edu.
Escape character is '^]'.
GET /cgi-bin/eTwo?a=b&c=d
a=b
c=d
Connection closed by foreign host.
frilled.cs.indiana.edu% telnet burrowww 10200
Trying 129.79.245.98...
Connected to burrowww.cs.indiana.edu.
Escape character is '^]'.
POST /cgi-bin/eTwo HTTP/1.0
Content-length: 7

a=b&c=d
HTTP/1.0 200 OK
Date: Sat, 14 Oct 2000 19:39:45 GMT
Server: Apache/1.3.1 (Unix)
Connection: close
Content-Type: text/html

a=b
c=d
Connection closed by foreign host.
frilled.cs.indiana.edu% 

Explain the process and the result.

Part B:

Create the following file, and add it as eTwo.html in your htdocs.

<html>
<body bgcolor=white>
<form method=POST action="/cgi-bin/eTwo">
<input type=text name=userInput> <p>
<input type=submit>
</form>
</body>
</html>

Then enter

a=b&c=d
in the text field and press Submit.

Explain the process and the result, and the relationship with Part A (if any).

Also

  1. What changes (if anything) if we replace POST by GET in the method attribute of <form>?

  2. Rework both experiments and try to send a value of & (ampersand) for a, and a value of = (equals) for b, to the web server. What difference do you notice between the two experimental setups?


3. Perl Substitutions

You've seen these last week in lecture notes. Here's a warm-up question.

What's this:

if ($arg =~ /^(([+-]{0,1})(\d*)(\.{0,1})(\d+))$/) {
  # a rose by any other name... 
} else {
  # not a number! 
} 
OK, back to normal.

We're using the =~ operator with the letter s on its right-hand side, followed by a slash-delimited pattern to be matched and a replacement string. When the pattern matches, the string that follows the second slash replaces the matched text. There are several rules and exceptions; we summarize the ones we care about here through a few examples.

The dot (.) matches any one character except newline.

frilled.cs.indiana.edu%cat alpha
#!/usr/bin/perl
$a = "1234567890"; 
$a =~ s/./a/; 
print $a; 
frilled.cs.indiana.edu%./alpha
a234567890frilled.cs.indiana.edu%
To have the substitution happen everywhere it can happen, use g (global) after the third slash.

frilled.cs.indiana.edu%cat alpha
#!/usr/bin/perl
$a = "1234567890"; 
$a =~ s/./a/g; 
print $a; 
frilled.cs.indiana.edu%./alpha
aaaaaaaaaafrilled.cs.indiana.edu%
The pattern can be bigger (or longer):
frilled.cs.indiana.edu%cat alpha
#!/usr/bin/perl
$a = "1234567890"; 
$a =~ s/../a/g; 
print $a; 
frilled.cs.indiana.edu%./alpha
aaaaafrilled.cs.indiana.edu%
Parentheses can be used as memory elements:

frilled.cs.indiana.edu%cat alpha
#!/usr/bin/perl
$a = "1234567890"; 
$a =~ s/(.)(.)/$2$1/g; 
print $a; 
frilled.cs.indiana.edu%./alpha
2143658709frilled.cs.indiana.edu%
And they can include larger patterns:
frilled.cs.indiana.edu%cat alpha
#!/usr/bin/perl
$a = "1234567890"; 
$a =~ s/(..)/$1+1/g; 
print $a; 
frilled.cs.indiana.edu%./alpha
12+134+156+178+190+1frilled.cs.indiana.edu%
To have the part between the last two slashes act as Perl code use e (evaluate) after the third slash.

frilled.cs.indiana.edu%cat alpha
#!/usr/bin/perl
$a = "1234567890"; 
$a =~ s/(..)/$1+1/ge; 
print $a; 
frilled.cs.indiana.edu%./alpha
1335577991frilled.cs.indiana.edu%
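
For comparison, here are rough Java counterparts of the six substitutions above. This is a sketch and not part of the lab: SubstitutionSketch and incrementPairs are made-up names. Java has no e modifier, so the last case passes a function to Matcher.replaceAll (Java 9 or later) to compute each replacement.

```java
import java.util.regex.Pattern;

// Hypothetical class: Java versions of the Perl substitutions shown above.
class SubstitutionSketch {

    // Counterpart of s/(..)/$1+1/ge: treat each pair of digits as a number and add 1.
    static String incrementPairs(String s) {
        return Pattern.compile("(..)").matcher(s)
                .replaceAll(r -> String.valueOf(Integer.parseInt(r.group(1)) + 1));
    }

    public static void main(String[] args) {
        String a = "1234567890";
        System.out.println(a.replaceFirst(".", "a"));        // like s/./a/
        System.out.println(a.replaceAll(".", "a"));          // like s/./a/g
        System.out.println(a.replaceAll("..", "a"));         // like s/../a/g
        System.out.println(a.replaceAll("(.)(.)", "$2$1"));  // like s/(.)(.)/$2$1/g
        System.out.println(a.replaceAll("(..)", "$1+1"));    // like s/(..)/$1+1/g
        System.out.println(incrementPairs(a));               // like s/(..)/$1+1/ge
    }
}
```
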
Miscellaneous: A few other things needed in ReadParse are listed below.

Characters have (decimal) ASCII codes that can be obtained with ord.

frilled.cs.indiana.edu%cat alpha
#!/usr/bin/perl
@values = ('A', 'B', 'C', 'D', 'E'); 
foreach $value (@values) {
  print $value, " has ASCII code: ", ord($value), "\n"; 
} 
frilled.cs.indiana.edu%./alpha
A has ASCII code: 65
B has ASCII code: 66
C has ASCII code: 67
D has ASCII code: 68
E has ASCII code: 69
frilled.cs.indiana.edu%
ASCII codes can be turned into characters with chr.

frilled.cs.indiana.edu%cat alpha
#!/usr/bin/perl
@values = (65, 66, 67, 68, 69); 
foreach $value (@values) {
  print "ASCII code $value stands for: ", chr($value), "\n"; 
} 
frilled.cs.indiana.edu%./alpha
ASCII code 65 stands for: A
ASCII code 66 stands for: B
ASCII code 67 stands for: C
ASCII code 68 stands for: D
ASCII code 69 stands for: E
frilled.cs.indiana.edu%
The hex function turns a hexadecimal value into a decimal one.

frilled.cs.indiana.edu%cat alpha
#!/usr/bin/perl
@values = (1, 10, 20, 100, 110, 111); 
foreach $value (@values) {
  print "$value in base 16 is equal to ", hex($value), " in base 10.\n"; 
} 
frilled.cs.indiana.edu%./alpha
1 in base 16 is equal to 1 in base 10.
10 in base 16 is equal to 16 in base 10.
20 in base 16 is equal to 32 in base 10.
100 in base 16 is equal to 256 in base 10.
110 in base 16 is equal to 272 in base 10.
111 in base 16 is equal to 273 in base 10.
frilled.cs.indiana.edu%
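
If you want to try the same three operations from Java, they map onto a cast and Integer.parseInt. This is only a sketch; the class CharCodeSketch and its method names are chosen to echo the Perl functions and are not part of any library.

```java
// Hypothetical class: Java counterparts of Perl's ord, chr, and hex.
class CharCodeSketch {

    static int ord(char c) {
        return (int) c;                      // character to numeric code
    }

    static char chr(int code) {
        return (char) code;                  // numeric code to character
    }

    static int hex(String digits) {
        return Integer.parseInt(digits, 16); // hexadecimal digits to an int
    }

    public static void main(String[] args) {
        System.out.println("A has ASCII code: " + ord('A'));
        System.out.println("ASCII code 69 stands for: " + chr(69));
        System.out.println("110 in base 16 is equal to " + hex("110") + " in base 10.");
        // Chained together, as in chr(hex($1)) from a substitution:
        System.out.println(chr(hex("26")));
    }
}
```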

Now do Experiment Three below.

Part A:

Add eThree.html (below) to your htdocs.

<html>
  <body bgcolor=white>
    <form method="POST" action="/cgi-bin/eTwo">
      <input type=text name="a"> 
      <input type=text name="c"> 
      <input type="submit" value="Proceed">
    </form>
  </body>
</html>
Bring it up in your browser, type b in the first field, and d in the second.

Then push the submit button. Explain the process and the result.

What relationship does this experiment have with any of the previous experiments?

Part B:

Now add eThree to your cgi-bin:

#!/usr/bin/perl

if ($ENV{REQUEST_METHOD} eq 'GET') {
  $in = $ENV{QUERY_STRING};
} else {
  read(STDIN, $in, $ENV{CONTENT_LENGTH});
}

print "Content-type: text/html\n\n";

@in = split(/&/, $in);

foreach $e (@in) {
  ($name, $value) = split(/=/, $e);
  $name  =~ s/%(..)/chr(hex($1))/ge;
  $value =~ s/%(..)/chr(hex($1))/ge;
  print $name, "=(", $value, ")";
}

Call it with

http://burrowww.cs.indiana.edu:176xx/cgi-bin/eThree?userInput=a%3Db%26c%3Dd
Try to anticipate the result (remember Part B from the previous experiment) and explain it.

Part C:

Use the form from Part A, and point it to process from last time:

#!/usr/bin/perl

print qq{Content-type: text/html\n\n<html><body>};

$input = $ENV{QUERY_STRING}; 

print "($input)"; 

print qq{</body></html>}; 
Then type % in the first field and = in the second.

(Later repeat everything using space, or ~, or /, or even &).

Explain the result.

Now call process directly:

http://burrowww.cs.indiana.edu:102xx/cgi-bin/process?a=%&c==
What's the difference?

Why does it matter?

You should understand CGI very well now.


4. A Simple Web Browser in Java

In class we paired this browser with Apache.

//  SimpleBrowser.java

import java.awt.*;
import java.awt.event.*;
import java.io.*;

import javax.swing.*;
import javax.swing.text.html.*;
import javax.swing.event.*;

public class SimpleBrowser extends JFrame {

    static JTextField textField;
    static JEditorPane editor;

    public SimpleBrowser(String s) {
        super(s);

        JPanel panel = new JPanel();
        panel.setLayout(new BorderLayout());
        panel.setBorder(BorderFactory.createRaisedBevelBorder());

        editor = new JEditorPane();
        textField  = new JTextField();
        JScrollPane scrollPane = new JScrollPane(editor);

        editor.setEditable(false);

        panel.add(new JLabel("Location:  "), BorderLayout.WEST);
        panel.add(textField, BorderLayout.CENTER);

        getContentPane().add(panel, BorderLayout.NORTH);
        getContentPane().add(scrollPane, BorderLayout.CENTER);

        textField.addActionListener(new TextFieldListener());
    }

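    public static void main(String args[]) {
        SimpleBrowser frame = new SimpleBrowser("Simple Browser");
        // Exit the JVM when the window is closed; by default the frame is only hidden.
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setSize(400,400);
        frame.setVisible(true);
    }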

    class TextFieldListener implements ActionListener {

        public void actionPerformed(ActionEvent e) {
            try {
                editor.setPage(textField.getText());
            } catch (IOException ex) {
                editor.setText("Page could not be loaded");
            }
        }
    }
}
We also tried Netscape with the server that we wrote yesterday.

Finally here's your:

A348/A548 LAB ASSIGNMENT FOUR

UNDERGRADUATES
Perform the experiments and answer the questions in a document placed in protected.

GRADUATES
Go through the Java browser and server experiments and write a report.


Last updated on Jan 25, 2005 by Adrian German for A348/A548