CSCI A348/A548 Lecture Notes 5, Fall 1999
To understand how scripts interact with web browsers and servers we begin by reviewing a simpler interaction: how static HTML files are requested by browsers and displayed to users. Let's say that I have the following simple, basic HTML file in my DocumentRoot

    <html>
    <head> <title> Hello world! </title> </head>
    <body>
    <h1> Hello world! </h1>
    <p> How are you doing? </p>
    </body>
    </html>

and its name is hello.html. The path to the file is

    /u/dgerman/httpd/htdocs/hello.html

and it has to be made readable by the world for it to be accessible over the web:

    tucotuco.cs.indiana.edu% pwd
    /nfs/paca/home/user2/dgerman/httpd/htdocs
    tucotuco.cs.indiana.edu% vi hello.html
    tucotuco.cs.indiana.edu% cat hello.html
    <head> <title> Hello world! </title> </head>
    <body>
    <h1> Hello world! </h1>
    <p> How are you doing? </p>
    </body>
    </html>
    tucotuco.cs.indiana.edu% ls -l hell*
    -rw-r--r-- 1 dgerman students 126 Sep 5 18:39 hello.html
    tucotuco.cs.indiana.edu%

Once we've created the HTML text, it may seem that the process of delivering it to a web browser should be a trivial task. But serving even a simple page like this one requires a lot of coordination between the browser and the web server on which the page is stored.
By web server we mean a program residing on a host machine that uses the Hypertext Transfer Protocol (HTTP) to communicate with the browser. The program stored in my burrow account in /u/dgerman/httpd/httpd is such a program, and you are using similar servers. The web is based on a client-server model. This means that there is a server (which provides resources) and a client (which requests them). Here's what we need to keep in mind about them:
There are thousands of web servers throughout the world (wide web), but they are all accessible from any browser because they have all agreed to use a common protocol: the Hypertext Transfer Protocol (HTTP). HTTP is based on an exchange of requests and responses.
Each request can be thought of as a command, or action, which is sent by the browser to the server to be carried out. The server performs the requested service and returns its answer in the form of a response.
The components of a simple WWW interaction are the user, the client, and the server. The client acts as an intermediary between the user and the server.
Steps 1-7 detail the basic information flow in a simple HTTP transaction. Essentially the client requests a file and the server delivers it. The entire HTTP process takes place as a result of simple transactions of requests and responses.
The user sees a link to http://tucotuco.cs.indiana.edu:19904/hello.html and clicks the hyperlink or types the URL into the browser.
The URL (entered, for example, through Open Location in Netscape) says that the computer tucotuco.cs.indiana.edu needs to be contacted on port 19904 and that the /hello.html file is needed. For this the browser sends the HTTP GET command to the server (not shown here; we'll look at how this works when we simulate this request process using telnet). The path to the requested file is relative to the server's document root.
The browser sends the GET request to the server, indicating what file it needs. This request travels over the Internet, going from computer to computer until it reaches the web server's host: tucotuco in the CSCI's burrow cluster. There's a network security aspect here that we will need to address later.
The server looks at the file's extension (.html) to determine the type of information in the file. The .html means that the server will send the file back to the browser, but it will first state the file's type: Content-type: text/html. You do not have to write this in the file; it is inferred by the server from the file's extension. But the server does send this information to the browser as part of the header, followed by the data (the actual file) as explained below.
The server sends the headers, including Content-type: text/html. The headers are then followed by a blank line and then by the HTML data itself.
The Content-type: part of the header tells the browser that the data is text formatted in HTML, so the browser renders the text appropriately, highlighting hyperlinks, etc.
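The exchange described in these steps can be sketched in a few lines of Python (used here only for illustration; the course's own scripts are in Perl). The helper names `build_get_request` and `split_response` are hypothetical, and the response text is a hand-written stand-in for what a server might send, not a capture from tucotuco.

```python
import mimetypes
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request_text) for a simple HTTP/1.0 GET."""
    parts = urlsplit(url)
    port = parts.port or 80          # default HTTP port when none is given
    path = parts.path or "/"
    # The request line names the path relative to the server's document
    # root; a blank line ends the request.
    request = f"GET {path} HTTP/1.0\r\n\r\n"
    return parts.hostname, port, request


def split_response(raw):
    """Split a raw response into (status_line, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")   # blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


host, port, request = build_get_request(
    "http://tucotuco.cs.indiana.edu:19904/hello.html")

# One way a server might infer the type from the file extension.
content_type, _ = mimetypes.guess_type("hello.html")

# Hand-written example response, assumed for illustration only.
canned = (
    "HTTP/1.0 200 OK\r\n"
    f"Content-type: {content_type}\r\n"
    "\r\n"
    "<html> <body> <h1> Hello world! </h1> </body> </html>\n"
)
status, headers, body = split_response(canned)
print(host, port, repr(request))
print(status, headers["content-type"])
```

The sketch never opens a network connection; it only shows what text travels in each direction and where the blank line divides headers from data.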
When the server receives a request to access the database it passes the request to a gateway program which does whatever is necessary to get the data and return the results to the server.
The server then repackages the information from the script, and forwards the information back to the client. (In a sense the server acts as a sort of translator, taking data from either a file or script and providing it to the browsers in a consistent and uniform manner).
We make two observations now:
The CGI protocol
So the process of servicing the http://tucotuco.cs.indiana.edu:19904/cgi-bin/hello request is different, because from the shape of the request the server realizes that it needs to execute the script specified by that address (or path) rather than simply retrieving the file. The server provides the script with a variety of potentially useful information (such as the name of the machine from which the request originated, the type of browser used, etc.) and then starts it. What follows is of no concern to the server, other than the output of the script, which the server will send back to the requesting browser.
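As a rough illustration of that hand-off, the Python sketch below starts a tiny stand-in script with request details placed in its environment and collects whatever the script prints. `REMOTE_HOST` and `HTTP_USER_AGENT` are standard CGI variable names; the function `run_script`, the stand-in script, and the sample values are assumptions made up for this example.

```python
import os
import subprocess
import sys

# A tiny stand-in for a CGI script: it reads two variables that the
# server placed in its environment and reports them.
SCRIPT = (
    "import os\n"
    "print('Content-type: text/plain')\n"
    "print()\n"
    "print(os.environ['REMOTE_HOST'])\n"
    "print(os.environ['HTTP_USER_AGENT'])\n"
)


def run_script(remote_host, user_agent):
    """Start the script with request details in its environment."""
    env = dict(os.environ)                 # inherit the server's environment
    env["REMOTE_HOST"] = remote_host       # machine the request came from
    env["HTTP_USER_AGENT"] = user_agent    # browser identification
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        env=env, capture_output=True, text=True, check=True)
    return result.stdout                   # the script's output is all we keep


output = run_script("client.example.org", "Mozilla/4.6")
print(output)
```

The server side here knows nothing about what the script does internally; it only sets up the environment and reads the output.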
You take a lot of responsibility this way if you're writing the script. While the server doesn't care how the script generates its output, it does need to know the format of the output: the script's output is, after all, the server's input (on the path back to the user). Recall that when the web server delivers a static file to the browser, it uses a filename extension to determine what to return in the Content-type header.
This technique doesn't work for scripts, because a script's filename is unrelated to the type of information it returns. A script named getpic, for example, may return an image as its data (Content-type: image/gif) while the similarly named getinfo might return HTML text (Content-type: text/html). It is even possible for a single script to output different sorts of data depending upon the context in which it is called. Therefore it is absolutely essential that the script notify the server of the type of data it is generating, so that the server can pass this information on to the client.
The hello script is presented below:

    #!/usr/bin/perl

    print qq{Content-type: text/html\n\n<html>
    <head> <title> Hello world! </title> </head>
    <body>
    <h1> Hello world! </h1>
    <p> How are you doing? </p>
    </body>
    </html>
    };

The program has one statement only, which prints a few lines.
The first line starts by specifying the media type that identifies the data in the body. The \n\n that follow the header information are translated by Perl into newlines. The first ends the line with the content type, while the second inserts a (mandatory) blank line that separates the header from the rest of the message.
The remainder of the script simply outputs HTML text that looks suspiciously similar to the contents of the static hello.html shown earlier, beginning with the familiar <html> tag and ending with </html>.
All of this text output by the print statement is sent to the server which executed the script. The server captures the output, constructs a set of HTTP message headers (including the Content-type returned from the script), and sends these headers and the rest of the script's output to the browser. Upon receiving and interpreting the data, the browser is left with the HTML part of the code fragment above (everything after the blank line).
This is rendered by the receiving browser in exactly the same way as the hello.html file that we started with. So if we were to look only at the output we couldn't tell the two approaches apart.
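The server's part in this can be sketched as follows, again in Python and only as an illustration. The function `wrap_cgi_output` is hypothetical: it splits a script's output at the blank line, checks that a Content-type was announced, and prepends a status line. A real server adds more headers and handles more cases than this.

```python
def wrap_cgi_output(script_output):
    """Turn a script's raw output into a full HTTP response string."""
    # The mandatory blank line separates the script's header from its data.
    header_block, separator, body = script_output.partition("\n\n")
    if not separator:
        raise ValueError("script output has no blank line after its headers")
    script_headers = [line for line in header_block.split("\n") if line]
    if not any(h.lower().startswith("content-type:") for h in script_headers):
        raise ValueError("script did not announce a Content-type")
    # The server adds its own status line (and could add more headers here),
    # then passes the script's headers and data through.
    lines = ["HTTP/1.0 200 OK"] + script_headers
    return "\r\n".join(lines) + "\r\n\r\n" + body


hello_output = (
    "Content-type: text/html\n\n"
    "<html> <body> <h1> Hello world! </h1> </body> </html>\n"
)
response = wrap_cgi_output(hello_output)
print(response)
```

A script that forgets the header or the blank line gives the server nothing to pass on, which is why the sketch rejects such output instead of guessing a type.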
CGI scripts are quite flexible precisely because the server itself is not really involved in the process. The server's primary responsibilities are to
To summarize, the information flow is as follows:
The user clicks on or types in the URL http://tucotuco.cs.indiana.edu:19904/cgi-bin/hello
A GET message requesting /cgi-bin/hello is sent to tucotuco.cs.indiana.edu on port 19904.
When the server examines the requested path (it starts with /cgi-bin/) it determines it should run the script instead of simply retrieving the file.
The script outputs a Content-type header to indicate the format of the data to the server, for example: Content-type: text/html. The headers are then followed by the HTML text generated by the script. From here on it's the same story as before:
The headers (Content-type) and data go directly from the program to the server.
The server sends the browser its HTTP headers, including the Content-type header from the script. Following the headers is the actual HTML generated by the script.
The Content-type header tells the browser that the data is HTML, so the browser formats and renders the text appropriately, including highlighting links.
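The decision the server makes when it looks at the path can be sketched like this in Python. The document root is the one given in these notes; the on-disk script directory `SCRIPT_ROOT` and the function `dispatch` are hypothetical, and a real server would also reject paths containing ".." before using them.

```python
from pathlib import PurePosixPath

SCRIPT_PREFIX = "/cgi-bin/"                                # marks script requests
DOCUMENT_ROOT = PurePosixPath("/u/dgerman/httpd/htdocs")   # from the notes
SCRIPT_ROOT = PurePosixPath("/u/dgerman/httpd/cgi-bin")    # hypothetical location


def dispatch(request_path):
    """Return ("run", file) for scripts or ("send", file) for static files."""
    if request_path.startswith(SCRIPT_PREFIX):
        name = request_path[len(SCRIPT_PREFIX):]
        return "run", str(SCRIPT_ROOT / name)
    # Anything else is looked up relative to the document root.
    return "send", str(DOCUMENT_ROOT / request_path.lstrip("/"))


print(dispatch("/hello.html"))
print(dispatch("/cgi-bin/hello"))
```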
Knowing this, we will finish the Perl primer of last time and explain the following script that comes with the server:

    #!/usr/local/bin/perl

    # Header line plus the mandatory blank line
    print "Content-type: text/html\n\n";

    # One output line per environment variable passed to the script
    while (($key, $val) = each %ENV) {
        print "$key = $val<BR>\n";
    }

Its name is printenv and it is our entry point to CGI.
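For comparison, a Python sketch that produces the same kind of output as printenv might look like the following. The function name `printenv_output` is made up for this example; like the Perl script, it does not escape HTML special characters in the values.

```python
import os


def printenv_output(env):
    """Build printenv-style output: a header, a blank line, then key = value lines."""
    lines = ["Content-type: text/html", ""]      # header plus mandatory blank line
    for key, val in env.items():
        lines.append(f"{key} = {val}<BR>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # When run by a server as a CGI script, os.environ holds the request details.
    print(printenv_output(os.environ), end="")
```

Keeping the formatting in a function that takes the environment as an argument makes it easy to try with a small dictionary before installing it under a server.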