The Hypertext Transfer Protocol (HTTP) is the language web clients and servers use to communicate with each other. It is essentially the backbone of the World Wide Web. All HTTP transactions follow the same general format. Each client request and server response has three parts: the request or response line, a header section, and the entity body.
For example, a client request begins with a request line:

    GET /index.html HTTP/1.1

This uses the GET method to request the document index.html using version 1.1 of HTTP. The server's response begins with a status line:

    HTTP/1.1 200 OK

This status line indicates that the server uses version 1.1 of HTTP in its response. A status code of 200 means that the client's request was successful, and the requested data will be supplied after the headers.
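The two lines above have a fixed shape, which a short Perl sketch can make concrete. The helper names parse_request_line and parse_status_line are hypothetical (they are not part of the course code), and the patterns assume a single space between fields and an uppercase method name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a request line such as "GET /index.html HTTP/1.1" into
# method, path, and HTTP version. Returns an empty list if the
# line does not have the expected shape.
sub parse_request_line {
    my ($line) = @_;
    if ($line =~ m{^([A-Z]+) (\S+) HTTP/(\d+\.\d+)\s*$}) {
        return ($1, $2, $3);
    }
    return;
}

# Split a status line such as "HTTP/1.1 200 OK" into
# HTTP version, status code, and reason phrase.
sub parse_status_line {
    my ($line) = @_;
    if ($line =~ m{^HTTP/(\d+\.\d+) (\d{3}) (.*?)\s*$}) {
        return ($1, $2, $3);
    }
    return;
}

my ($method, $path, $version) = parse_request_line("GET /index.html HTTP/1.1");
print "method=$method path=$path version=$version\n";

my ($http, $code, $reason) = parse_status_line("HTTP/1.1 200 OK");
print "version=$http code=$code reason=$reason\n";
```

Because the request pattern only accepts uppercase letters for the method, a line starting with "get" is rejected rather than treated as GET.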
In HTTP 1.0, after the server has finished sending the requested data, it disconnects from the client,
and the transaction is over, unless a Connection: Keep-Alive header is sent. Beginning with
HTTP 1.1, however, the default is for the server to maintain the connection and allow the client to make
additional requests. Since many documents embed other documents (inline images, frames, applets, etc.),
this saves the overhead of the client having to repeatedly connect to the same server just to draw a
single page.
Being a stateless protocol, HTTP does not maintain any information from one transaction to the next, so the next transaction needs to start all over again. The advantage is that an HTTP server can serve a lot more clients in a given period of time, since there is no additional overhead for tracking sessions from one connection to the next. The disadvantage is that more elaborate CGI programs need to use hidden input fields, or external tools such as cookies, to maintain information from one transaction to the next.
Methods: A method is an HTTP command that begins the first line of a client request. The method tells the server the purpose of the client request. There are three main methods defined for HTTP: GET, HEAD, and POST. Other methods are also defined but are not as widely supported by servers (although the other methods will be used more often in the future, not less). Methods are case-sensitive, so "GET" is different from "get".
The GET method: The GET method is a request for information located at a specific URL on the server. It is the most commonly used method by browsers to retrieve information. The result of a GET request can be generated in many different ways: it can be a file accessible by the server, the output of a program or CGI script, the output from a hardware device, etc.
The entity-body portion of a GET request is always empty. GET is basically used to say "Give me this file". The file or program the client requests is usually identified by its full pathname on the server. The GET method is also used to send input to programs such as CGI scripts through form tags. Since GET requests have empty entity-bodies, the input data is appended to the URL in the GET line of the request. When a <form> tag specifies the method="GET" attribute, key-value pairs representing the input from the form are appended to the URL following a question mark (?). Pairs are separated by an ampersand (&). For example:

    GET /cgi-bin/birthday.pl?month=august&date=24 HTTP/1.0

This causes the server to send the birthday.pl CGI program the month and date values specified in a form on the client. The input data at the end of the URL is encoded to CGI specifications. For literal use of special characters, the client uses hexadecimal notation.
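As a rough illustration of how such a request line is put together, the following Perl sketch builds one from a list of name/value pairs. The helpers url_encode and build_get_line are hypothetical names introduced here. The sketch assumes plain ASCII input and an even-length list of pairs, and it encodes a space as %20, whereas browsers submitting forms usually send a plus sign.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace every character outside a conservative "safe" set with a
# percent sign followed by its two-digit hexadecimal ASCII code.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Build a GET request line from a path and a flat list of
# name, value, name, value, ... (assumed to have even length).
sub build_get_line {
    my ($path, @pairs) = @_;
    my @encoded;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @encoded, url_encode($name) . '=' . url_encode($value);
    }
    return "GET $path?" . join('&', @encoded) . " HTTP/1.0";
}

print build_get_line('/cgi-bin/birthday.pl', month => 'august', date => 24), "\n";
```

With these inputs nothing needs escaping, so the printed line matches the request shown above. A value such as a=b&c=d would come out as a%3Db%26c%3Dd.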
The POST method: The POST method allows data to be sent to the server in a client request. The data is directed to a data-handling program that the server has access to (e.g., a CGI script). The data sent to the server is in the entity-body section of the client's request. After the server processes the POST request and headers, it passes the entity-body to the program specified by the URL. The encoding scheme most commonly used with POST is URL-encoding, which allows form data to be translated into a list of variables and values for CGI processing.
Other methods: LINK, UNLINK, PUT, DELETE, OPTIONS, TRACE, CONNECT.
Now do Experiment One below:
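Place this script in your cgi-bin as eOne (the name used in the requests below):

    #!/usr/bin/perl
    # GET: the input arrives in the QUERY_STRING environment variable.
    if ($ENV{REQUEST_METHOD} eq 'GET') {
        $in = $ENV{QUERY_STRING};
    } else {
        # Otherwise read CONTENT_LENGTH bytes of the request body from STDIN.
        read(STDIN, $in, $ENV{CONTENT_LENGTH});
    }
    # Echo the raw input between parentheses.
    print "Content-type: text/html\n\n($in)\n";

Part A: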
Connect to your server using telnet from tucotuco this way:

    tucotuco.cs.indiana.edu% telnet burrowww 31090
    Trying 129.79.245.98...
    Connected to burrowww.cs.indiana.edu.
    Escape character is '^]'.
    GET /cgi-bin/eOne HTTP/1.0

    HTTP/1.1 200 OK
    Date: Sat, 14 Oct 2000 18:40:42 GMT
    Server: Apache/1.3.1 (Unix)
    Connection: close
    Content-Type: text/html

    ()
    Connection closed by foreign host.
    tucotuco.cs.indiana.edu% telnet burrowww 31090
    Trying 129.79.245.98...
    Connected to burrowww.cs.indiana.edu.
    Escape character is '^]'.
    GET /cgi-bin/eOne?hello HTTP/1.0

    HTTP/1.1 200 OK
    Date: Sat, 14 Oct 2000 18:40:59 GMT
    Server: Apache/1.3.1 (Unix)
    Connection: close
    Content-Type: text/html

    (hello)
    Connection closed by foreign host.
    tucotuco.cs.indiana.edu%

Explain the process and the result.
Part B:
Now connect again but do it this way:

    tucotuco.cs.indiana.edu% telnet burrowww 31090
    Trying 129.79.245.98...
    Connected to burrowww.cs.indiana.edu.
    Escape character is '^]'.
    POST /cgi-bin/eOne HTTP/1.0

    HTTP/1.1 200 OK
    Date: Sat, 14 Oct 2000 18:48:18 GMT
    Server: Apache/1.3.1 (Unix)
    Connection: close
    Content-Type: text/html

    ()
    Connection closed by foreign host.
    tucotuco.cs.indiana.edu% telnet burrowww 31090
    Trying 129.79.245.98...
    Connected to burrowww.cs.indiana.edu.
    Escape character is '^]'.
    POST /cgi-bin/eOne HTTP/1.0
    Content-length: 5

    heLLo
    HTTP/1.1 200 OK
    Date: Sat, 14 Oct 2000 18:48:40 GMT
    Server: Apache/1.3.1 (Unix)
    Connection: close
    Content-Type: text/html

    (heLLo)
    Connection closed by foreign host.
    tucotuco.cs.indiana.edu% telnet burrowww 31090
    Trying 129.79.245.98...
    Connected to burrowww.cs.indiana.edu.
    Escape character is '^]'.
    POST /cgi-bin/eOne HTTP/1.0
    Content-length: 5

    abcdefghij
    HTTP/1.1 200 OK
    Date: Sat, 14 Oct 2000 18:49:34 GMT
    Server: Apache/1.3.1 (Unix)
    Connection: close
    Content-Type: text/html

    (abcde)
    Connection closed by foreign host.
    tucotuco.cs.indiana.edu%

Explain the process and the result, and compare them with Part A.
2. CGI
CGI allows the web server to communicate with other programs that are running on the server. For example, with CGI, the web server can invoke an external program, while passing user-specific data to the program (such as what host the user is connecting from, or input the user has supplied through an HTML form). The program then processes the data, and the server passes the program's response back to the web browser.
Parameters to a CGI program are transferred either in the URL or in the body text of the request. The method used to pass parameters is determined by the method attribute of the <form> tag. The GET method says to transfer the data within the URL itself. The POST method says to use the body portion of the HTTP request to pass parameters.
The server passes the variable=value pairs to the CGI program. It does this either through Unix environment variables or in standard input (STDIN). If the CGI program is called with the GET method, parameters are expected to be embedded in the URL of the request, and the server transfers them to the program by assigning them to the QUERY_STRING environment variable. The CGI program can then retrieve the parameters from QUERY_STRING as it would read any environment variable (for example, from the %ENV associative array in Perl). If the CGI program is called with the POST method, parameters are expected to be embedded into the body of the request, and the server passes the body text to the program as standard input.
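The two delivery paths described above can be tried without a web server. The sketch below is only an illustration: raw_form_data is a hypothetical helper that takes the environment as a hash reference and the input as a filehandle, so that both can be simulated in a test. A real CGI script would pass \%ENV and \*STDIN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the raw, still-encoded form data.
# $env   : hash reference standing in for %ENV
# $stdin : filehandle standing in for STDIN
sub raw_form_data {
    my ($env, $stdin) = @_;
    if (($env->{REQUEST_METHOD} || '') eq 'GET') {
        # GET: the server placed the parameters in QUERY_STRING.
        return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
    }
    # POST: read exactly CONTENT_LENGTH bytes of the request body.
    my $length = $env->{CONTENT_LENGTH} || 0;
    my $body   = '';
    read($stdin, $body, $length) if $length > 0;
    return $body;
}

# Simulated GET request.
print raw_form_data({ REQUEST_METHOD => 'GET', QUERY_STRING => 'a=b&c=d' }, \*STDIN), "\n";

# Simulated POST request, using an in-memory filehandle as the body.
my $payload = 'a=b&c=d';
open(my $fake_stdin, '<', \$payload) or die "cannot open in-memory handle: $!";
print raw_form_data({ REQUEST_METHOD => 'POST', CONTENT_LENGTH => length $payload }, $fake_stdin), "\n";
```

Passing the environment and the input handle as arguments is a design choice made for this sketch only. It keeps the function testable, at the cost of differing slightly from the scripts used in the experiments.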
URL Encoding: Before data supplied on a form can be sent to a CGI program, each form element's name (specified by the name attribute) is equated with the value entered by the user to create a key-value pair. Since under the GET method the form information is sent as part of the URL, form information can't include any spaces or other special characters that are not allowed in URLs, and also can't include characters that have other meanings in URLs, like slashes (/). (For the sake of consistency, this constraint also exists when the POST method is being used.) Therefore, the web browser performs some special encoding on user-supplied information.
Encoding involves replacing special characters in the query string with a percent sign (%) followed by their two-digit hexadecimal ASCII codes; browsers usually send a space in form data as a plus sign (+). (Thus, URL encoding is also sometimes called hexadecimal encoding.) CGI scripts have to provide some way to "decode" form data the client has encoded. The best way is to use CGI.pm and let it do the work for you, but in this class, for the sake of knowing what's going on, we wrote ReadParse.
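To make the decoding step concrete, here is a minimal sketch of a decoder. It is not the ReadParse written in class. The name parse_form_data is hypothetical, and the sketch assumes each field name appears at most once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a raw string such as "a=b&c=d" into a hash of decoded
# names and values.
sub parse_form_data {
    my ($raw) = @_;
    my %form;
    foreach my $pair (split /&/, $raw) {
        next unless length $pair;                  # skip empty pieces
        my ($name, $value) = split /=/, $pair, 2;  # split on the first = only
        $value = '' unless defined $value;
        foreach ($name, $value) {
            tr/+/ /;                               # a plus sign stands for a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX becomes the character
        }
        $form{$name} = $value;
    }
    return %form;
}

my %form = parse_form_data('userInput=a%3Db%26c%3Dd&note=hello+world');
foreach my $name (sort keys %form) {
    print "$name=($form{$name})\n";
}
```

The pairs are split on & and = before any decoding takes place, so an encoded %26 or %3D inside a value is not mistaken for a separator.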
Now do Experiment Two below.
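Add eTwo to your cgi-bin:

    #!/usr/bin/perl
    # Collect the raw form data: the query string for GET, the request body otherwise.
    if ($ENV{REQUEST_METHOD} eq 'GET') {
        $in = $ENV{QUERY_STRING};
    } else {
        read(STDIN, $in, $ENV{CONTENT_LENGTH});
    }
    print "Content-type: text/html\n\n";
    # Print each &-separated pair on its own line.
    @in = split(/&/, $in);
    foreach $e (@in) {
        print $e, "\n";
    }

Part A: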
Connect from tucotuco this way:

    frilled.cs.indiana.edu% telnet burrowww 31090
    Trying 129.79.245.98...
    Connected to burrowww.cs.indiana.edu.
    Escape character is '^]'.
    GET /cgi-bin/eTwo?a=b&c=d
    a=b
    c=d
    Connection closed by foreign host.
    frilled.cs.indiana.edu% telnet burrowww 31090
    Trying 129.79.245.98...
    Connected to burrowww.cs.indiana.edu.
    Escape character is '^]'.
    POST /cgi-bin/eTwo HTTP/1.0
    Content-length: 7

    a=b&c=d
    HTTP/1.1 200 OK
    Date: Sat, 14 Oct 2000 19:39:45 GMT
    Server: Apache/1.3.1 (Unix)
    Connection: close
    Content-Type: text/html

    a=b
    c=d
    Connection closed by foreign host.
    frilled.cs.indiana.edu%

Explain the process and the result.
Part B:
Create the following file, and add it as eTwo.html in your htdocs.

    <html>
    <body bgcolor=white>
    <form method=POST action="/cgi-bin/eTwo">
    <input type=text name=userInput> <p>
    <input type=submit>
    </form>
    </body>
    </html>

Then enter a=b&c=d in the text field and press Submit. Explain the process and the result, and the relationship with Part A (if any).
What changes (if anything) if we replace POST by GET in the method attribute of the <form> tag?
3. Perl Substitutions
You've seen these last week in lecture notes. Here's a warm-up question.
What's this:

    if ($arg =~ /^(([+-]{0,1})(\d*)(\.{0,1})(\d+))$/) {
        # a rose by any other name...
    } else {
        # not a number!
    }

OK, back to normal.
We're using the =~ operator with the letter s on its right-hand side, followed by a slash-delimited pattern to be matched and a replacement string. When the pattern matches, the string that follows the second slash replaces the matched text. There are several rules and exceptions; we summarize the ones we care about here through a few examples.
The dot (.) matches any one character except newline.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/./a/;
    print $a;
    frilled.cs.indiana.edu%./alpha
    a234567890frilled.cs.indiana.edu%

To have the substitution happen everywhere it can, use g (global) after the third slash.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/./a/g;
    print $a;
    frilled.cs.indiana.edu%./alpha
    aaaaaaaaaafrilled.cs.indiana.edu%

The pattern can be bigger (or longer):

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/../a/g;
    print $a;
    frilled.cs.indiana.edu%./alpha
    aaaaafrilled.cs.indiana.edu%

Parentheses can be used as memory elements:

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/(.)(.)/$2$1/g;
    print $a;
    frilled.cs.indiana.edu%./alpha
    2143658709frilled.cs.indiana.edu%

And they can include larger patterns:

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/(..)/$1+1/g;
    print $a;
    frilled.cs.indiana.edu%./alpha
    12+134+156+178+190+1frilled.cs.indiana.edu%

To have the part between the last two slashes act as Perl code, use e (evaluate) after the third slash.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/(..)/$1+1/ge;
    print $a;
    frilled.cs.indiana.edu%./alpha
    1335577991frilled.cs.indiana.edu%

Miscellaneous: A few other things needed in ReadParse are listed below.
Characters have (decimal) ASCII codes that can be obtained with ord.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    @values = ('A', 'B', 'C', 'D', 'E');
    foreach $value (@values) {
        print $value, " has ASCII code: ", ord($value), "\n";
    }
    frilled.cs.indiana.edu%./alpha
    A has ASCII code: 65
    B has ASCII code: 66
    C has ASCII code: 67
    D has ASCII code: 68
    E has ASCII code: 69
    frilled.cs.indiana.edu%

ASCII codes can be turned into characters with chr.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    @values = (65, 66, 67, 68, 69);
    foreach $value (@values) {
        print "ASCII code $value stands for: ", chr($value), "\n";
    }
    frilled.cs.indiana.edu%./alpha
    ASCII code 65 stands for: A
    ASCII code 66 stands for: B
    ASCII code 67 stands for: C
    ASCII code 68 stands for: D
    ASCII code 69 stands for: E
    frilled.cs.indiana.edu%

The hex function turns a hexadecimal value into a decimal one.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    @values = (1, 10, 20, 100, 110, 111);
    foreach $value (@values) {
        print "$value in base 16 is equal to ", hex($value), " in base 10.\n";
    }
    frilled.cs.indiana.edu%./alpha
    1 in base 16 is equal to 1 in base 10.
    10 in base 16 is equal to 16 in base 10.
    20 in base 16 is equal to 32 in base 10.
    100 in base 16 is equal to 256 in base 10.
    110 in base 16 is equal to 272 in base 10.
    111 in base 16 is equal to 273 in base 10.
    frilled.cs.indiana.edu%

Now do Experiment 3 below.
Part A:
Add eThree.html (below) to your htdocs.

    <html>
    <body bgcolor=white>
    <form method="POST" action="/cgi-bin/eTwo">
    <input type=text name="a">
    <input type=text name="c">
    <input type="submit" value="Proceed">
    </form>
    </body>
    </html>

Bring it up in your browser, type b in the first field, and d in the second. Then push the submit button. Explain the process and the result.
What relationship does this experiment have with any of the previous experiments?
Part B:
Now add eThree to your cgi-bin:

    #!/usr/bin/perl
    # Collect the raw form data: the query string for GET, the request body otherwise.
    if ($ENV{REQUEST_METHOD} eq 'GET') {
        $in = $ENV{QUERY_STRING};
    } else {
        read(STDIN, $in, $ENV{CONTENT_LENGTH});
    }
    print "Content-type: text/html\n\n";
    @in = split(/&/, $in);
    foreach $e (@in) {
        ($name, $value) = split(/=/, $e);
        # Replace each %XX sequence with the character whose hex code is XX.
        $name =~ s/%(..)/chr(hex($1))/ge;
        $value =~ s/%(..)/chr(hex($1))/ge;
        print $name, "=(", $value, ")";
    }

Call it with

    http://burrowww.cs.indiana.edu:31xxx/cgi-bin/eThree?userInput=a%3Db%26c%3Dd

Try to anticipate the result (remember Part B from the previous experiment) and explain it.
Part C:
Use the form from Part A, and point it to process from last time:

    #!/usr/bin/perl
    print qq{Content-type: text/html\n\n<html><body>};
    # Echo the query string exactly as the server received it.
    $input = $ENV{QUERY_STRING};
    print "($input)";
    print qq{</body></html>};

Then type % in the first field and = in the second.
(Later repeat everything using space, or ~, or /, or even &.)
Explain the result.
Now call process directly:

    http://burrowww.cs.indiana.edu:31xxx/cgi-bin/process?a=%&c==

What's the difference?
Why does it matter?
You should understand CGI very well now.