Spring Semester 2005
Let's now look at pattern matching.
1. Basic Pattern Matching in Perl
We're using the =~ operator, together with the letter s on its right-hand side, followed by a slash-delimited pattern to be matched, and a string. When the pattern matches, the string that follows the second slash will replace it. There are several rules and exceptions, and we will summarize the ones we care about here through a few examples.
The dot (.) matches any individual character except newline.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/./a/;
    print $a;
    frilled.cs.indiana.edu%./alpha
    a234567890frilled.cs.indiana.edu%

To have the substitution happen everywhere possible, use g (global) after the third slash.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/./a/g;
    print $a;
    frilled.cs.indiana.edu%./alpha
    aaaaaaaaaafrilled.cs.indiana.edu%

The pattern can be bigger (or longer):

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/../a/g;
    print $a;
    frilled.cs.indiana.edu%./alpha
    aaaaafrilled.cs.indiana.edu%

Parentheses can be used as memory elements:

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/(.)(.)/$2$1/g;
    print $a;
    frilled.cs.indiana.edu%./alpha
    2143658709frilled.cs.indiana.edu%

And they can include larger patterns:

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/(..)/$1+1/g;
    print $a;
    frilled.cs.indiana.edu%./alpha
    12+134+156+178+190+1frilled.cs.indiana.edu%

To have the part between the last two slashes act as Perl code, use e (evaluate) after the third slash.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    $a = "1234567890";
    $a =~ s/(..)/$1+1/ge;
    print $a;
    frilled.cs.indiana.edu%./alpha
    1335577991frilled.cs.indiana.edu%

A few other things needed in ReadParse are listed below.

2. Additional Information
Characters have (decimal) ASCII codes that can be obtained with ord.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    @values = ('A', 'B', 'C', 'D', 'E');
    foreach $value (@values) {
        print $value, " has ASCII code: ", ord($value), "\n";
    }
    frilled.cs.indiana.edu%./alpha
    A has ASCII code: 65
    B has ASCII code: 66
    C has ASCII code: 67
    D has ASCII code: 68
    E has ASCII code: 69
    frilled.cs.indiana.edu%

ASCII codes can be turned into characters with chr.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    @values = (65, 66, 67, 68, 69);
    foreach $value (@values) {
        print "ASCII code $value stands for: ", chr($value), "\n";
    }
    frilled.cs.indiana.edu%./alpha
    ASCII code 65 stands for: A
    ASCII code 66 stands for: B
    ASCII code 67 stands for: C
    ASCII code 68 stands for: D
    ASCII code 69 stands for: E
    frilled.cs.indiana.edu%

The hex function turns a hexadecimal value into a decimal one.

    frilled.cs.indiana.edu%cat alpha
    #!/usr/bin/perl
    @values = (1, 10, 20, 100, 110, 111);
    foreach $value (@values) {
        print "$value in base 16 is equal to ", hex($value), " in base 10.\n";
    }
    frilled.cs.indiana.edu%./alpha
    1 in base 16 is equal to 1 in base 10.
    10 in base 16 is equal to 16 in base 10.
    20 in base 16 is equal to 32 in base 10.
    100 in base 16 is equal to 256 in base 10.
    110 in base 16 is equal to 272 in base 10.
    111 in base 16 is equal to 273 in base 10.
    frilled.cs.indiana.edu%
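The reason we care about hex and chr together is that they compose: hex turns a two-digit hexadecimal string into a number, and chr turns that number into a character. Here is a minimal sketch; the four sample codes are picked here just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample two-digit hexadecimal codes (chosen for illustration only).
my @codes = ("41", "42", "20", "2b");

# hex gives the decimal value, chr gives the character with that code.
my @chars = map { chr(hex($_)) } @codes;

foreach my $i (0 .. $#codes) {
    print "Hex code $codes[$i] stands for: '$chars[$i]'\n";
}
```

The four characters produced are A, B, a blank space, and a plus sign.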
3. Basic HTML Forms
Next we can discuss the various HTML form elements, for example:
| To display | Use | Attributes |
|---|---|---|
| A form | <form> ... HTML form info </form> | method, action, enctype |
| Single-line text field | <input type=text> | name, value, maxlength, size |
| Single-line password field | <input type=password> | name, value, maxlength, size |
| Multiple-line text area | <textarea></textarea> | name, cols, rows, wrap |
| Checkbox | <input type=checkbox> | name, value, checked |
| Radio buttons | <input type=radio> | name, value, checked |
| List of choices | <select> items in list... </select> | name, multiple, size |
| Items in a <select> list | <option> | value, selected |
| Clickable image | <input type=image> | name, align, src |
| File upload | <input type=file> | name, accept |
| Hidden field | <input type=hidden> | name, value |
| Reset button | <input type=reset> | value |
| Submit button | <input type=submit> | name, value |
We now want to build a generic CGI processor.
4. Building a Generic CGI Processor
We also need to come up with a definition of CGI.
For this purpose let's again review what we have done so far in terms of CGI.
We started with hello.html in Lab Two, placed in htdocs.

We then wrote a script (hello) which we placed in cgi-bin and whose output was the same as when we accessed the hello.html file on the web.

hello.html was in htdocs.

hello was in your script (cgi-bin) directory.

"Content-type: text/html\n\n" was the first thing that the script was supposed to write. Note the two newline characters: an empty line is required after the MIME type. We took the script and changed the output a little, to make it display an image.
To implement the change in output we created a list of names of images. Then every time the script is called, a random number that represents an index in the list of names of images will be produced and the image with that index will appear in the output.
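The random choice described above can be sketched as follows. The image names and the location of the images are made up here; use whatever you actually placed in htdocs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical image names, assumed to live at the top of htdocs.
my @images = ("one.gif", "two.gif", "three.gif");

# rand(@images) yields a number from 0 up to (but not including) 3;
# int() truncates it to a valid index: 0, 1 or 2.
my $index = int(rand(@images));

print "Content-type: text/html\n\n";
print "<html><body><img src=\"/$images[$index]\"></body></html>\n";
```

Reloading the page calls the script again, so a different image may show up each time.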
We said the answer was "yes" and to explain that we introduced a short script by the name of printenv. Each one of our servers had this script in its cgi-bin directory after installation. It looked like this:
    #!/usr/bin/perl
    print "Content-type: text/html\n\n<html><body><pre>";
    foreach $elem (keys %ENV) {
        print $elem, " --> ", $ENV{$elem}, "\n";
    }
    print "</pre></body></html>";
%ENV is built by the system. The browser, the server, and the host operating system all contribute to it. The info is passed to the script. One of the keys in this hash table is called QUERY_STRING. If we put a ? (question mark) after the name of the program (when we invoke its URL) the string that follows, up to the first blank space, will be placed in $ENV{"QUERY_STRING"}.

We also noted that there was an entry in %ENV for REQUEST_METHOD. The value associated with $ENV{REQUEST_METHOD} was GET (please confirm that through your own experiments). OK, that was the review.
    <form method="GET" action="/cgi-bin/printenv">
      <input type=text name=fieldOne> <p>
      <input type=text name=fieldOne> <p>
      <input type=text name=fieldOne> <p>
      <input type=submit> <p>
    </form>
Using this form we should be able to call our script, and even pass spaces to it.
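To see what arrives when the form is submitted, a stripped-down variant of printenv can report just the two entries discussed in the review. This is only a sketch: the sub name report_request is made up here, and the hash is passed in explicitly so that the sub can also be tried with sample data from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report the REQUEST_METHOD and QUERY_STRING entries of an environment hash.
# The hash is passed by reference; report_request is a hypothetical helper.
sub report_request {
    my ($env) = @_;
    my $method = defined $env->{"REQUEST_METHOD"} ? $env->{"REQUEST_METHOD"} : "(not set)";
    my $query  = defined $env->{"QUERY_STRING"}   ? $env->{"QUERY_STRING"}   : "(not set)";
    return "REQUEST_METHOD --> $method\nQUERY_STRING --> $query\n";
}

print "Content-type: text/html\n\n<html><body><pre>";
print report_request(\%ENV);
print "</pre></body></html>\n";
```

Run from the command line, both entries show up as not set, since no web server has filled them in.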
CGI is, in fact, the transfer of information. And the transfer can be done in two ways, identified by the keywords GET and POST.
Whichever way is used (GET or POST), the transfer always involves the encoding of special characters in a particular way. It is the purpose of this lecture to clarify the encoding scheme as well as how one can access that information (that is passed to the script) inside the script.
A special character is encoded as % followed by the two hexadecimal digits that make up the ASCII code of the character.

An example: A has ASCII code 65 in base 10. In base 16 this is 41, so %41 stands for A.

The hexadecimal digits are 0-9 and a-f.

There are 256 character codes, so two hexadecimal digits are enough to represent them all (from 0 all the way up to ff in base 16, which is 255 in base 10).
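To make the scheme concrete, here is a small sketch that goes from a character to its encoded form. The sub name encode_char is made up here; in practice the browser does this encoding for us before the data ever reaches the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode one character as % followed by its two-digit hexadecimal ASCII code.
# encode_char is a hypothetical helper, written only to illustrate the scheme.
sub encode_char {
    my ($char) = @_;
    # ord gives the decimal code; %02X prints it as two uppercase hex digits,
    # and %% produces a literal percent sign.
    return sprintf("%%%02X", ord($char));
}

foreach my $char ("A", " ", "&", "=") {
    print "'$char' is encoded as ", encode_char($char), "\n";
}
```

The hex function accepts both uppercase and lowercase digits, so either spelling decodes the same way.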
If we use GET as the transmission mode, then all the data will be put together in one long string, encoded as described above, and placed such that the script will find it in $ENV{"QUERY_STRING"}.
    $input = $ENV{"QUERY_STRING"};
    $input =~ s/%(..)/chr(hex($1))/ge;

Now, this second line will have to be clarified, but this is not as hard as it may appear.
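One way to read the substitution is to try it on a sample string. In this sketch the sample value is made up, and the work is done on a copy so that both the original and the decoded string can be printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A made-up encoded string: %41 is A, %20 is a blank space, %3D is an equals sign.
my $input = "a%41b%20c%3Dd";

# The pattern matches a % and remembers the next two characters in $1.
# With e, the replacement chr(hex($1)) is run as Perl code;
# with g, this happens for every match in the string.
(my $decoded = $input) =~ s/%(..)/chr(hex($1))/ge;

print "Before: $input\n";
print "After:  $decoded\n";
```

The decoded string is aAb c=d: each three-character sequence has been replaced by the single character it stands for.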
If we use POST then the info no longer comes through QUERY_STRING; instead the script receives it through a channel that it identifies as its standard input (STDIN). So the read process will be somewhat different:

    read(STDIN, $input, $ENV{"CONTENT_LENGTH"});
We read from the standard input into a buffer called $input, and we need to specify how many characters we want to read. Fortunately this number is available to us in the %ENV hash table, associated with the CONTENT_LENGTH key.
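Since there is no web server feeding STDIN when we run a script by hand, here is a sketch that tries the same read call on a filehandle opened on a string. The sub name read_body and the sample data are made up; the string plays the role of STDIN and its length plays the role of CONTENT_LENGTH.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read exactly $length characters from the filehandle $fh.
# read_body is a hypothetical helper wrapping the read call shown above.
sub read_body {
    my ($fh, $length) = @_;
    my $input = "";
    read($fh, $input, $length);
    return $input;
}

# Made-up data standing in for what a browser would send with POST.
my $sample = "thoughts=hello+world&email=me%40example.com";

# Open a filehandle on the string itself, as a stand-in for STDIN.
open(my $fh, "<", \$sample) or die "Cannot open in-memory filehandle: $!";
my $body = read_body($fh, length($sample));
close($fh);

print "Read ", length($body), " characters: $body\n";
```

In the real script the first argument is simply STDIN and the length comes from $ENV{"CONTENT_LENGTH"}.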
With GET it's in $ENV{'QUERY_STRING'}.

With POST it's coming through STDIN.
    #!/usr/bin/perl

    &printHeader;

    if ($ENV{REQUEST_METHOD} eq 'GET') {
        print "Called with GET.";
    } elsif ($ENV{REQUEST_METHOD} eq 'POST') {
        print "Called with POST.";
    } else {
        print "Method not supported.\n";
    }

    &printTrailer;

    sub printHeader {
        print "Content-type: text/html\n\n<html><body>";
    }

    sub printTrailer {
        print "</body></html>";
    }
    #!/usr/bin/perl

    &printHeader;

    if ($ENV{"REQUEST_METHOD"} eq 'GET') {
        $me = $ENV{"SCRIPT_NAME"};
        print qq{
            <form method=POST action=$me>
            Please write your thoughts below: <p>
            <textarea name="thoughts" rows=5 cols=60></textarea> <p>
            Also please write your e-mail address here:
            <input type="text" name="email"> <p>
            <input type="submit">
            </form>
        };
    } elsif ($ENV{REQUEST_METHOD} eq 'POST') {
        print "Called with POST.";
    } else {
        print "Method not supported.\n";
    }

    &printTrailer;

    sub printHeader {
        print "Content-type: text/html\n\n<html><body>";
    }

    sub printTrailer {
        print "</body></html>";
    }
    #!/usr/bin/perl

    &printHeader;
    &readParse;

    if ($ENV{"REQUEST_METHOD"} eq 'GET') {
        $me = $ENV{"SCRIPT_NAME"};
        print qq{
            <form method=POST action=$me>
            Please write your thoughts below: <p>
            <textarea name="thoughts" rows=5 cols=60></textarea> <p>
            Also please write your e-mail address here:
            <input type="text" name="email"> <p>
            <input type="submit">
            </form>
        };
    } elsif ($ENV{"REQUEST_METHOD"} eq 'POST') {
        print "Called with POST.<pre>";
        foreach $k (keys %in) {
            print $k, " --> ", $in{$k}, "<br>";
        }
        print "</pre>";
    } else {
        print "Method not supported.\n";
    }

    &printTrailer;

    sub printHeader {
        print "Content-type: text/html\n\n<html><body>";
    }

    sub printTrailer {
        print "</body></html>";
    }

    sub readParse {
        if ($ENV{"REQUEST_METHOD"} eq 'GET') {
            $input = $ENV{"QUERY_STRING"};
        } elsif ($ENV{"REQUEST_METHOD"} eq 'POST') {
            read(STDIN, $input, $ENV{"CONTENT_LENGTH"});
        } else {
            print "Unsupported method.";
            &printTrailer;
            exit;
        }
        @input = split(/\&/, $input);
        foreach $elem (@input) {
            # Split first, so that an encoded = inside a value cannot confuse the split.
            ($key, $value) = split(/\=/, $elem, 2);
            foreach ($key, $value) {
                s/\+/ /g;                  # a + stands for a blank space
                s/%(..)/chr(hex($1))/ge;   # then decode each %XX sequence
            }
            $in{$key} = $value;
        }
    }

In class we need to explain this very thoroughly.
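To check our understanding of the parsing step without a web server, here is a self-contained sketch that applies it to a sample string. The sub name parse_pairs and the sample data are made up here. The sketch splits each pair on = before decoding, and turns + into a blank space before decoding the % sequences, so that an encoded = or + inside a value survives.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn "key=value&key=value" into a hash, undoing the encoding.
# parse_pairs is a hypothetical stand-alone version of the parsing loop.
sub parse_pairs {
    my ($input) = @_;
    my %in;
    foreach my $elem (split(/&/, $input)) {
        next unless length $elem;                   # skip empty pieces
        my ($key, $value) = split(/=/, $elem, 2);   # at most two parts
        $value = "" unless defined $value;
        foreach ($key, $value) {
            s/\+/ /g;                               # + stands for a blank space
            s/%(..)/chr(hex($1))/ge;                # %XX stands for one character
        }
        $in{$key} = $value;
    }
    return %in;
}

# Made-up data, shaped like what the form above would send.
my %in = parse_pairs("thoughts=CGI+is+fun%21&email=me%40example.com");

foreach my $k (sort keys %in) {
    print "$k --> $in{$k}\n";
}
```

Here %21 decodes to an exclamation mark and %40 to the at sign, so the two values come out as "CGI is fun!" and "me\@example.com" without the backslash.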