CSCI A348/548
Lecture Notes Seven

Spring 2001 (Second semester 2000-2001)


This is the one line summary of lecture notes 7.

Sonal came here for office hours Friday morning and we worked this out:

#!/usr/bin/perl

srand;

while ($x = <STDIN>) {
  chomp ($x);   # remove the trailing newline, and only a newline
  # print "(", $x, ")\n"; 
  if ($x eq "bye") { print "Thank you!\n"; last; } 

  @a = (@a, $x); 

} 

for ($i = 0; $i <= $#a; $i++) {

  print $i, ". ", $a[$i], "\n"; 

} 

print "Random selection: ", $a[rand(@a)], "\n"; 
print "Random selection: ", $a[rand(@a)], "\n"; 
print "Random selection: ", $a[rand(@a)], "\n"; 
print "Random selection: ", $a[rand(@a)], "\n"; 
print "Random selection: ", $a[rand(@a)], "\n"; 
print "Random selection: ", $a[rand(@a)], "\n"; 
print "Random selection: ", $a[rand(@a)], "\n"; 

Could you tell what it does?

Here's a sample output:

frilled.cs.indiana.edu%vi one
frilled.cs.indiana.edu%chmod 755 one
frilled.cs.indiana.edu%./one
one
two
three
four
five
bye
Thank you!
0. one
1. two
2. three
3. four
4. five
Random selection: two
Random selection: five
Random selection: one
Random selection: four
Random selection: one
Random selection: two
Random selection: three
frilled.cs.indiana.edu%
Do you see how that is achieved?
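
The key step is the expression $a[rand(@a)]. Here is a minimal sketch of it in isolation. The helper name pick_random is ours, not part of the original script, and the int() call only spells out the truncation that Perl performs anyway when a fractional number is used as an array index.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return one element
# of the list, chosen at random.
sub pick_random {
    my @items = @_;
    # rand(@items) evaluates @items in scalar context, which gives the
    # number of elements n, and returns a fractional number in [0, n).
    # int() drops the fraction, leaving a valid index 0 .. n-1.
    my $index = int(rand(@items));
    return $items[$index];
}

my @a = ("one", "two", "three", "four", "five");
for my $round (1 .. 3) {
    print "Random selection: ", pick_random(@a), "\n";
}
```

Each run prints three of the five words; which ones will vary from run to run.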

We then worked out a different problem:

#!/usr/bin/perl

srand;

while ($x = <STDIN>) {
  chomp ($x);   # remove the trailing newline, and only a newline
  if ($x eq "bye") { print "Thank you!\n"; last; } 
  $dictionary{$x} = $dictionary{$x} + 1; 
} 

foreach $z (keys %dictionary) {
    print $z, " appears ", $dictionary{$z}, " times.\n"; 
} 
What is the difference between the two programs?

They're both from "Learning Perl", now on reserve at Swain.

Here's sample output the second program would produce:

frilled.cs.indiana.edu%chmod 755 two
frilled.cs.indiana.edu%./two
one
two
three
four
one
two
four
five
bye
Thank you!
one appears 2 times.
five appears 1 times.
three appears 1 times.
two appears 2 times.
four appears 2 times.
frilled.cs.indiana.edu%
Ask Sonal about these programs; she knows them inside out.
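
One way to see the difference: the first program keeps every line in an array, in the order typed, while the second uses a hash keyed by the line itself, so repeated lines share one counter. A hash hands back its keys in no particular order, which is why the sample output above does not follow the input order. The sketch below assumes the words are already in a list rather than read from STDIN, and the helper name count_words is ours; sorting the keys is just one way to get a stable listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how many times each word occurs.
sub count_words {
    my @words = @_;
    my %count;
    $count{$_}++ for @words;   # repeated words bump the same counter
    return %count;
}

my %dictionary = count_words(qw(one two three four one two four five));

# sort keys gives alphabetical order instead of the hash's internal order.
foreach my $word (sort keys %dictionary) {
    print "$word appears $dictionary{$word} times.\n";
}
```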

Well, OK - now let's move on.

(These exercises are good practice, though).

We will first look at the circle script from last time:

#!/usr/local/bin/perl
        
&ReadParse; 
 
&header("Lab 5 Circular Script");
 
if      ($ENV{REQUEST_METHOD} eq 'GET' ) {
  &printform; 
} elsif ($ENV{REQUEST_METHOD} eq 'POST') {
  &printform($in{count}); 
} 
 
&trailer; 
 
sub printform {
  local ($arg) = @_; 
  local $count = $arg + 1;
  print qq{
      <form method="POST" action="$ENV{SCRIPT_NAME}"> 
          Your call has number: <font size=+5>$count</font>. <p>  
      Press <input type="submit" value="here"> to call again. 
      <input type="hidden" name="count" value="$count"> 
      </form>
  }; 
} 
 
sub header {
    local ($t) = @_; 
  print "Content-type: text/html\n\n<html><head>";
  print "<title>$t</title></head><body bgcolor=white>\n"; 
} 
 
sub trailer {
  print "\n</body></html>"; 
}
 
sub ReadParse {
  local ($i, $key, $val) = @_; 
 
  if ($ENV{'REQUEST_METHOD'} eq 'GET' ) {
    $in = $ENV{'QUERY_STRING'}; 
 
  } elsif ($ENV{'REQUEST_METHOD'} eq 'POST') { 
    read (STDIN, $in, $ENV{'CONTENT_LENGTH'}); 
  } 
 
  @in = split(/&/, $in); 
 
  for ($i = 0; $i <= $#in; $i++) {
    $in[$i] =~ s/\+/ /g; 
 
    ($key, $val) = split(/=/, $in[$i]); 
 
    $key =~ s/%(..)/pack("C", hex($1))/ge;   # "C" is an unsigned byte, 0..255
    $val =~ s/%(..)/pack("C", hex($1))/ge;     
 
    if (defined($in{$key})) {
      $in{$key} .= "\0";  
    } 
 
    $in{$key} .= $val; 
  }   
}
We then need to discuss ReadParse.

For this we need a review of patterns and regular expressions in Perl.
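
Before the review, it may help to see what the decoding steps of ReadParse do to a sample query string. This is only a sketch under our own assumptions: the helper parse_query and the sample string are made up for illustration, the query string is passed as an argument instead of being read from the environment, and chr(hex($1)) stands in for the pack call of the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper that mirrors the steps of ReadParse, but returns a
# hash instead of filling the global %in.
sub parse_query {
    my ($query) = @_;
    my %in;
    foreach my $pair (split /&/, $query) {
        $pair =~ s/\+/ /g;                       # "+" encodes a space
        my ($key, $val) = split /=/, $pair, 2;   # split at the first "="
        $val = "" unless defined $val;
        foreach ($key, $val) {
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge; # e.g. %2F becomes "/"
        }
        # Repeated keys are joined with a NUL byte, as in ReadParse.
        $in{$key} .= "\0" if defined $in{$key};
        $in{$key} .= $val;
    }
    return %in;
}

my %in = parse_query("name=Adrian+German&course=A348%2FA548&count=3");
print "$_ => $in{$_}\n" for sort keys %in;
```

For this made-up input the three lines printed are count => 3, course => A348/A548 and name => Adrian German.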

3. Regular expressions

A regular expression is a way of describing a set of strings without having to list all of the strings in the set.

We start from exact patterns, like the string foo or abc, and then introduce quantifiers: * and +.

A character followed by * describes a string of zero or more such characters. Thus

/aba/
refers to the pattern
aba
and
/ab*a/
refers to the pattern that starts with a, is followed by zero or more b's and ends with an a.
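
A quick way to get a feel for quantifiers is to try a pattern against a few strings. The sketch below does that for /ab*a/ and for its at-least-once variant /ab+a/. The helper name matching and the sample strings are ours, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the strings that match the given pattern.
sub matching {
    my ($pattern, @strings) = @_;
    return grep { /$pattern/ } @strings;
}

my @samples = ("aa", "aba", "abbba", "abca");

# b* allows zero b's, so "aa" matches; b+ needs at least one, so it does not.
print "ab*a matches: ", join(" ", matching(qr/ab*a/, @samples)), "\n";
print "ab+a matches: ", join(" ", matching(qr/ab+a/, @samples)), "\n";
```

The first line lists aa, aba and abbba; the second lists only aba and abbba. The string abca matches neither, because the c breaks the run of b's.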

* specifies that the preceding character can appear zero or more times. + has a similar meaning: it says that the preceding character appears at least once. * and + are two of a set of characters that have a special meaning and are therefore called metacharacters. They are listed below:

\ | ( ) [ { ^ $ * + ? .
We'll mention two of them, ( and [, and then we'll move on.

( together with its associate ) can be used to capture and remember the parts of the string that match the enclosed sub-patterns. The captured text is stored in special variables: $1, $2, $3, and so forth. The numbers follow the order in which the opening parens appear in the pattern, from left to right.

Example:

$x = "abbbc"; 
$x =~ /a(b*)c/; 
print $1; 
will print
bbb
In other words if the pattern specified inside the leaning toothpicks matches, then $1 (which is a special variable) immediately becomes whatever the parens are enclosing.
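
With more than one pair of parens the same idea gives $1, $2, and so on, in order. Here is a small sketch. The helper name runs_of_a_and_b is ours. It tests the match before reading the variables, because after a failed match $1 and $2 still hold whatever an earlier successful match left in them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: capture the run of a's and the run of b's that
# come right before a c. Returns an empty list if there is no match.
sub runs_of_a_and_b {
    my ($text) = @_;
    if ($text =~ /(a*)(b*)c/) {
        return ($1, $2);   # first pair of parens, second pair of parens
    }
    return;
}

my ($as, $bs) = runs_of_a_and_b("aabbbc");
print "first group: $as, second group: $bs\n";
```

This prints aa for the first group and bbb for the second.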

3.1 Classes of characters

The square brackets are used much as { and } are used in mathematics to denote sets, although the notation is somewhat different. A bracketed class matches exactly one character from the set.

[a-z] means one lowercase alphabetic character
[a-zA-Z] means one alphabetic character
[0-9] means one digit
[a-zA-Z0-9_] can also be abbreviated as \w
[0-9] can also be abbreviated as \d
[^0-9] means any one character except a digit
[^\w] can also be abbreviated as \W
[ \t\r\n\f] is white space, also abbreviated as \s
  • \n is newline
  • \r is carriage return
  • \f is formfeed
  • \t is tab
  • there's a blank space ( ) at the beginning
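
To see the abbreviations at work, here is a small sketch that masks every digit and every white space character in a string. The helper name mask and the sample string are ours, picked only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: make digits and white space visible.
sub mask {
    my ($text) = @_;
    $text =~ s/\d/#/g;   # every digit becomes "#"
    $text =~ s/\s/_/g;   # every white space character becomes "_"
    return $text;
}

# The sample contains a blank and a tab; both belong to \s.
print mask("room 348\tfloor 5"), "\n";
```

This prints room_###_floor_#, since the blank and the tab are both replaced by an underscore.
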
4. Four examples

1. Here's a program that puts parens around the a's in the first argument that it receives from the command line.

tucotuco.cs.indiana.edu% cat sub
#!/usr/bin/perl
$ARGV[0] =~ s/(a)/($1)/g; 
print $ARGV[0], "\n"; 
tucotuco.cs.indiana.edu% ./sub abcdefghabcdefgh
(a)bcdefgh(a)bcdefgh
tucotuco.cs.indiana.edu% ./sub "abc def gha"
(a)bc def gh(a)
Note the use of double quotes to specify a string with blank spaces in it.

2. Here's another program that does the same thing with any alphabetic character:

tucotuco.cs.indiana.edu% cat sub1
#!/usr/bin/perl
$ARGV[0] =~ s/([a-zA-Z])/($1)/g;
print $ARGV[0], "\n"; 
tucotuco.cs.indiana.edu% ./sub1 "a1 bc3 4_&c +=m "
(a)1 (b)(c)3 4_&(c) +=(m) 
3. Here's a program that reads the index.html file and, for each line that has what looks like a hyperlink on it, prints the link's target:
open (AB, "/u/dgerman/httpd/htdocs/index.html") || die "Cannot open index.html: $!\n";
while ($x = <AB>) {
  # $1 is the link target, $2 the link text
  if ($x =~ /<a href="([^"]+)">([^<]+)<\/a>/) {
    print $1, "\n"; 
  } 
} 
close(AB); 
The two patterns in round parens are non-empty strings that will be stored in $1 and $2 after they match. The first one is a string that contains at least one character and does not contain double quotes. (This makes the pattern matching mechanism stop at the first double quote it encounters.)

The second one describes a non-empty (+) string of characters that does not contain the < sign (which is where the description of the hyper-reference ends).

If you look closely you will see outside these two patterns the clear structure of an

<a href="...">...</a>
tag, except we have put those two intimidating patterns where the ellipses are.
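
To check the pattern without needing the index.html file, one can try it on a single line. The sketch below does that. The helper name find_link and the sample line, including its URL, are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return (target, text) if the line holds something
# that looks like a simple hyperlink, or an empty list if it does not.
sub find_link {
    my ($line) = @_;
    if ($line =~ /<a href="([^"]+)">([^<]+)<\/a>/) {
        return ($1, $2);
    }
    return;
}

# Made-up sample line.
my $line = qq{<li><a href="http://www.example.com/notes.html">Lecture notes</a></li>\n};
my ($url, $text) = find_link($line);
print "Target: $url\nText: $text\n";
```

Note that the pattern is deliberately simple: it expects href to be the only attribute and the link text to contain no other tags.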

4. Lines in access_log start like this:

129.79.207.219 - - [16/Sep/1999:01:29:37 
This can be described as follows:
^[\S]+ - - \[[^:]+:\d\d:\d\d:\d\d
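that is:
  • ^ anchors the match at the start of the line
  • [\S]+ is a run of non-blank characters (\S is the opposite of \s), here the IP address
  • the text " - - " is matched literally
  • \[ is a literal opening square bracket
  • [^:]+ is everything up to the first colon, here the date
  • :\d\d:\d\d:\d\d is the time, with two digits each for hours, minutes and seconds
This is not the only possible description, just one that suits our purpose; having it we can collect this information to build a table of the number of hits, grouped by hour, for the server.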

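open (AB, "httpd/logs/access_log") || die "Cannot open access_log: $!\n";
while ($x = <AB>) {
  # $1 is the IP address, $2 is the date plus the hour
  if ($x =~ /^([\S]+) - - \[([^:]+:\d\d):\d\d:\d\d/) {
    $hits{$2} += 1;   # one bin per distinct hour
  } 
} 
close(AB);
foreach $hour (sort keys %hits) {
  print $hour, " ", $hits{$hour}, "\n"; 
}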
The first pair of parens collects the IP address, the second one a date and hour like this:
16/Sep/1999:01
that means September 16, 1999, during the hour that starts at 1am.

For each request to the server there is a line in the log file. Each line has the time of access. We basically count the lines (which stand for hits) and put them in bins, one such bin for each distinct hour of our server's life.
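
The binning can be tried out on a few lines held in memory. The sketch below does that. The helper name hits_by_hour is ours, and apart from the first sample line (taken from above) the log lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count hits per "date:hour" bin.
sub hits_by_hour {
    my @lines = @_;
    my %hits;
    foreach my $line (@lines) {
        if ($line =~ /^(\S+) - - \[([^:]+:\d\d):\d\d:\d\d/) {
            $hits{$2}++;   # $2 is the date plus the hour
        }
    }
    return %hits;
}

# Sample lines in the format shown above; the last two are made up.
my @log = (
    "129.79.207.219 - - [16/Sep/1999:01:29:37",
    "129.79.207.220 - - [16/Sep/1999:01:45:10",
    "129.79.207.219 - - [16/Sep/1999:02:03:59",
);

my %hits = hits_by_hour(@log);
print "$_ $hits{$_}\n" for sort keys %hits;
```

For these three lines the bin 16/Sep/1999:01 gets two hits and the bin 16/Sep/1999:02 gets one.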

Next we will take one more look at HTTP, then present CGI.pm, a module for CGI processing written by Lincoln Stein. After that we are ready to start server-side Java programming.

We'll compare CGI with server-side Java later.

Meanwhile, it should be clear that a CGI calculator relates to this interactive calculator

#!/usr/bin/perl
 
while ($x = <STDIN>) {
  ($com, $arg) = split(/ /, $x); 
  print "You have typed: $x"; 
  if ($com =~ /^bye/i) { print "Good-bye!\n"; exit; }
  elsif($com =~ /^add/i) { $acc += $arg; print "Acc is now $acc\n"; }
  elsif($com =~ /^sub/i) { $acc -= $arg; print "Acc is now $acc\n"; } 
  else { print "Acc stays $acc\n"; } 
}
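in the same way that the following command-line version relates to it. This version keeps no state between runs: the accumulator is passed in as an argument and echoed back for the next call: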
#!/usr/bin/perl

$acc = $ARGV[1];

if      ($ARGV[0] eq "add") { $acc += $ARGV[2];   

} elsif ($ARGV[0] eq "sub") { $acc -= $ARGV[2]; 

} else { }

print qq {
Acc is currently: $acc
calc (add|sub) $acc (value)
}; 
This would be an important milestone.
frilled.cs.indiana.edu%./calc
Acc is currently: 
calc (add|sub)  (value) 
frilled.cs.indiana.edu%./calc add
Acc is currently: 0
calc (add|sub) 0 (value) 
frilled.cs.indiana.edu%./calc add clear
Acc is currently: 0
calc (add|sub) 0 (value) 
frilled.cs.indiana.edu%./calc
Acc is currently: 
calc (add|sub)  (value) 
frilled.cs.indiana.edu%./calc add 0 4
Acc is currently: 4
calc (add|sub) 4 (value) 
frilled.cs.indiana.edu%./calc add 4 3
Acc is currently: 7
calc (add|sub) 7 (value) 
frilled.cs.indiana.edu%./calc sub 7 3
Acc is currently: 4
calc (add|sub) 4 (value) 
frilled.cs.indiana.edu%
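
The point of the transcript is that the program itself remembers nothing: the user carries the accumulator from one call to the next, much as the hidden count field does in the circle script. Here is a minimal sketch of that idea as a function. The helper name calc_step is ours, and treating missing values as 0 is our assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one stateless step of the calculator. The caller
# supplies the current accumulator and gets the new one back.
sub calc_step {
    my ($com, $acc, $arg) = @_;
    $acc = 0 unless defined $acc;   # assumption: missing values count as 0
    $arg = 0 unless defined $arg;
    if    ($com eq "add") { $acc += $arg; }
    elsif ($com eq "sub") { $acc -= $arg; }
    return $acc;
}

# Replay the last three calls of the transcript: each result is handed
# back in as the next accumulator, just as the user retypes it.
my $acc = 0;
foreach my $step (["add", 4], ["add", 3], ["sub", 3]) {
    my ($com, $arg) = @$step;
    $acc = calc_step($com, $acc, $arg);
    print "calc $com: Acc is currently: $acc\n";
}
```

The accumulator goes 4, 7, 4, as in the transcript.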


Last updated on Jan 30, 2001, by Adrian German for A348/A548