The prototype for the homework is here.
It is probably useful to develop your homework in stages. The same can be said of any project, regardless of its size. Let's split it into stages.
1.1 Write a script that, when called with GET, produces the form displayed by the prototype. You can save the form in a file and then look at it with an editor or you can view the source for the HTML online. Make the script reply to POST by writing back a message that distinguishes between an invocation to summarize the log by dates or by hosts (depending on what radio button the user has pressed).
1.2 Write a program that opens the access_log file and extracts (identifies and prints out, each on a new line) only the host information or the time information based on a switch that you specify on the command line (one of -host or -day). Use two functions for this, which are called from the main program depending on the value of the switch. This way you develop and test two functions to be used later and you need to wrap them in a main program with command line switches for testing purposes only.
1.3 Augment the two functions such that the host or date occurrences are counted. Use a hash (associative array) for that, the strings that your program was listing at 1.2 will now be used to index the bins in the hash. At the end write out the contents of the corresponding associative array (either the one for hosts or the one for dates).
1.4 Make the program list the frequencies stored in the hashes in a certain order (ascending or descending). Now you're ready to use these functions in your script developed at 1.1 (simply call one or the other based on the radio button that the user has pressed).
At this point you're done. Let me know the URL or index it visibly on one of your pages on your server and send me a message so that I can look at it.
Hints for each of the stages are located at the end of the file.
2. Pattern Matching Exercises
2.1 Start from the program that reads from a file and prints its lines one by one.

    #!/usr/bin/perl
    open(AB, $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    while ($line = <AB>) {
      print $line;
    }
    close(AB);

Change the program to print only non-empty lines.

    /^\s*$/               # ignore lines with this pattern

Identify the words that start with a and put *( )* around them.

    s/\b(a\w*)/*($1)*/g   # \b anchors the match at the start of a word
                          # parens group things on the left side
                          # and place them in $1, $2, ...
                          # parens are regular chars on the right

Replace all strings of numeric characters with <NUMBER>.

    s/\d+/<NUMBER>/g;

Have the program change all characters to uppercase.

    while ($line =~ /([a-z])/) {
      $lcletter = $1;
      $ucletter = uc $lcletter;
      $line =~ s/$lcletter/$ucletter/g;
    }

Change only sequences longer than 3 characters to uppercase.

    while ($line =~ /([a-z]{4,})/) {   # {4,} means four or more, i.e. longer than 3
      $lcgroup = $1;
      # now uppercase lcgroup and
      # make the substitution as before
    }
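The exercises above can be sketched as small functions, one per transformation. This is only one possible shape, not the expected solution: the function names are made up for illustration, "longer than 3" is read as four or more letters, \b is used so that only words beginning with a are marked, and the last function uses the /e modifier (the replacement is evaluated as Perl code) instead of the while loop shown in the exercise.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true for lines that are empty or whitespace only.
sub is_blank {
    my ($line) = @_;
    return $line =~ /^\s*$/ ? 1 : 0;
}

# Wrap every word that starts with "a" in *( )*.
sub mark_a_words {
    my ($line) = @_;
    $line =~ s/\b(a\w*)/*($1)*/g;
    return $line;
}

# Replace every run of digits with the literal text <NUMBER>.
sub mask_numbers {
    my ($line) = @_;
    $line =~ s/\d+/<NUMBER>/g;
    return $line;
}

# Uppercase only runs of four or more lowercase letters.
# /e evaluates the replacement (uc $1) as code for each match.
sub upcase_long_runs {
    my ($line) = @_;
    $line =~ s/([a-z]{4,})/uc $1/ge;
    return $line;
}

# Made-up sample lines; a real run would read them from the file in $ARGV[0].
foreach my $line ("an apple a day\n", "\n", "room 101 has 12 seats\n") {
    next if is_blank($line);
    print upcase_long_runs(mask_numbers(mark_a_words($line)));
}
```

Keeping each substitution in its own function makes it easy to try them one at a time before chaining them.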
1. Start from this program:

    #!/usr/bin/perl
    %myHash = ( 'llama' => 'South America',
                'camel' => 'Middle East',
                'bobac' => 'Central Asia' );
    foreach $key (keys %myHash) {
      print $key, " can be found in ", $myHash{$key}, "\n";
    }

Every time this program runs, myHash is initialized and then listed. If we want to add a new animal/habitat pair to our knowledge base we need to change the program source code.
DBM files are ways in which we can keep associative arrays on disk (more permanent storage). To associate a DBM database with a DBM array (the hash associated with it) we use the dbmopen function:

    dbmopen(%ARRAYNAME, "dbmfilename", $mode)

- %ARRAYNAME is the internal hash
- "dbmfilename" is the external file name
- $mode specifies the access mode

For example:

    #!/usr/bin/perl
    dbmopen(%myHash, "habitats", 0644)    # note: 0644 means rw-r--r--
      or die "Cannot open habitats: $!";  # note the leading 0 in 0644
    if ($ARGV[0] =~ /^add$/i) {           # no ambiguity
      $myHash{$ARGV[1]} = $ARGV[2];
    } else {
      print "Unsupported operation $ARGV[0]\n";
    }
    dbmclose(%myHash);

This associates %myHash with the habitats.* files on your disk (which are created if they don't exist already) with a permission mode of 0644 (that gives read/write access to the owner and read access to group and world users). Look for habitats.dir and habitats.pag after you run this program (the exact file names depend on the DBM library your Perl uses).

Extend this program to accept such commands as:
- add <animal> <habitat>
- delete <animal>
- search <animal>
- list

Assignment statements, delete, hash indexing and foreach will be the ways in which you can implement them respectively. You can later extend your program to have the database passed as a parameter as well.
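A minimal sketch of the four commands follows. The name run_command and the message texts are made up for illustration. The function takes the hash by reference, so it can be tried on an ordinary in-memory hash first and handed a hash bound with dbmopen afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies one command to the hash passed by reference
# and returns the text to print.
sub run_command {
    my ($db, $op, $animal, $habitat) = @_;
    $op = defined($op) ? lc($op) : '';

    if ($op eq 'add') {
        return "Usage: add <animal> <habitat>\n"
            unless defined($animal) && defined($habitat);
        $db->{$animal} = $habitat;                 # assignment statement
        return "Added $animal\n";
    }
    if ($op eq 'delete') {
        return "Usage: delete <animal>\n" unless defined($animal);
        return "No entry for $animal\n" unless exists($db->{$animal});
        delete $db->{$animal};                     # delete
        return "Deleted $animal\n";
    }
    if ($op eq 'search') {
        return "Usage: search <animal>\n" unless defined($animal);
        return "No entry for $animal\n" unless exists($db->{$animal});
        return "$animal can be found in $db->{$animal}\n";   # hash indexing
    }
    if ($op eq 'list') {
        my $out = '';
        foreach my $key (sort keys %$db) {         # foreach
            $out .= "$key can be found in $db->{$key}\n";
        }
        return $out;
    }
    return "Unsupported operation $op\n";
}

# Demonstration on an in-memory hash. For persistent storage, bind the hash
# before the calls, e.g.
#   dbmopen(%habitats, "habitats", 0644) or die "dbmopen: $!";
# and call dbmclose(%habitats) at the end.
my %habitats;
print run_command(\%habitats, 'add', 'llama', 'South America');
print run_command(\%habitats, 'add', 'camel', 'Middle East');
print run_command(\%habitats, 'search', 'llama');
print run_command(\%habitats, 'delete', 'camel');
print run_command(\%habitats, 'list');
```

In the real program the arguments would come from @ARGV, for example run_command(\%myHash, @ARGV).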
Hints for 1.1

    #!/usr/bin/perl
    use CGI;
    $query = new CGI;
    print $query->header, $query->start_html;
    if ($query->request_method() eq 'GET') {
      &show_form;          # shows the form
    } elsif ($query->request_method() eq 'POST') {
      &process_request;    # makes distinction between
                           # summaries by host or by date
    }
    print $query->end_html;

    sub show_form {
      # definition of show_form
    }

    sub process_request {
      # definition of process_request
    }

Hints for 1.2
Look for patterns such as these:

    /^[\S]+/              # for hosts
    # that's: beginning of line (^) followed by non-space (\S) chars
    /\[\d+\/\w+\/1998/    # for dates
    # day (digits) slash (protected with \) month (word chars) slash 1998

Note: as pointed out, the brackets in /^[\S]+/ are redundant, so /^\S+/ should also work.

You notice that we disambiguate significantly by assuming that our script will only summarize accesses that have been recorded in 1998.

Use parens to extract the strings of interest for later use in your associative arrays:

    /^([\S]+)/             # for hosts
    /\[(\d+\/\w+)\/1998/   # for dates
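Here is a sketch of how the two patterns might be wrapped in functions and selected by the -host or -day switch. The function names and the dispatch table are made up for illustration. The sample lines are invented and assume that access_log uses the Common Log Format, which the assignment does not state.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the host at the start of a log line, or undef if there is none.
sub extract_host {
    my ($line) = @_;
    return $line =~ /^(\S+)/ ? $1 : undef;
}

# Return the day/month part of a 1998 timestamp, or undef if there is none.
sub extract_day {
    my ($line) = @_;
    return $line =~ /\[(\d+\/\w+)\/1998/ ? $1 : undef;
}

# Map each command-line switch to the function that handles it.
my %extractor_for = ( '-host' => \&extract_host, '-day' => \&extract_day );

# Apply the extractor chosen by $switch to every line; drop non-matching lines.
sub extract_all {
    my ($switch, @lines) = @_;
    my $extract = $extractor_for{$switch}
        or die "Usage: $0 -host|-day\n";
    return grep { defined $_ } map { $extract->($_) } @lines;
}

# Invented sample lines (assumed Common Log Format). With a real log, read
# the lines from access_log instead.
my @sample = (
    'host1.example.com - - [12/Feb/1998:10:01:22 -0500] "GET / HTTP/1.0" 200 1043',
    'host2.example.com - - [13/Feb/1998:11:15:02 -0500] "GET /a.html HTTP/1.0" 200 512',
);

print "$_\n" for extract_all('-host', @sample);
print "$_\n" for extract_all('-day', @sample);
```

The main program then only has to pass $ARGV[0] as the switch, so the two extraction functions can be reused unchanged in the CGI script later.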
Hints for 1.3
It basically works like this: you're looking for strings that should match a specified pattern. Once you find a string that matches it, you have the matching string in $1, because you use parens in the pattern. You store this in a variable

    $x = $1;

and use that to increment by one the value that is associated with that key in an associative array

    $hash{$x} += 1;

You can use one or two associative arrays to count the number of hits by date and by host, it's up to you.

Hints for 1.4
Use this

    foreach $key (sort { $hash{$a} <=> $hash{$b} } (keys %hash)) {
      # print the entry
    }

or

    foreach $key (sort my_routine (keys %hash)) {
      # print the entry
    }
    # ... and define my_routine as a sub
    sub my_routine {
      return $hash{$a} <=> $hash{$b};
    }

Remember that you can develop this in a group or team but each member of the team needs to have the assignment installed on her/his server.
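Counting and ordered listing can be put together roughly as follows. The names count_keys and report and the 'asc'/'desc' argument are made up for illustration. Breaking ties by key name is an extra choice made here so that the output is reproducible; the assignment does not ask for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each string occurs; returns a reference to the counting hash.
sub count_keys {
    my (@keys) = @_;
    my %count;
    $count{$_} += 1 for @keys;
    return \%count;
}

# Return "key count" strings ordered by count. $order is 'asc' or 'desc'
# (hypothetical argument); equal counts are ordered by key name.
sub report {
    my ($count, $order) = @_;
    my @keys;
    if ($order eq 'desc') {
        @keys = sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
    } else {
        @keys = sort { $count->{$a} <=> $count->{$b} or $a cmp $b } keys %$count;
    }
    my @lines;
    foreach my $key (@keys) {
        push @lines, "$key $count->{$key}";
    }
    return @lines;
}

# Made-up host names standing in for the strings extracted from the log.
my $hits = count_keys(qw(host1 host2 host1 host3 host1 host2));
print "$_\n" for report($hits, 'desc');
```

Swapping $a and $b in the comparison is all it takes to reverse the order, which is why the two branches differ only there.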