PerlHoo README-linuxmafia
linuxmafia modifications v. 1.21
2003-12-11


This file describes the differences between the implementation of
PerlHoo used at linuxmafia.com for my "Linuxmafia.com Knowledgebase"
(http://linuxmafia.com/kb/) and the original PerlHoo implementation 
published by Jonathan Eisenzopf <eisen@pobox.com> in a series of three
articles at Mother of Perl, http://www.webreference.com/perl/tutorial/ 
(recommended reading).


Although Eisenzopf's PerlHoo is easily the most elegant solution I've
ever seen to the problem of information directories that need neither
database back-end storage nor multiuser controls and version tracking,
it has a couple of rough edges:

1.  The HTML perlhoo.pl produces for category index pages is mildly
defective:  It lacks an SGML DTD declaration, closing "body" and "html"
tags, and a "ul" pair to go with its "li" elements, and it sets
specific colours by hexadecimal value rather than using CSS.  It also
used a nested "p" and "h3" structure in an attempt at physical markup,
which is incorrect, so I removed the "p" opening and closing pair.
In essence, I've fixed the HTML so it meets modern standards and passes
the W3C validator.

2.  URLs of index pages in Eisenzopf's PerlHoo are always of the form
http://hostname/cgi-bin/perlhoo.pl/* .  While this works fine, it's
suboptimal from several points of view:

   a)  Inclusion of "cgi-bin/perlhoo.pl/" in the URL makes it awkward and
   excessively long.  With some work, it can be reduced and simplified.

   b)  Google tends to index less extensively any Web-page trees it
   detects to be CGI-generated.

   c)  Particularly paranoid site-admins will prefer to not advertise
   their use of CGIs to attackers.

After a frustrating time trying to solve this problem using changes to
perlhoo.pl and Apache's Alias and ScriptAlias directives, I figured out
that the correct tool to use is Apache's mod_rewrite DSO.  

CAUTION:  mod_rewrite is black voodoo, capable of shooting large holes
in your feet.

You'll need something like this in your Apache httpd.conf's DSO section:

   LoadModule rewrite_module /usr/lib/apache/1.3/mod_rewrite.so

And you'll need this, somewhere in httpd.conf that applies it to the
Apache Location where your PerlHoo tree lives:

   RewriteEngine on
   RewriteRule ^/kb/(.*)$ /cgi-bin/perlhoo.pl/$1 [PT]
   Options +ExecCGI

The "PT" stands for "PassThrough", and hands the rewritten URL on to
the next URL-mapping module (DSO) instead of treating it as a
filesystem path, which you want because cgi-bin is a ScriptAlias, and
not actually a directory under DocumentRoot.
Therefore, you also cannot have any aliases for your PerlHoo $baseurl 
directory (/kb/, in my system).
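
To see what the RewriteRule above does to a request path, here is a
rough Perl sketch that applies the same pattern and substitution outside
Apache.  The rewrite_kb name is hypothetical, made up for illustration;
it is not part of perlhoo.pl or mod_rewrite, and it models only the
pattern match, not the PT hand-off to ScriptAlias.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics
#   RewriteRule ^/kb/(.*)$ /cgi-bin/perlhoo.pl/$1
# on a plain string.  Paths that do not start with /kb/ are returned
# unchanged, just as a non-matching RewriteRule leaves them alone.
sub rewrite_kb {
    my ($path) = @_;
    (my $out = $path) =~ s{^/kb/(.*)$}{/cgi-bin/perlhoo.pl/$1};
    return $out;
}

print rewrite_kb($_), "\n" for '/kb/', '/kb/Foo/Bar/', '/faq/Foo/x.html';
# prints:
#   /cgi-bin/perlhoo.pl/
#   /cgi-bin/perlhoo.pl/Foo/Bar/
#   /faq/Foo/x.html
```

Note that the third path, under /faq/, is untouched, which is why
documents under $rootdir remain reachable as ordinary static files.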

The Web location resolved to by PerlHoo's $rootdir pathspec (where
documents live) cannot be the same as the synthetic $baseurl location,
or it'll get overlaid and clobbered by PerlHoo's indexes.  So, I chose
to have $rootdir be "faq" under Apache's DocumentRoot, "/var/www/faq" on
my local filesystem.

Thus, in perlhoo.pl's "Constants" section:

    my $rootdir      = '/var/www/faq';

To adjust PerlHoo's "Home" to /kb/ , edit $baseurl in perlhoo.pl's 
"Constants" section like this:

    my $baseurl      = '/kb/';

...instead of...

    my $baseurl      = '/cgi-bin/perlhoo.pl';

Restart Apache, and you're (almost) done.  There remains one glitch:
Following the link from PerlHoo's http://hostname/kb/ "Home" to
directory Foo, you arrive at URL http://hostname/kb//Foo/ instead of
the correct http://hostname/kb/Foo/ , because $baseurl now ends in a
slash and print_categories adds another.  This doubling of the
directory separator is only cosmetic, but is easily fixed.  In
perlhoo.pl's print_categories subroutine, edit this:

        my $url;
        if ($reldir =~ /\S+/) {
            $url = "$baseurl/$reldir/$thisdir";
        } else {
            $url = "$baseurl/$thisdir";
        }

to this:

        my $url;
        if ($reldir =~ /\S+/) {
            $url = "$baseurl$reldir/$thisdir";
        } else {
            $url = "$baseurl$thisdir";
        }

And you're done.  My modified perlhoo.pl and phadmin.pl scripts are in 
subdirectory "linuxmafia" in this tarball, for your convenience.
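
If you'd rather not have the result depend on whether $baseurl ends in
a slash, one alternative is to normalise the pieces before joining
them.  The following is only a sketch, not what my modified perlhoo.pl
does; join_url is a hypothetical helper name, and the sample values
are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: joins a base URL and path components with
# exactly one "/" between them, whatever slashes the inputs carry.
sub join_url {
    my ($base, @parts) = @_;
    my @clean;
    for my $part (@parts) {
        next unless defined $part;
        (my $p = $part) =~ s{^/+|/+$}{}g;   # trim leading/trailing slashes
        push @clean, $p if $p =~ /\S/;      # skip empty components
    }
    (my $url = $base) =~ s{/+$}{};          # drop trailing slash(es) from base
    return join '/', $url, @clean;
}

print join_url('/kb/', '', 'Foo'), "\n";                   # /kb/Foo
print join_url('/kb/', 'Foo', 'Bar'), "\n";                # /kb/Foo/Bar
print join_url('/cgi-bin/perlhoo.pl', '', 'Foo'), "\n";    # /cgi-bin/perlhoo.pl/Foo
```

With such a helper, the if/else on $reldir would collapse to a single
call, since an empty $reldir is simply skipped.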


As of v1.21, phadmin.pl's output is also believed to be valid HTML 4.01
Transitional, though the W3C validator complains about undefined
"entities" within URLs on the page.
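
A common cause of that sort of validator complaint is a raw "&"
separating query parameters inside an href, which HTML 4.01 parses as
the start of an entity reference.  Whether that is what phadmin.pl
trips over is an assumption on my part, but if so, escaping the
ampersands before printing the link would be one way to quiet it.  A
minimal sketch, with a hypothetical escape_amp helper and a made-up
example URL:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrites each "&" as "&amp;" for use inside an
# HTML attribute, leaving any "&amp;" already present alone.
sub escape_amp {
    my ($url) = @_;
    (my $out = $url) =~ s/&(?!amp;)/&amp;/g;
    return $out;
}

# The query parameters here are invented for illustration.
print escape_amp('phadmin.pl?action=edit&id=3'), "\n";
# prints: phadmin.pl?action=edit&amp;id=3
```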

-- Rick Moen, rick@linuxmafia.com
