I recently released an HTML to Google Code wiki converter. Now that the module is published on CPAN, it’s time to provide some usage instructions.
If you’re an experienced Perl hacker, see
perldoc HTML::WikiConverter
and
perldoc HTML::WikiConverter::GoogleCode.
If you’re not a Perl hacker, read on…
There are two ways to use the module. The easiest is the html2wiki command-line script:
http://search.cpan.org/~diberri/HTML-WikiConverter/bin/html2wiki
which is provided as part of the HTML::WikiConverter Perl module. The other option is to write your own Perl script that uses the Google Code module. Using html2wiki is easier because you only have to supply the desired command-line options; writing your own script is more flexible and powerful. I’m going to cover html2wiki in detail, and then briefly walk through an example Perl script.
Installation
Before you can use the converter, you’ll need to have Perl installed. I use ActiveState’s binary distribution for Windows; the installation instructions are here:
http://aspn.activestate.com/ASPN/docs/ActivePerl/5.10/install.html.
Then you’ll need to install the Perl modules HTML::WikiConverter and HTML::WikiConverter::GoogleCode. There are a couple of ways to install these modules: you can download the tar files from cpan.org, or use ppm if you have the ActiveState Perl distribution. My favorite method is the Perl CPAN module, which is part of the core Perl distribution. The following shell commands should get you close:
>perl -MCPAN -e shell
cpan> install HTML::WikiConverter
CPAN: Storable loaded ok (v2.16)
...
cpan> install HTML::WikiConverter::GoogleCode
CPAN: Storable loaded ok (v2.16)
...
cpan>
Line 1 invokes the CPAN module in an interactive mode. If you have any trouble, just type help at the cpan> prompt.
At this point, html2wiki should be installed. From a command prompt, you should be able to summon up the usage instructions:
>html2wiki
Usage:
html2wiki [options] [file]
Commonly used options:
--dialect=dialect Dialect name, e.g. "MediaWiki"
...
You should also be able to verify that the Google Code dialect is installed.
>html2wiki --list
Installed dialects:
GoogleCode
...
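If html2wiki is not on your PATH yet, a similar check can be made from Perl itself. The following is only a sketch: the helper name module_available is mine and not part of HTML::WikiConverter. It simply tries to load each module and reports whether that worked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::WikiConverter): returns 1 if
# the named module can be loaded from @INC, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(HTML::WikiConverter HTML::WikiConverter::GoogleCode)) {
    printf "%-35s %s\n", $module,
        module_available($module) ? 'installed' : 'NOT installed';
}
```

If either line reports NOT installed, go back to the cpan> prompt and repeat the install for that module.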
Using html2wiki
For this example, I put some random HTML in a file named example.html:
>type example.html
CamelCase words like JavaScript, <pre>JavaScript</pre> and <b>bold</b> words, and html tokens: 1 < 1
Using the Windows cmd.exe shell, the default conversion to wiki markup looks like this:
>html2wiki --dialect=GoogleCode < example.html
CamelCase words like JavaScript,
{{{
JavaScript
}}}
and *bold* words, and html tokens: 1 &lt; 1
The default conversion escapes HTML tokens such as < by replacing them with the HTML escape sequence (&lt; in this case). The Google Code wiki renders &lt; literally as &lt;, which is not what we want (<). To turn escaping off, set the --no-escape-entities option:
html2wiki --dialect=GoogleCode ^
--no-escape-entities ^
< example.html
The output is now:
CamelCase words like JavaScript,
{{{
JavaScript
}}}
and *bold* words, and html tokens: 1 < 1
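To make the difference concrete, here is a rough Perl illustration of what entity escaping does to a text fragment. The escape_text function is a hypothetical stand-in written for this post. It is not how HTML::WikiConverter implements the option; it only mimics the visible effect on the characters &, < and >.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the converter's entity escaping.
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # ampersand first, so the entities added below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $text = 'html tokens: 1 < 1';
print "escaped:   ", escape_text($text), "\n";    # what the default conversion emits
print "unescaped: ", $text, "\n";                 # what --no-escape-entities emits
```

Since the Google Code wiki displays the escaped form as typed, the unescaped form is the one that reads correctly on the wiki page.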
The next adjustment I make is to turn off the Google Code feature that automatically turns CamelCase words into wiki links. The documents I’ve been posting contain many CamelCase words, and for most of them I do not want the automatic link. The wiki markup to disable link generation is to precede the word with an exclamation mark. I’ve added an option to enable this for specific words. For example, to prevent link generation for the words CamelCase and JavaScript, modify the html2wiki command to be:
html2wiki --dialect=GoogleCode ^
--no-escape-entities ^
--escape-autolink=CamelCase ^
--escape-autolink=JavaScript ^
< example.html
The generated wiki markup from this command is:
!CamelCase words like !JavaScript,
{{{
JavaScript
}}}
and *bold* words, and html tokens: 1 < 1
In this case, the CamelCase words are escaped, except when the word occurs within a <pre> tag (the JavaScript inside the {{{ ... }}} block is left untouched).
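The rule itself is simple enough to sketch in a few lines of Perl. This is an illustration of the behavior described above, not the module's actual code: escape_autolinks is a hypothetical function that prefixes the listed words with "!" everywhere except inside {{{ ... }}} blocks.

```perl
use strict;
use warnings;

# Hypothetical illustration: prefix the listed words with "!" unless
# they sit inside a {{{ ... }}} block.
sub escape_autolinks {
    my ($wiki, @words) = @_;
    return $wiki unless @words;
    my $alternation = join '|', map { quotemeta($_) } @words;

    # The capturing group makes split keep the {{{ ... }}} blocks in the list.
    my @parts = split /(\{\{\{.*?\}\}\})/s, $wiki;
    for my $part (@parts) {
        next if $part =~ /^\{\{\{/;              # leave code blocks alone
        $part =~ s/\b($alternation)\b/!$1/g;
    }
    return join '', @parts;
}

my $wiki = "CamelCase words like JavaScript,\n{{{\nJavaScript\n}}}\nand *bold* words\n";
print escape_autolinks($wiki, qw(CamelCase JavaScript));
```

Only the words you list are touched, so other CamelCase words in the document still become wiki links.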
The last Google Code feature of note is the ability to embed a page summary and labels within the wiki markup. The page summary is a comment on the first line, which is displayed on the project’s wiki index. Likewise, the labels element is a comment that is used by Google Code. For example, when the label ‘Featured’ is applied to a wiki page, a link to the page is created on the project’s front web page. The final example adds a summary and two labels to the wiki markup. The html2wiki command is:
html2wiki --dialect=GoogleCode ^
--no-escape-entities ^
--escape-autolink=CamelCase ^
--escape-autolink=JavaScript ^
--summary="This is a great page" ^
--labels=Featured ^
--labels=Phase-Deploy ^
< example.html
and the generated wiki markup is:
#summary This is a great page
#labels Featured,Phase-Deploy
!CamelCase words like !JavaScript,
{{{
JavaScript
}}}
and *bold* words, and html tokens: 1 < 1
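The two header lines follow a simple pattern: one #summary line, then one #labels line with the labels joined by commas. A minimal Perl sketch of that pattern, using a hypothetical wiki_header helper of my own (the converter generates these lines for you, so this is purely to show the format):

```perl
use strict;
use warnings;

# Hypothetical helper: build the "#summary" and "#labels" header lines
# in the format shown in the output above.
sub wiki_header {
    my ($summary, @labels) = @_;
    my $header = '';
    if (defined $summary && length $summary) {
        $header .= "#summary $summary\n";
    }
    if (@labels) {
        $header .= '#labels ' . join(',', @labels) . "\n";
    }
    return $header;
}

print wiki_header('This is a great page', qw(Featured Phase-Deploy));
```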
Using H::WC::GoogleCode in a Perl Script
This last example shows a Perl script that makes use of the options shown above. It is a script I’ve used to convert a Developer’s Guide saved as HTML into wiki markup. The source code for the script follows below.
On line 5, the H::WC module is imported. I didn’t import the H::WC::GoogleCode module since it is loaded by H::WC based on the dialect name.
This document has a long list of CamelCase words for which I wanted to suppress generation of links. These words are collected into an array on line 7. Line 22 sets up a hash with keys of the path to the HTML document and values of the path where the generated wiki markup should be stored.
On line 26, a new instance of H::WC is created. Line 27 pulls in the H::WC::GoogleCode module by naming the dialect. Line 28 passes the list of CamelCase words. Line 29 turns off escaping of HTML entities.
For images, the HTML document uses relative links to images stored on the file system (at ../docs/img). In order to support the wiki, I’ve staged the images on my web server under /img and passed in the web-site URI on line 30. The relative image links are turned into absolute URLs pointing to my web server in the wiki markup.
The final option, the summary parameter on line 31, generates a page summary on the wiki.
The balance of the script pulls in the HTML, converts it with the H::WC instance (line 42), and writes the wiki markup to a file.
#!/usr/bin/perl -w

package main;
use strict;
use HTML::WikiConverter;

my @toEscape = qw/
    SqlMapClient
    JBati
    SqlMapClientBuilder
    JavaScript
    NovaJug
    ExampleJSS
    SqlLite
    ExamplesJSS
    SqlMap
    MySql
    SqlMapConfig
    DevGuide
/;

my %to_process = qw /
    ..\docs\DevelopersGuide.html
/;

my $wc = new HTML::WikiConverter(
    dialect => 'GoogleCode',
    escape_autolink => \@toEscape,
    escape_entities => 0,
    base_uri => 'http://beavercreekconsulting.com',
    summary => 'Developers Guide V0.2 - jBati usage and examples'
);

foreach my $in (keys %to_process) {
    open(HTML, "<$in") or die "cannot open $in: $!\n";
    my $html = do {local $/; <HTML>};
    close(HTML);

    open(WIKI, '>' . $to_process{$in}) or die "cannot open " .
        $to_process{$in} . ": $!\n";

    my $converted = $wc->html2wiki($html);
    print WIKI $converted, "\n";
    close(WIKI);
}
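For quick experiments it can be handy to convert a string in memory rather than a file. The sketch below reuses the options from the html2wiki examples earlier in this post. Two things are assumptions on my part: that the labels option takes an array reference in the same way escape_autolink does, and the sample HTML string, which is made up for illustration. The module is loaded inside an eval so that a missing installation produces a readable message instead of a compile error.

```perl
use strict;
use warnings;

my %options = (
    dialect         => 'GoogleCode',
    escape_entities => 0,
    escape_autolink => [qw(CamelCase JavaScript)],
    summary         => 'This is a great page',
    labels          => [qw(Featured Phase-Deploy)],    # assumed to take an array reference
);

# Made-up sample input, similar to example.html above.
my $html = 'CamelCase words like JavaScript, <pre>JavaScript</pre> and <b>bold</b> words';

# Load and run the converter at run time; any failure (module or dialect
# missing, bad option) ends up in $@ instead of aborting the script.
my $wiki = eval {
    require HTML::WikiConverter;
    HTML::WikiConverter->new(%options)->html2wiki($html);
};

print defined $wiki ? "$wiki\n" : "conversion not available: $@";
```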