Using staircase steps as drawers

What a brilliant, space-efficient idea: using the steps of a staircase as drawers!

Feeling the middle-aged obsolescence

Fed up with my memory-starved laptop, I visited my regular computer parts chain. Having looked online to get an idea of prices, I was disconcerted when the sales rep offered a more expensive laptop memory chip. Spotting a cheaper price tag on the inventory shelf, I asked, “What about this one?”

The amused response came back: “That’s for a desktop. I could sell you one if you want though.” For clarification, it’s pretty easy to see the difference between a laptop and desktop memory chip – the latter is twice as long.

Ashamed, I scurried away with the proper chip to the checkout counter. I’m going to have to turn in my geek certification soon…

Docbook to PDF for MoinMoin wiki

MoinMoin is a wiki engine that I frequently use and recommend, mostly because it is so easy to set up. It requires Python and that’s it – a simple web server is built-in, and it uses regular files to store data.

There are a couple of downsides to wikis, however:

  • Web access is needed to read the content.
  • If a page is issued as a reference (e.g. a standards document), the content might change at any time.

Fortunately, Moin supports rendering wiki pages in DocBook format, which can be transformed into PDF documents. After a few hours of work in the early morning, I cobbled together a set of scripts and XSL transformations that generated reasonable-looking PDF versions of the wiki content.

The basic steps:

  1. A Perl script crawls the wiki (using wget), looking for pages to include in the PDF.
  2. Run each page that we want to keep through an XSLT transformation that converts the Docbook <article> into a <section>, which a later pass promotes to a <chapter>.
  3. Concatenate all the modified Docbook instances and polish the result into a Docbook <book>.
  4. Run dblatex, an open source tool that converts Docbook instances into PDFs.

What I learned:

  • The Perl XML::XSLT module doesn’t support the substring function (or any functions?). I ended up using Saxon instead.
  • A lot of minor tweaks were necessary to get the content into the right place for dblatex to render a reasonable-looking PDF: for example, stripping leading slashes, moving the page title from one XML subtree to another, and changing <ulink> tags to <link> tags.
  • MoinMoin doesn’t like the User-Agent strings of known automated spidering tools. I had to set the User-Agent that wget advertised to a value Moin didn’t recognize.
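To illustrate the User-Agent point, here is a minimal sketch of how the wget call for a single page could be assembled. The helper name wget_args, the example.com host, and the output directory are placeholders of mine, not part of the scripts below; the User-Agent value is arbitrary and only needs to be something Moin does not recognize as a spider.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds the wget argument list for one wiki page.
# The suffix asks Moin to render the page as DocBook.
sub wget_args {
    my ($base_url, $page, $out_dir, $user_agent) = @_;
    my $url = $base_url . $page . '?action=format&mimetype=xml/docbook';
    # -q: quiet, -P: download directory, -U: User-Agent to advertise
    return ('wget', '-q', '-P', $out_dir, '-U', $user_agent, $url);
}

# Placeholder host and directory, for illustration only.
my @cmd = wget_args('http://wiki.example.com', '/FrontPage', '/tmp/moin', 'foobar');
print join(' ', @cmd), "\n";

# To actually fetch the page, pass the list to system(@cmd).
```

Passing the command to system as a list sidesteps shell quoting of the '?' and '&' characters in the URL, which the script below handles by wrapping each URL in single quotes.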

Below are the scripts I put together (you need to download Saxon separately).

moinToPdf.pl

#!/usr/bin/perl

use strict;
use English;

die "Specify URL and name of book." if @ARGV < 2;

my $BASE_URL=$ARGV[0];
my $BOOK_NAME=$ARGV[1];
my %VISITED_URLS;
my $URL_SUFFIX = "?action=format&mimetype=xml/docbook";
my @SPIDER_URLS;
push @SPIDER_URLS, "/FrontPage";

my $TMP_DIR=`mktemp -d /tmp/moinToPdf-XXXXX`;
chomp $TMP_DIR;
#print STDOUT "$TMP_DIR\n";
#print STDOUT "---\n";

while (scalar @SPIDER_URLS > 0) {
  my @RELATIVE_URLS;
  my $WGET_STRING;

  while (scalar @SPIDER_URLS > 0) {
    my $RELATIVE_URL = pop @SPIDER_URLS;
    my $FULL_URL=${BASE_URL} . ${RELATIVE_URL} . $URL_SUFFIX;
    push @RELATIVE_URLS, $RELATIVE_URL;
    if (length $WGET_STRING > 0) {
      $WGET_STRING = $WGET_STRING . " ";
    }
    $WGET_STRING = $WGET_STRING . "\'${FULL_URL}\'";
#print STDOUT "$BASE_URL$RELATIVE_URL\n";
  }
#print STDOUT "---\n";

  `wget -P $TMP_DIR -q -U foobar ${WGET_STRING}`;

  while (scalar @RELATIVE_URLS > 0) {
    my $RELATIVE_URL = pop @RELATIVE_URLS;
    my $BASE_RELATIVE_URL = `basename "$RELATIVE_URL"`;
    chomp $BASE_RELATIVE_URL;
    $RELATIVE_URL = "/" . $BASE_RELATIVE_URL;

    my $TMP_FILE = $TMP_DIR . $RELATIVE_URL . $URL_SUFFIX;
    $TMP_FILE=~s/xml\/docbook/xml%2Fdocbook/;
#print STDOUT "$TMP_FILE\n";

    `grep -q '' '$TMP_FILE' 2>/dev/null`;
    if ($? != 0) {
#print STDOUT "SKIP: $TMP_FILE\n";
      next;
    }

    my $output=`java -jar saxon8.jar \'$TMP_FILE\' getWikiNames.xsl`;

    my $DOCBOOK;
    if ($RELATIVE_URL) {
      $DOCBOOK=substr($RELATIVE_URL,1) . ".xml";
    } else {
      $DOCBOOK="FrontPage.xml";
    }

    my $DIR=`dirname "$DOCBOOK"`;
    chomp $DIR;
    `mkdir -p "$DIR"`;

    `java -jar saxon8.jar -o "$DOCBOOK" "$TMP_FILE" transformArticle.xsl`;

    foreach (split '\n', $output) {
      chomp $_;
      /.*url="([^"]+)".*/;
      my $NEW_URL=$1;
      /<ulink[^>]*>(.*)<\/ulink>/;
      my $NAME=$1;
      if ($NEW_URL=~/action=AttachFile.*do=get.*/) {
        my $FILE_URL = "${BASE_URL}${NEW_URL}";
        my $FILE_NAME = "$FILE_URL";
        $FILE_NAME=~s/.*target=(.*)/$1/;
        `wget -q -U foobar -O "$FILE_NAME" "$BASE_URL$NEW_URL"`;
      } elsif (
        $NEW_URL=~/^\/.*/
        and not defined $VISITED_URLS{$NEW_URL}
        and not $NEW_URL=~/^\/OtherUser/
        and not $NEW_URL=~/^\/HelpOn/
        and not $NEW_URL=~/^\/Category/
        and not $NEW_URL=~/^\/SystemPages/
        and not $NEW_URL=~/^\/MoinMoin/
        and not $NEW_URL=~/^\/WhyWikiWorks/
        and not $NEW_URL=~/^\/RecentChanges/
        and not $NEW_URL=~/^\/WikiCourse/
        and not $NEW_URL=~/^\/AutoAdminGroup/
        and not $NEW_URL=~/^\/HelpContents/
        and not $NEW_URL=~/^\/HelpMiscellaneous/
        and not $NEW_URL=~/^\/WikiWikiWeb/
        and not $NEW_URL=~/^\/SiteNavigation/
        and not $NEW_URL=~/^\/RandomPage/
        and not $NEW_URL=~/^\/WantedPages/
        and not $NEW_URL=~/^\/WordIndex/
        and not $NEW_URL=~/^\/FindPage/
        and not $NEW_URL=~/^\/WikiName/
        and not $NEW_URL=~/^\/InterWiki/
        and not $NEW_URL=~/^\/TitleIndex/
        and not $NEW_URL=~/^\/SyntaxReference/
        and not $NEW_URL=~/^\/HelpIndex/
        and not $NEW_URL=~/^\/HelpForBeginners/
        and not $NEW_URL=~/^\/WikiSandBox/
        and not $NEW_URL=~/.*action=AttachFile.*/
      ) {
        $VISITED_URLS{$NEW_URL} = 1;
        push @SPIDER_URLS, $NEW_URL;
      }
    }
  }
}

#`rm -rf $TMP_DIR`;

my $BOOK_DATE=`date +%Y%m%d%H%M`;
chomp $BOOK_DATE;
my $BOOK_TMP="${BOOK_NAME}.${BOOK_DATE}.tmp";
my $BOOK="${BOOK_NAME}.${BOOK_DATE}.xml";

#print STDOUT "BookDate: $BOOK_DATE";
#print STDOUT "BookInterim: $BOOK_TMP";
#print STDOUT "Book: $BOOK";

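my $FILE;
open FILE, ">$BOOK_TMP";
print FILE '<?xml version="1.0" ?>';
# Wrap the pages in the <sections> root element that polishBook.xsl matches.
print FILE '<sections>';
foreach (`find .  -name '*.xml'`) {
  chomp $_;
  my $FILE2;
  open FILE2, "$_";
  while(<FILE2>) {
    s/<\?xml[^>]+\?>//;
    print FILE $_;
  }
  close FILE2;
}
print FILE '</sections>';
close FILE;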

`java -jar saxon8.jar -o $BOOK $BOOK_TMP polishBook.xsl`;
`dblatex $BOOK`;
`gzip $BOOK`;
`rm -f $BOOK_TMP`;
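The long chain of "and not" tests in the crawl loop could be condensed into a single compiled pattern. The sketch below is one way to do that, not what the script above actually does; the should_crawl helper is hypothetical and the prefix list is only a subset of the real one.

```perl
use strict;
use warnings;

# Page-name prefixes to skip; a subset of the list in moinToPdf.pl.
my @EXCLUDED = qw(HelpOn Category SystemPages RecentChanges WikiSandBox);
my $alternation = join '|', map { quotemeta } @EXCLUDED;
my $EXCLUDE_RE = qr{^/(?:$alternation)};

# Returns true when a link should be queued for crawling.
sub should_crawl {
    my ($url, $visited) = @_;
    return 0 unless $url =~ m{^/};            # only wiki-relative links
    return 0 if $visited->{$url};             # already queued
    return 0 if $url =~ /action=AttachFile/;  # attachments are handled separately
    return 0 if $url =~ $EXCLUDE_RE;          # system and help pages
    return 1;
}

my %visited = ('/FrontPage' => 1);
for my $url ('/FrontPage', '/HelpOnEditing', '/ProjectNotes', 'http://example.com/') {
    printf "%-22s %s\n", $url, should_crawl($url, \%visited) ? 'crawl' : 'skip';
}
```

Keeping the prefixes in an array makes it easier to add or remove excluded pages without touching the condition itself.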

getWikiNames.xsl

<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- Stylesheet finds the WikiName links and emits them, one per line, so the Perl script can determine the next URLs. -->
  <xsl:template match="ulink">
                <xsl:copy-of select="."/><xsl:text>
</xsl:text>
  </xsl:template>
        <xsl:template match="/">
                <xsl:apply-templates select="//ulink"/>
        </xsl:template>
</xsl:stylesheet>
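On the Perl side, each line emitted by this stylesheet has to be picked apart again. A minimal sketch of that step, assuming every <ulink> is serialized on a single line with a double-quoted url attribute (parse_ulink is a hypothetical helper, not part of moinToPdf.pl):

```perl
use strict;
use warnings;

# Extracts the url attribute and the link text from one serialized <ulink>.
# Returns an empty list when the line does not contain a ulink.
sub parse_ulink {
    my ($line) = @_;
    return unless $line =~ m{<ulink\b[^>]*\burl="([^"]*)"[^>]*>(.*?)</ulink>};
    return ($1, $2);
}

my ($url, $name) = parse_ulink('<ulink url="/ProjectNotes">ProjectNotes</ulink>');
print "$url -> $name\n";
```

Matching both pieces with a single pattern means a line that does not match yields nothing, rather than leaving stale values in $1 from an earlier match.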

transformArticle.xsl

<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Default copy rules. -->
  <xsl:template match="text()">
    <xsl:value-of select="."/>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:copy-of select="."/>
  </xsl:template>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Converts articles into sections. -->
  <xsl:template match="/article">
    <section>
      <xsl:attribute name="id">
        <xsl:value-of select="articleinfo/title"/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </section>
  </xsl:template>
  <xsl:template match="/article/section">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Strip out <articleinfo/> as it's not needed for <section/> or <chapter/> -->
  <xsl:template match="/article/articleinfo"/>

  <!-- Tables require IDs, and won't render to PDF properly without them.  So use informal tables instead. -->
  <xsl:template match="table">
    <informaltable>
      <xsl:copy-of select="@*"/>
      <xsl:for-each select="*">
        <xsl:choose>
          <!-- Sets the column count for tables, to avoid dblatex warning messages. -->
          <xsl:when test="name() = 'tgroup'">
            <tgroup>
              <xsl:attribute name="cols" select="count(colspec)"/>
              <xsl:copy-of select="@*"/>
              <xsl:for-each select="*">
                <xsl:copy>
                  <xsl:apply-templates/>
                </xsl:copy>
              </xsl:for-each>
            </tgroup>
          </xsl:when>
          <xsl:otherwise>
            <xsl:copy>
              <xsl:apply-templates/>
            </xsl:copy>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </informaltable>
  </xsl:template>

  <!-- Fix up any WikiName references to be rid of the leading slash. -->
  <xsl:template match="ulink">
    <xsl:choose>
      <xsl:when test="inlinemediaobject/imageobject/imagedata">
        <xsl:variable name="filename" select='replace(@url,".*target=","")'/>
        <inlinemediaobject>
          <imageobject>
            <imagedata>
              <xsl:attribute name="fileref">
                <xsl:value-of select="$filename"/>
              </xsl:attribute>
            </imagedata>
          </imageobject>
          <textobject>
            <phrase>
              <xsl:value-of select="$filename"/>
            </phrase>
          </textobject>
        </inlinemediaobject>
      </xsl:when>
      <xsl:otherwise>
        <xsl:choose>
          <xsl:when test="substring(@url,1,1) = '/'">
            <link>
              <xsl:attribute name="linkend">
                <xsl:value-of select="substring(@url,2)"/>
              </xsl:attribute>
              <xsl:apply-templates/>
            </link>
          </xsl:when>
          <xsl:otherwise>
            <xsl:copy>
              <xsl:copy-of select="@*"/>
              <!--<xsl:value-of select="@url"/>-->
              <xsl:apply-templates/>
            </xsl:copy>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

polishBook.xsl

<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="/sections">
    <book>
      <xsl:apply-templates>
        <xsl:sort select="attribute::id" />
      </xsl:apply-templates>
    </book>
  </xsl:template>

  <!-- Converts each section into a chapter. -->
  <!-- Uses existence of '/' in @id as an indicator of nesting -->
  <xsl:template match="/sections/section[not(contains(@id,'/'))]">
    <xsl:variable name='sectionId' select='@id'/>
    <chapter>
      <xsl:copy-of select="@*"/>
      <xsl:for-each select="*">
        <xsl:copy>
          <xsl:apply-templates/>
        </xsl:copy>
      </xsl:for-each>
      <xsl:for-each
      select="/sections/section[starts-with(@id,concat($sectionId, '/'))]">
        <section>
          <xsl:copy-of select="@*"/>
          <xsl:for-each select="*">
            <xsl:copy>
              <xsl:apply-templates/>
            </xsl:copy>
          </xsl:for-each>
        </section>
      </xsl:for-each>
    </chapter>
  </xsl:template>

  <!-- These three templates are the default copy rules. -->
  <xsl:template match="text()">
    <xsl:value-of select="."/>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:copy-of select="."/>
  </xsl:template>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Strips out any of the Category* pages. -->
  <xsl:template match="section">
    <xsl:choose>
      <xsl:when test="substring(@id,1,8) = 'Category'"/>
      <xsl:otherwise>
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates/>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Strips out any links to Category*, and transforms dangling WikiName references to special text. -->
  <xsl:template match="link">
    <xsl:variable name="end">
      <xsl:value-of select="@linkend"/>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="//section[@id=$end]">
        <xsl:copy-of select="."/>
      </xsl:when>
      <xsl:when test="substring($end,1,8) = 'Category'"/>
      <xsl:otherwise>
        <emphasis role="italics"><emphasis role="underline">
          <xsl:value-of select="text()"/>
        </emphasis></emphasis>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
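The chapter nesting rule in polishBook.xsl is easier to see outside of XSLT. The sketch below mirrors only the grouping logic: ids without a '/' become chapters, and ids that start with a chapter id followed by '/' nest beneath it. The group_pages helper and the page names are made up for illustration.

```perl
use strict;
use warnings;

# Groups page ids by the same rule as polishBook.xsl:
# no '/' in the id means a chapter; "<chapter>/..." nests under that chapter.
sub group_pages {
    my @ids = sort @_;
    my %chapters;
    for my $id (grep { !m{/} } @ids) {
        $chapters{$id} = [ grep { index($_, "$id/") == 0 } @ids ];
    }
    return \%chapters;
}

# Hypothetical page names.
my $book = group_pages(qw(Setup Setup/Linux FrontPage Setup/Windows));
for my $chapter (sort keys %$book) {
    print "$chapter\n";
    print "  $_\n" for @{ $book->{$chapter} };
}
```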