What It Is

pindex is a program, inspired by the Un*x ptx command, that generates permuted indexes of text catalogues. I've been using it for some time as the backbone of my paper filing systems.


How I Use It

A very common way of filing (certainly the way I used to manage archive material I wanted to keep) is to create lots of folders or boxes with category labels on them.

This system works OK for those bits of filing (bank statements or utility bills, for instance) which are cohesive enough to be sensibly filed by category. I find, though, that most other bits of paper are too multi-purpose to be pigeonholed in one box, and an approach based on permuted indexing gets around that quite effectively.

My filing system fits together as follows:

  1. indexed folders - each piece to be filed gets a unique ID based on the next free slot in a folder.

  2. index file - each piece gets a single-line entry in the index file. Each entry follows the format:

    <ID> <title and description>

    For example:

    A0050.1 Enterprise JavaBeans API specification (Java, EJB)
    A0053.1 If Operating Systems Were Airlines (joke)

    This demonstrates both my usual approach of putting keywords not present in the title in brackets afterwards, and my use of a <folder>.<item> notation in generating item IDs.

  3. generated index - every now and then, I run pindex to generate the permuted index for the index file. A permuted index lists every word from every entry in alphabetical order, each alongside the entry it came from. The ptx program mentioned above has its own method of presentation, but the permuted index that pindex generates for the example entries above is as follows:

    Airlines      : If Operating Systems Were Airlines (joke)            > A0053.1
    API           : Enterprise JavaBeans API specification (Java, EJB)   > A0050.1
    EJB           : Enterprise JavaBeans API specification (Java, EJB)   > A0050.1
    Enterprise    : Enterprise JavaBeans API specification (Java, EJB)   > A0050.1
    If            : If Operating Systems Were Airlines (joke)            > A0053.1
    Java          : Enterprise JavaBeans API specification (Java, EJB)   > A0050.1
    JavaBeans     : Enterprise JavaBeans API specification (Java, EJB)   > A0050.1
    joke          : If Operating Systems Were Airlines (joke)            > A0053.1
    Operating     : If Operating Systems Were Airlines (joke)            > A0053.1
    specification : Enterprise JavaBeans API specification (Java, EJB)   > A0050.1
    Systems       : If Operating Systems Were Airlines (joke)            > A0053.1
    Were          : If Operating Systems Were Airlines (joke)            > A0053.1

    Each line is in the form:

    <keyword> : <title and description> > <ID>

    This can be printed out for reference next to the folder storage.

  4. finding a reference - look up the keyword you are interested in in the left-hand column of the permuted index. This gives you every item whose index entry contains that keyword, together with its ID.

    In the example above, I'm interested in all occurrences of the word "API". Looking this word up in the left-hand column I see that only the EJB spec refers to an API.
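
The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not pindex's actual code (pindex itself is written in Perl). The function names and the rule that a word is any run of letters or digits are assumptions made for this sketch; pindex takes its delimiters from a data file instead.

```python
import re

# Two sample entries copied from the index file example above.
ENTRIES = [
    "A0050.1 Enterprise JavaBeans API specification (Java, EJB)",
    "A0053.1 If Operating Systems Were Airlines (joke)",
]


def parse_entry(line):
    """Split '<ID> <title and description>' at the first run of whitespace."""
    item_id, description = line.split(None, 1)
    return item_id, description.strip()


def permuted_index(lines):
    """Return (keyword, description, ID) rows sorted by keyword, ignoring case."""
    rows = []
    for line in lines:
        item_id, description = parse_entry(line)
        # Assumption: a word is a run of letters or digits; all else delimits.
        for word in set(re.findall(r"[A-Za-z0-9]+", description)):
            rows.append((word, description, item_id))
    rows.sort(key=lambda row: (row[0].lower(), row[2]))
    return rows


def format_rows(rows):
    """Lay rows out as '<keyword> : <title and description> > <ID>'."""
    key_width = max(len(row[0]) for row in rows)
    desc_width = max(len(row[1]) for row in rows)
    return [
        f"{word:<{key_width}} : {desc:<{desc_width}} > {item_id}"
        for word, desc, item_id in rows
    ]


def lookup(rows, keyword):
    """Return the IDs of all entries indexed under keyword (case-insensitive)."""
    wanted = keyword.lower()
    return sorted({item_id for word, _, item_id in rows if word.lower() == wanted})


if __name__ == "__main__":
    for line in format_rows(permuted_index(ENTRIES)):
        print(line)
```

Calling lookup(permuted_index(ENTRIES), "API") returns only A0050.1, which matches the manual lookup described in step 4.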

Note that the raw index file is itself often sufficient if you can use grep or some other search tool to look for words you're interested in.
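
For comparison, a direct search of the raw index file might look like the sketch below. The function name search_index is hypothetical, and the match is a plain case-insensitive substring test, roughly what grep -i does with a fixed string.

```python
def search_index(lines, word):
    """Return the index entries whose text contains word, ignoring case."""
    needle = word.lower()
    return [line for line in lines if needle in line.lower()]


ENTRIES = [
    "A0050.1 Enterprise JavaBeans API specification (Java, EJB)",
    "A0053.1 If Operating Systems Were Airlines (joke)",
]

if __name__ == "__main__":
    for entry in search_index(ENTRIES, "airlines"):
        print(entry)
```

Because this matches substrings rather than whole words, searching for "java" also finds the entry through "JavaBeans".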

What I do with a new piece of filing is:

  1. write a unique ID onto the piece to be filed. The ID comes from the next free slot in the filing cabinet.

  2. add a suitable entry to the index file.
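
These two steps could be automated along the following lines. This is a sketch under a stated assumption: that the "next free slot" in a folder is one more than the highest item number already used in that folder, following the <folder>.<item> notation. The function names are hypothetical and are not part of pindex.

```python
def next_free_id(existing_ids, folder):
    """Return '<folder>.<n>', where n is one past the highest item in folder.

    Assumption: item numbers within a folder start at 1 and count upwards.
    """
    used = [
        int(item)
        for fid, _, item in (i.partition(".") for i in existing_ids)
        if fid == folder and item.isdigit()
    ]
    return f"{folder}.{max(used, default=0) + 1}"


def add_entry(index_lines, folder, description):
    """Append a new '<ID> <title and description>' line and return its ID."""
    ids = [line.split(None, 1)[0] for line in index_lines if line.strip()]
    new_id = next_free_id(ids, folder)
    index_lines.append(f"{new_id} {description}")
    return new_id
```

The returned ID is what gets written onto the piece of paper before it goes into the folder.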


Current Version

The current version, completed on 28 October 2000, is a Perl port of the original C program. See the table below for details.

We also have some useful example data files:

  • delim - delimiting characters for words

  • ignore - words which should be ignored when indexing.
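
To show how these two settings could interact, here is a small sketch. The format of the delim and ignore files is not described here, so the sketch takes the delimiting characters and the ignore words as plain arguments. Comparing ignore words without regard to case is also an assumption, and the function name keywords is hypothetical.

```python
def keywords(description, delimiters, ignore_words):
    """Split description on any delimiter character and drop ignored words."""
    # Turn every delimiter character into a space, then split on whitespace.
    table = str.maketrans({ch: " " for ch in delimiters})
    # Assumption: ignore words are compared case-insensitively.
    ignored = {word.lower() for word in ignore_words}
    return [
        word
        for word in description.translate(table).split()
        if word.lower() not in ignored
    ]
```

With the delimiters "()," and the ignore words "if" and "were", the entry "If Operating Systems Were Airlines (joke)" yields the keywords Operating, Systems, Airlines and joke.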


Version History
Version  Date             Files                Notes
0.3      28-October-2000  pindex               Ported to Perl, and simplified the code radically since I could get rid of most of the stuff to deal with limited resources and let the Perl runtime deal with it instead.
0.2      4-August-1993    not published        Ported to DOS in a still-16-bit world, so all the wacky code on the Arc came in very handy. Also added some more config options.
0.1      6-October-1992   no longer available  Initial version, written in C on the Acorn Archimedes. Contained lots of wacky bits of code to deal with comparatively small RAM and low limits on open file descriptors.


What's Next

There's not much else I want to do with this program since it is functional as far as it goes. The only things I can think of at the moment are:

  • improve ignore/only processing - the ignore and only sets are currently defined as literal strings. It would be more useful to define these as regular expressions.

  • make index file format user-defined - the existing index file format is fixed. This should be made definable via some sort of format string. The same, to a certain extent, goes for the output file.

I'm not likely to work on these any time soon, though.
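
As a rough idea of what the first item might look like, here is a sketch of an ignore set defined by regular expressions. It is purely illustrative and does not describe anything pindex does today. The function name make_ignore_filter, whole-word matching and case-insensitive matching are all assumptions.

```python
import re


def make_ignore_filter(patterns):
    """Build a predicate that reports whether a word matches any pattern."""
    compiled = [re.compile(pattern, re.IGNORECASE) for pattern in patterns]

    def is_ignored(word):
        # fullmatch means a pattern must cover the whole word, not a part of it.
        return any(rx.fullmatch(word) for rx in compiled)

    return is_ignored


if __name__ == "__main__":
    is_ignored = make_ignore_filter([r"if|were", r"\d+"])
    words = ["If", "Operating", "Systems", "Were", "Airlines", "2000"]
    print([word for word in words if not is_ignored(word)])
```

An only set could reuse the same predicate with the test inverted.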



This software is copyright (c) Duncan Ellis 1992-2002.

This software is released under the GPL.


Last updated 22-Sep-2005