In the excellent Coders at Work book, Doug Crockford advises programmers to rewrite their stuff every six months or so. He says rewrite, but I don’t think he actually means that. Developers love rewriting stuff and most of the time it’s absolutely pointless — I know, I’ve been there.

I think what he means is refactoring. Basically streamlining the good parts and getting rid of cruft. In an ideal world, refactoring can be done in small, atomic steps. It should create no or few incompatibilities. And most crucially, it shouldn’t in any way affect the product’s shipping date.

This past week I have given BarTab this treatment. Since its creation in late January, it has grown organically. After a few months of fixing bugs, adding features, releasing early and often, and observing it “in the wild,” it was time to step back and clean it up. And boy did it need cleaning up.

My precondition for this refactoring was that I wouldn’t add any new features. Nada. Zip. Even though it would’ve been very tempting at various stages. I did manage to fix a few lingering bugs, though. In the end I turned over almost every line of code, some even twice. It was a deeply satisfying experience, and I’m glad I stuck by my no-new-features rule. It wasn’t an ideal refactoring in the sense that it isn’t completely backwards compatible: the old API was horrible; the new one is much more symmetric and free of horrible puns.

So BarTab 2.0 (available now as beta) is leaner, meaner, less invasive (no eval() hacks!) and more compatible with other Firefox add-ons. In a lot of ways it’s the BarTab that I should always have written. But you know as well as I do, that’s not how it works. Very few people write perfect code the first time round.

To me, BarTab is the perfect example of why Release Early and Often and the occasional Refactoring work extremely well for small, self-contained pieces of code such as a library, plug-in or extension. It’s far from the first time I’ve done things this way, but it’s certainly turned out very nicely this time.


Tabs, tabs, tabs

May 1, 2010

Choosing the web browser as the application for most of your daily tasks means that it had better be usable. In my case that primarily means dealing with lots of tabs. So it happens that much of the time I spend hacking Firefox goes into improving the tab story.

Lazy tab loading FTW

A few months ago I was fed up with the long (re)start times of a tab-encumbered Firefox and came up with BarTab, a Firefox add-on that changes tab loading behaviour so that tabs are loaded only when they’re accessed. As I wrote in a blog post then, it was initially intended for tab addicts like myself, but thanks to lots of feedback from all kinds of users — tab addicts and novices alike — BarTab has continually improved to serve related use cases as well.

Firefox after a restart. The three tabs on the left haven't been loaded yet, hence their dim appearance.

I’m really overwhelmed by BarTab’s popularity. It’s been reviewed by various online journals such as LifeHacker, scrutinized for its memory and CPU cycle conserving powers, featured on Rock Your Firefox and suggested for integration into Firefox proper. That’s a steep career for an add-on that’s barely three months old!

Making bookmarks obsolete

On the other hand, three months was enough time to take a step back, observe my own interaction with the browser and analyze the user feedback. I think there’s some real usability potential in lazy tab loading.

Novice users typically don’t keep a lot of open tabs. What many of them do is bookmark regularly visited sites. This behaviour is encouraged by the bookmark toolbar being visible by default in Firefox. However, with bookmark-related UI elements in the location bar, a toolbar, the sidebar, and the menubar, I’m sure you’ll agree that the usability of the current implementation could definitely be improved. Or in the words of a Reddit user:

“It’s like someone taped a bunch of cats together.”

I would even go as far as challenging the whole concept of bookmarks. I think they require too much user interaction. Tabs, on the other hand, are easy to open, close and discover. They’re a reasonably simple concept and one that’s part of the browser already anyway. When the browser supports lazy-loading tabs via a BarTab-like mechanism, tabs can essentially serve as bookmark substitutes. You simply remember a site by keeping its tab open. The difference to bookmarks is that you don’t need to do anything to “bookmark” them. In fact, you have to do one thing less: close them. Memory wastage won’t even be an issue with smart unloading. BarTab already supports this, albeit in a crude manner.

Improving tab discoverability

Eradicating bookmarks in favour of tabs will, at least to a certain degree, make everybody a “tab addict.” This has a few scalability implications. There’s of course the challenge of making the tab mechanism itself scale to hundreds, possibly thousands of tabs. That certainly is a tricky one to say the least, but I’ll leave that aside for now. Another real issue is the tab UI.

First there’s the tab arrangement. There are very good reasons for having a vertical arrangement to the side of the browsing window. Most computers have widescreen displays these days, so vertical screen real estate is more valuable than horizontal. More to the point, a vertical arrangement provides more space for more tabs, so it scales better to a higher number of tabs. When I maximize Firefox on my laptop, I can see about 14 tabs in a horizontal and 40 in a vertical arrangement. That’s a factor of about 3, or half an order of magnitude, more.

About 14 tabs are visible in the horizontal tab bar.

With a vertical tab arrangement one can overlook about 40 tabs.

Of course not everybody needs to or wants to see that many tabs. The novice users just have a few sites they visit regularly. In this case, one could still make good use of all that vertical space, for instance by showing thumbnails for each site. This scales well even to tens of tabs and helps users visually recognize their sites.

But even tab addicts might be interested in seeing just a selection of all available tabs. The current tool of choice here is Tree Style Tab. It not only displays your tabs vertically, it also allows you to arrange them in arbitrary trees. In addition to that it has about a million other features. Don’t get me wrong, it’s a fabulous and brilliant add-on, but I think it’s a bit too complicated. And the complication comes at a price: yes, thanks to BarTab, restarting Firefox with about 100 tabs is quite fast — until you enable Tree Style Tab and have it restore your tree hierarchy.

This is why I’ve started work on Vertical Tabs (code on GitHub). Its proposed feature set is simple and deliberately constrained: arrange tabs vertically in a way that looks native on all platforms, and allow grouping of tabs. In a second iteration there might be room for some additional group features such as previews and automatic unloading of tabs (via BarTab) when groups are collapsed. But all in all I’d like to keep it simple, fast and open to as many types of users as possible.

Tabs arranged with Tree Style Tab. A bit too much structure, methinks.

Tabs arranged in groups with Vertical Tabs.

Outlook to history

To sum up, a simple experiment with tabs has shown that the browser UI and its concepts can still be improved. Merging tabs and bookmarks into a unified concept, both in terms of the UI and the underlying mechanics, is without doubt a challenge. But conceptually I think it makes a lot of sense.

What’s more, this way tabs become inextricably linked to the browsing history. They essentially represent an open window or link to a certain item in that history. A loaded tab is an active link; an unloaded tab is an inactive link that just happens to be represented in the UI as a tab.

This, no doubt, should make us think about how to represent the browsing history as a whole in the UI. Of course the browsing history is riddled with information that you’re normally not interested in, so you need some sort of filter. This is what the tab bar could eventually become: a filtered view of your browsing history.

Right now we’re seeing stuff happening in the web browser that starts to go beyond the traditional web application. I’m talking about things that make either the browser itself or the interaction between different web applications a richer experience.

On the browser side, the basket of technologies commonly referred to as HTML 5 falls into this category. It makes the browser a much more powerful client, thus rebalancing the weight between web server and client towards what is, IMHO, a much more natural equilibrium.

On the web application side we are starting to see lots of interesting application mashups, thanks to technologies like OAuth and cross-origin AJAX hacks. Facebook’s recent innovation, the social graph and the ability to embed it into other applications, is a powerful example.

As many have noted, there are huge problems with this and they all have to do with security and privacy. Virtually none of the technologies coming forward under the HTML 5 umbrella help with security. Yes there are some attempts to fix one of the most backwards security policies in the browser. But these are mere sticking plasters over oozing flesh wounds.

Weaving a richer web

As I’ve written here before, I think we need a browser designed for the cloud. Back then I was mostly speaking in terms of usability and explicitly ignored the privacy issue. I think it’s time to come back to that now. With HTML 5, we’re giving the browser many ways to be a richer client. But in terms of user data, we’re still treating it as a dumb terminal.

Weave, a Mozilla Labs project, has a different vision. Its broad idea is to create a richer web experience while still having users control their data. Concretely, the Weave Sync service allows you to synchronize your tabs, history, bookmarks, settings and passwords between different browsers. And here’s the thing: the data is all encrypted, on the client. All that Mozilla is storing on their storage nodes (which you don’t have to use, by the way) is a bunch of encrypted JSON.

Sure, you may be saying, that’s great for making Firefox a richer browser, but how does that help the general web?

Well, it turns out that doing RSA and AES cryptography in JavaScript isn’t such a far-fetched idea at all. With some inspiration from an abandoned project, I was able to hack together a (very rough) version of Weave Sync for Google Chrome. Since it’s entirely written in JavaScript, it actually works in any browser.

See for yourself. (To try it out, you need to create a Weave account with some tab data in it, e.g. by installing the Weave Sync add-on in your Firefox and syncing your tabs).

Sure, you may be thinking, encryption is great for dealing with personal data, but it would be impossible in a social web. What if you wanted to share your bookmarks with other people?

Well, is it really that impossible? Let’s look at what Weave does. In a nutshell, it encrypts your private data with a symmetric “bulk key.” This bulk key is stored along with data on the server, but in encrypted form: encrypted with your public key. That means to get to your data you’ll need your private key to decrypt the bulk key which in turn can then decrypt your private data.

If I now wanted to share my bookmarks with you, I could simply give you my bulk key by encrypting it with your public key. Job done. You can see my data (and only the data I’ve encrypted with the particular bulk key that I’m sharing with you), but nobody else can. Not even Mozilla.
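
To make this concrete, here’s a rough sketch of that key-wrapping scheme in JavaScript (Node-style crypto calls; the record layout and helper names are my own, not Weave’s actual wire format):

var crypto = require('crypto');

// The symmetric "bulk key" that actually encrypts the user's data.
var bulkKey = crypto.randomBytes(32);

// Encrypt a chunk of user data (bookmarks, tabs, ...) with the bulk key.
function encryptRecord(obj) {
    var iv = crypto.randomBytes(16);
    var cipher = crypto.createCipheriv('aes-256-cbc', bulkKey, iv);
    var payload = Buffer.concat([cipher.update(JSON.stringify(obj), 'utf8'),
                                 cipher.final()]);
    return {iv: iv.toString('base64'), payload: payload.toString('base64')};
}

// The bulk key never hits the server in the clear: it is stored wrapped
// (encrypted) with a user's RSA public key.
function wrapBulkKey(publicKeyPem) {
    return crypto.publicEncrypt(publicKeyPem, bulkKey).toString('base64');
}

var me = crypto.generateKeyPairSync('rsa', {
    modulusLength: 2048,
    publicKeyEncoding: {type: 'spki', format: 'pem'},
    privateKeyEncoding: {type: 'pkcs8', format: 'pem'}
});
var you = crypto.generateKeyPairSync('rsa', {
    modulusLength: 2048,
    publicKeyEncoding: {type: 'spki', format: 'pem'},
    privateKeyEncoding: {type: 'pkcs8', format: 'pem'}
});

// What the server stores: encrypted data plus the bulk key wrapped for
// everybody who is allowed to read it. Sharing with you simply means
// adding one more wrapped copy of the same bulk key.
var stored = {
    data: encryptRecord({bookmarks: ['http://example.com/']}),
    wrappedKeys: {
        me: wrapBulkKey(me.publicKey),
        you: wrapBulkKey(you.publicKey)
    }
};

// You recover the bulk key with your private key and can then decrypt the
// data; anyone without a wrapped copy (including the server) cannot.
var yourCopy = crypto.privateDecrypt(you.privateKey,
    Buffer.from(stored.wrappedKeys.you, 'base64'));
var decipher = crypto.createDecipheriv('aes-256-cbc', yourCopy,
    Buffer.from(stored.data.iv, 'base64'));
var plaintext = Buffer.concat([
    decipher.update(Buffer.from(stored.data.payload, 'base64')),
    decipher.final()
]).toString('utf8');

Everything the server ever sees is the stored object: ciphertext plus wrapped keys, essentially the “bunch of encrypted JSON” mentioned above.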

I know, sharing bookmarks is so 1998. But it’s essentially the same thing as the Like button (or the LUUV button). Or your address book. Or your whole social graph. Point is, we no longer need the server to do the heavy lifting for us because the browser environment is getting richer — be it HTML templating, session storage or even cryptography. The server can become dull data storage that we can scale the heck out of and, more crucially, potentially ditch for a different one if we like, while all the data stays in the client’s hands and leaves them only in encrypted form.

This is the kind of cloud I can definitely get on board with.

A few months ago I wrote a post about JavaScript titled Curly braces are not the problem, wherein I pointed out one of JavaScript’s biggest weaknesses: the new operator and how to spell an object constructor as well as methods on the corresponding prototype. Some commentators mistook that post for a critique of the prototype model itself. It was far from that: I think the prototype model is great; just the spelling was awful. Consider this:

function MyObject() {
    /* constructor here */
}
MyObject.prototype = {
    aMethod: function () {
        /* method here */
    }
};

which is all right until you want to inherit from it and add methods:

function YourObject() {
    /* constructor here */
}
YourObject.prototype = new MyObject();
YourObject.prototype.anotherMethod = function () {
    /* another method here */
};

There are several problems with this. First of all, because YourObject inherits from MyObject, it has to be spelled differently. Secondly, we can’t reuse the constructor, at least not without resorting to func.apply() tricks, as sketched below. Thirdly, we have to know what to pass to the constructor of MyObject at definition time.
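
To illustrate the second point, here is roughly what reusing MyObject’s initialization logic would look like with the apply() trick (just a sketch of the workaround, not a recommendation):

function YourObject() {
    // Run MyObject's constructor logic on the new instance.
    MyObject.apply(this, arguments);
    /* additional constructor logic here */
}
// The prototype line still requires a throwaway MyObject instance,
// i.e. we must already know its constructor arguments at definition time.
YourObject.prototype = new MyObject();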

It turns out Doug Crockford not only agrees with me on this but has also come up with a better way. Back in January I thought that we needed more syntax to fix this, but it turns out we need less (by which I mean ditching the new operator). In Vol. III of his excellent Crockford on JavaScript lectures, he defines a constructor maker:

function new_constructor (extend, initializer, methods) {
    var prototype = Object.create(extend && extend.prototype);

    if (methods) {
        Object.keys(methods).forEach(function (key) {
            prototype[key] = methods[key];
        });
    }

    var func = function () {
        var that = Object.create(prototype);
        if (typeof initializer === 'function') {
            initializer.apply(that, arguments);
        }
        return that;
    };

    func.prototype = prototype;
    prototype.constructor = func;
    return func;
}

I’ll let you work out the details of this yourself and instead just show you how you would define the equivalent of the two cases above:

var new_my_object = new_constructor(Object, function () {
    /* constructor here */
}, {
    aMethod: function () {
        /* method here */
    }
});

var new_your_object = new_constructor(new_my_object, function () {
    /* constructor here */
}, {
    anotherMethod: function () {
        /* method here */
    }
});

See how symmetrical both forms are now? And if both object constructors really were to share the same initializer, I could easily define that as a separate function and reuse it.

Btw, if you do any sort of web development, I highly recommend you watch the Crockford on JavaScript talks. They’re not only entertaining but are an excellent lesson in the history of all the technology that makes up the web.

Programs must be written for people to read, and only incidentally for machines to execute.

Abelson & Sussman, SICP

There’s a programming paradigm which I shall call, for lack of a better name, bail out early. It’s so trivial that it almost doesn’t deserve a name, not to mention a blog post. Yet I often come across code that would be so much clearer if bail out early were used. Consider some code like this:

function transmogrify(input) {
    if (input.someConditionIsMet()) {
        var result;
        result = processInput(input);
        if (result) {
            return result;
        } else {
            throw UNDEFINED_RESULT;
        }
    } else {
        throw INVALID_INPUT;
    }
}

There’s a lot of if/else going on here. In the bail out early paradigm, you would try to write this without any else clauses. The trick is to sort out the problematic case first:

function transmogrify(input) {
    if (!input.someConditionIsMet()) {
        throw INVALID_INPUT;
    }

    var result;
    result = processInput(input);
    if (!result) {
        throw UNDEFINED_RESULT;
    }
    return result;
}

See how much flatter the structure of that program is? There are some other advantages:

  • Because you no longer put the main flow inside if statements, your program is often easier to refactor. If, for instance, the sanity checks occur in multiple places in your program, you can simply factor them out into a utility function without messing up the structure of your code.
  • You generally have to indent less. And when you move code around, you have to reindent less — particularly pleasant when you’re coding in Python where indentation is significant.
  • In languages with curly braces, like my fake JavaScript above, you don’t have to worry about blocks spanning many, many lines, putting the opening and closing braces so far apart that you can no longer tell what the closing brace is actually closing.

When it comes down to economic principles, I’m quite market liberal. You know, I work and get paid for my work; the better I do my job and the more I work, the more money I make; and preferably nobody gets in my way of doing so. This is a neat idea that works at the microscopic level but falls apart on several points when you take it to the macroscopic level. Non-linear motivation and the unjustifiably disproportionate income across professions are some examples. Social security is another. Going by the aforementioned principles, I shouldn’t be a big fan of welfare systems. But truth be told I’d rather live in a society where people care enough about each other to provide the essentials of living no matter what, than in one where people are driven into, say, crime just to survive. To paraphrase Wolfgang Grupp, I think of my social security taxes as contributions to a more crime-free, drug-free, poverty-free, police-free environment. Something that makes my own life more pleasant.

The same lesson can be applied to intellectual property and I can’t believe it’s taken me years to figure it out.

Five years ago I wrote a book. Back then Creative Commons was around already, and I knew about it because Mark Pilgrim had published his excellent Dive Into Python book under a CC license. I didn’t do the same for mine and I still can’t understand why. I had been an active OpenSource developer already at that time and I should have known about the benefits of open works. But instead I chose to look for a publisher. I’m very grateful Springer-Verlag took me on as an author but in retrospect I wish they had rejected me. It wasn’t money that I was after when looking for a publishing company—the royalties are pathetic. I went with a publisher because I wanted the book to be professional. I wanted it to have an ISBN and a pretty cover and a price tag and an entry in the Library of Congress. It was really my ego that wanted satisfaction.

Hey look at me, I’m a published author. And with Springer, no less.

Did it make me write a better book? No.

On the contrary. I had some of the best programmers and writers from the community review my book. But had I made my book open content from the start, I could have had essentially the whole community do reviews for me. So predictably, when the book went into print, it was still full of typos and errors. I made some money writing the book but had I set up a “Pay for an (already free) ebook” button on the website, I bet I would’ve made about the same amount of money. People still ask me whether it’s possible to purchase an ebook version. And who’s to say that a publisher still wouldn’t have decided to print it, even if it were available for free online? After all, Apress eventually printed Mark’s Dive Into Python.

Of course, having published a book not only worked out well for my ego, it also generated lots of business for me as a consultant and trainer. But would that have been any different if the book had been CC-licensed? Very unlikely.

So here’s my argument: Getting paid for each copy of created works of intellectual property is a neat idea. This model does work and has its merits. But as both a creator and consumer, I would rather live in a society that allows content to be exchanged openly (at least when used non-commercially), than in one where copyright laws seem to benefit publishers and distributors more than creators. I don’t want to live in a society where copyright laws prohibit bands from playing their own music on their website, stop children from remixing popular culture and demand that someone’s internet connection be cut because they’ve exchanged some copyrighted material with others. Don’t get me wrong, I’m not advocating piracy. But it’s a fact that the internet has dramatically changed the way creators and consumers can interact. The consumers have embraced the ways of the internet already, but many of the creators still haven’t. I hadn’t when I wrote my book, and it meant I produced an inferior work in an analogue world.

As Lawrence Lessig puts it in his excellent talks, we’re not going to be able to revert the internet revolution; we can only drive it underground. Society is changing because of the internet and we need all of society to adapt accordingly, including copyright laws. This isn’t something we can do overnight. We must be patient. It took me a while to “get it,” so just imagine how long it’s going to take them.

While I love books — enough that I’ve written one myself — they’re often cumbersome to work with: finding things without a good index is very difficult, you can rarely take more than a few with you at a time, and if it’s a particularly nice/expensive/rare one, you’d rather leave it on the shelf altogether.

The answer is, of course, to create a digital copy. One possible format for that would be PDF. Problem is: for its image data it has to resort to conventional compression algorithms. That means that scanned documents can turn out to be quite large. A file format that’s much more suited for this is DjVu. One of its tricks is a lossy algorithm that recognizes recurring shapes such as characters. As a result, DjVu encoded books are typically a quarter of the size of PDF encoded books.

Given the need to digitize a couple of books at work, I investigated whether it’s possible to create high-quality digital copies using freely available tools.

It’s not as easy as it sounds

If you already have a high-quality PDF document or a series of scanned images, there are a number of ways for you to end up with a decent DjVu document. The manual one involves calling the cjb2 command-line tool from the DjVuLibre project; a more automated one would be through the pdf2djvu tool. Problem is, when you scan a book, you rarely have high-quality scans to begin with. You typically have something like this:

Raw scan output

Fortunately there’s an excellent tool that can help here. It’s called unpaper and what it does is, among other things, remove the ugly black borders and other noise, rotate the pages and split double pages in half. It works with PNM-type images, so if your scanning program spits out a PDF, simply use ImageMagick to do the conversion. On a large document it’s most memory-efficient to make individual calls to the convert program, one per page:

for i in `seq 1 $NUMBER_OF_PAGES`; do
 convert -density 600 scan.pdf[`expr $i - 1`] pages`printf %03d $i`.pbm
done

This converts page N of the PDF to pages00N.pbm. Now unpaper can be invoked with the necessary options:

unpaper -v --layout double --pre-rotate -90 --output-pages 2 \
 pages%03d.pbm singlepages%03d.pbm

The result is separate images called singlepages00N.pbm that are nicely cleaned up:

Single page (left)
Single page (right)

Extracting the text

At this point you might think that we’re done, given that cjb2 can easily convert the resulting pages to DjVu and djvm can create a multi-page document from them. However, the result wouldn’t be searchable for text, which is one of the reasons why one would want to digitize in the first place.

The solution here obviously is to apply some OCR technology. There are several free OCR tools available: tesseract, GOCR and ocropus. They all work more or less well, but ocropus has a trick up its sleeve: It can not only extract the text from an image but also annotate the text with pixel coordinates. This means that a text search in a DjVu viewer will not only navigate to the right page but also to the right line (unfortunately, ocropus can’t resolve individual words, just lines). Installing ocropus on OS X is a bit of a pain in the neck, but if you follow these instructions to the letter, it works. The following commands will then perform the OCR analysis:

ocropus book2pages outdir singlepages*.pbm
ocropus pages2lines outdir
ocropus lines2fsts outdir
ocropus fsts2text outdir
ocropus buildhtml outdir > hocr.html

As you can see, the result is an HTML file in the hOCR format. It contains the text gathered by ocropus in <span> elements, annotated with pixel information. In order to apply this information to DjVu documents, it needs to be transformed into a format that the DjVuLibre tools, specifically the djvused tool, understand. To do that, I hacked a little Python script together:

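# Translate ocropus' hOCR output into djvused text annotations:
# one .txt file (a DjVu "page" s-expression) per page image.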
import sys
import os.path
from elementtree import ElementTree
from PIL import Image

hocrfile = sys.argv[1]
imgfiles = sys.argv[2:]

et = ElementTree.parse(hocrfile)
for page in et.getiterator('div'):
    if page.get('class') != 'ocr_page':
        continue

    if not imgfiles:
        continue
    imgfile = imgfiles.pop(0)

    txtfile = os.path.splitext(imgfile)[0] + '.txt'
    out = open(txtfile, 'w')

    image = Image.open(imgfile)
    print >>out, "(page 0 0 %s %s" % image.size

    for line in page:
        linetitle = line.get('title')
        if not linetitle.startswith('bbox '):
            continue
        x0, y0, x1, y1 = [int(x) for x in linetitle[5:].split()]
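        # hOCR measures y from the top of the image, DjVu from the bottom,
        # so flip the y coordinates.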
        imgheight = image.size[1]
        y0 = imgheight - y0
        y1 = imgheight - y1

        text = line.text.strip().replace('"', '\\"')
        print >>out, '  (line %s %s %s %s "%s")' % (x0, y0, x1, y1, text)

    print >>out, ")"
    out.close()

It’s evidently very crude and makes lots of assumptions specific to the ocropus output. For it to work you need the optional but fairly standard PIL and ElementTree packages installed. The script is invoked like so:

python hocr2djvu.py hocr.html singlepages*.pbm

It will spit out a singlepages00N.txt file for every page it finds text information for.

Putting it all together

Finally the image files for the individual pages can be converted to individual DjVu files:

for i in singlepages*pbm; do
    cjb2 -clean $i `basename $i pbm`djvu
done

Before combining the pages into a compound document, the djvused tool can be used to apply the text annotations:

for i in singlepages*txt; do
    djvused `basename $i txt`djvu -e "select 1; set-txt $i" -s
done

Lastly, the following command creates the resulting book file:

djvm -c book.djvu singlepages*.djvu

And here’s what the result looks like:

DjVu text search

Conclusion

It’s easily possible to digitize books using free tools. Some rough edges remain, however. For instance, the unpaper program isn’t completely reliable. I haven’t fiddled with the settings yet, though, so perhaps the output can be improved. The same goes for the OCR machinery, which still produces lots of erroneous words. Also, it’d be nice if the pixel annotations worked for individual words, too (like on Google Book Search). Perhaps a linear approximation could work — it certainly seems feasible for monospace fonts.