I don’t really remember when they switched it on, but for a few days now I’ve been noticing that Twitter tracks what links you click on. When you log on to Twitter and click on a link somebody has posted in a tweet, instead of being taken to the location right away, you are first taken to Twitter’s link counter.

For instance, take this tweet by Stephen Fry:

stephenfry Autumnal miniblog http://bit.ly/3KBonC

When clicking on the link, you will first be taken to this location:

http://twitter.com/link_click_count?url=http%3A%2F%2Fbit.ly%2F3KBonC
&linkType=web&tweetId=3860514781&userId=15659603
&authenticity_token=dd850f0d1631afa8f8b8a22d9b070366a2bc7373

As you can see, this URL contains a number to identify the tweet that the URL appeared in as well as an ID to identify you as a user. In other words, they might not only count how often a link has been clicked but also who clicks on what.
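To make the contents of that URL easier to see, here is a minimal Python sketch that splits the query string into its fields. It only uses the example URL quoted above; the helper name tracking_fields is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The tracker URL from the example above, joined back into one string.
TRACKER_URL = (
    "http://twitter.com/link_click_count?url=http%3A%2F%2Fbit.ly%2F3KBonC"
    "&linkType=web&tweetId=3860514781&userId=15659603"
    "&authenticity_token=dd850f0d1631afa8f8b8a22d9b070366a2bc7373"
)


def tracking_fields(url):
    """Return the query parameters of a tracker URL as a plain dict.

    Hypothetical helper: parse_qs decodes the percent-encoding and returns
    a list per key, so we keep only the first value of each.
    """
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


fields = tracking_fields(TRACKER_URL)
for name in ("url", "linkType", "tweetId", "userId"):
    print(name, "=", fields[name])
```

The tweetId and userId fields are the two values that tie a click to a particular tweet and a particular account.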

What I find fascinating (in a bad way) about this is that firstly they never announced this new “feature” and secondly that they’re being secretive about it. At first sight, the links in tweets seem untouched. They actually inject this behaviour using JavaScript, so that when you click on a link you’re taken to http://twitter.com/link_click_count first. The code in question is loaded from http://a1.twimg.com/a/1252448032/javascripts/twitter.js. It’s all crammed into the last line; I’ve cleaned it up here for your convenience:

twttr.countClick = function(){
    var A = twttr.createTrackingParameters(this);
    if (A.linkType=="hashtag") {
        twttr.ajaxClickCount(A)
    } else {
        twttr.redirectToTracker(this,A)
    }
    setTimeout(twttr.resetLink,100,this);
    return true
};

twttr.resetLink = function(B){
    var A = $(B).attr("original-href");
    if (A) {
        B.href=A
    }
};

twttr.ajaxClickCount = function(A){
    jQuery.get("/link_click_count",A)
};

twttr.redirectToTracker = function(A,B){
    $(A).attr("original-href",A.href);
    A.href = "/link_click_count?"+$.param(B)
};

twttr.createTrackingParameters=function(F){
    var B=$(F);
    var A=function(){
        var K=B.attr("class");
        var I=["hashtag","profile-pic","username","web"];
        for(var J in I){if(K.indexOf(I[J])!==-1){return I[J]}}
    }();
    var E=B.closest(".status").find(".meta").children("a").get(0).href.split("/");
    var G=E[E.length-1];
    var H=$('meta[name="session-userid"]');
    var D=H.attr("content")||-1;
    var C=twttr.form_authenticity_token||$('input[name="authenticity_token"]').attr("value");
    return{url:F.href,linkType:A,tweetId:G,userId:D,authenticity_token:C}
};

$(document).ready(function(){
    var A=$("#content a.tweet-url");
    A.live("click",twttr.countClick)
});

If you’re JavaScript-literate, you’ll see that the last block activates the behaviour by having the functions above called whenever a link in a tweet is clicked.
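For readers who would rather not trace the minified names, the following Python sketch mimics what createTrackingParameters and redirectToTracker appear to do: pick the first known link type found in the link’s class attribute, then build the tracker address from the parameters. It is an illustration, not Twitter’s code. The function names and the class string "tweet-url web" are assumptions. Note that the original sends hashtag clicks via an AJAX GET rather than rewriting the link.

```python
from urllib.parse import urlencode

# Same candidate list, in the same order, as in the JavaScript above.
LINK_TYPES = ["hashtag", "profile-pic", "username", "web"]


def detect_link_type(css_class):
    """Return the first known link type that occurs in the class attribute."""
    for link_type in LINK_TYPES:
        if link_type in css_class:
            return link_type
    return None


def tracker_href(url, css_class, tweet_id, user_id, token):
    """Build the relative tracker URL a clicked link is rewritten to."""
    params = {
        "url": url,
        "linkType": detect_link_type(css_class),
        "tweetId": tweet_id,
        "userId": user_id,
        "authenticity_token": token,
    }
    return "/link_click_count?" + urlencode(params)


href = tracker_href(
    "http://bit.ly/3KBonC",
    "tweet-url web",  # hypothetical class attribute of the link
    "3860514781",
    "15659603",
    "dd850f0d1631afa8f8b8a22d9b070366a2bc7373",
)
print(href)
```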

I’m pretty sure I don’t like my link clicking behaviour logged. Fortunately you can stop it by switching off JavaScript. Or not using the Twitter web interface. Or using a free Open Source alternative such as Identi.ca.

While I love books (enough that I’ve written one myself), they’re often cumbersome to work with: finding things without a good index is very difficult, you can rarely take more than a few with you at a time and if it’s a particularly nice/expensive/rare one, you’d rather leave it on the shelf altogether.

The answer is, of course, to create a digital copy. One possible format for that would be PDF. Problem is: for its image data it has to resort to conventional compression algorithms. That means that scanned documents can turn out to be quite large. A file format that’s much more suited for this is DjVu. One of its tricks is a lossy algorithm that recognizes recurring shapes such as characters. As a result, DjVu encoded books are typically a quarter of the size of PDF encoded books.

Given the need to digitize a couple of books at work, I investigated whether it’s possible to create high-quality digital copies using freely available tools.

It’s not as easy as it sounds

If you already have a high-quality PDF document or a series of scanned images, there are a number of ways for you to end up with a decent DjVu document. The manual one involves calling the cjb2 command line tool from the DjVuLibre project; a more automated one would be through the pdf2djvu tool. Problem is, when you scan a book, you rarely have high-quality scans to begin with. You typically have something like this:

Raw scan output

Fortunately there’s an excellent tool that can help here. It’s called unpaper and what it does is, among other things, remove the ugly black borders and other noise, rotate the pages and split double pages in half. It works with PNM type images, so if your scanning program spits out a PDF, simply use ImageMagick to do the conversion. On a large document it’s most memory-efficient to make individual calls to the convert program, one per page:

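# ImageMagick counts PDF pages from 0, hence the "i - 1".
# The quotes keep the shell from treating [..] as a glob pattern.
for i in $(seq 1 "$NUMBER_OF_PAGES"); do
  convert -density 600 "scan.pdf[$((i - 1))]" "pages$(printf %03d "$i").pbm"
done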

This converts page N of the PDF to pages00N.pbm. Now unpaper can be invoked with the necessary options:
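The off-by-one between page numbers and ImageMagick’s zero-based page index is easy to get wrong, so here is the same mapping as a small Python sketch. The function name convert_command is made up for illustration; the arguments are the ones used in the loop above.

```python
def convert_command(page_number, pdf="scan.pdf", density=600):
    """Return the argument list that renders one PDF page to a PBM file.

    page_number is 1-based (as in the output file names), while the
    index in square brackets after the PDF name is 0-based.
    """
    source = "%s[%d]" % (pdf, page_number - 1)
    target = "pages%03d.pbm" % page_number
    return ["convert", "-density", str(density), source, target]


for page in (1, 2, 12):
    print(" ".join(convert_command(page)))
```

Each returned list could be handed to subprocess.run to execute the conversion one page at a time.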

unpaper -v --layout double --pre-rotate -90 --output-pages 2 \
 pages%03d.pbm singlepages%03d.pbm

The result is a set of separate images called singlepages00N.pbm that are nicely cleaned up:

Single page (left) and single page (right)

Extracting the text

At this point you might think that we’re done, given that cjb2 can easily convert the resulting pages to DjVu and djvm can create a multi-page document from them. However, the result wouldn’t be searchable for text, one of the reasons why one would want to digitize in the first place.

The solution here obviously is to apply some OCR technology. There are several free OCR tools available: tesseract, GOCR and ocropus. They all work more or less well, but ocropus has a trick up its sleeve: it can not only extract the text from an image but also annotate the text with pixel coordinates. This means that a text search in a DjVu viewer will not only navigate to the right page but also to the right line (unfortunately, ocropus can’t resolve individual words, just lines). Installing ocropus on OS X is a bit of a pain in the neck, but if you follow these instructions to the letter, it works. The following commands will then perform the OCR analysis:

ocropus book2pages outdir singlepages*.pbm
ocropus pages2lines outdir
ocropus lines2fsts outdir
ocropus fsts2text outdir
ocropus buildhtml outdir > hocr.html

As you can see, the result is an HTML file in the hOCR format. It contains the text gathered by ocropus in <span> elements, annotated with pixel information. In order to apply this information to DjVu documents, it needs to be transformed into a format that the DjVuLibre tools, specifically the djvused tool, understand. To do that, I hacked a little Python script together:

import sys
import os.path
from elementtree import ElementTree
from PIL import Image

hocrfile = sys.argv[1]
imgfiles = sys.argv[2:]

et = ElementTree.parse(hocrfile)
for page in et.getiterator('div'):
    if page.get('class') != 'ocr_page':
        continue

    if not imgfiles:
        continue
    imgfile = imgfiles.pop(0)

    txtfile = os.path.splitext(imgfile)[0] + '.txt'
    out = open(txtfile, 'w')

    image = Image.open(imgfile)
    print >>out, "(page 0 0 %s %s" % image.size

    for line in page:
        linetitle = line.get('title')
        if not linetitle.startswith('bbox '):
            continue
        x0, y0, x1, y1 = [int(x) for x in linetitle[5:].split()]
        imgheight = image.size[1]
        y0 = imgheight - y0
        y1 = imgheight - y1

        text = line.text.strip().replace('"', '\\"')
        print >>out, '  (line %s %s %s %s "%s")' % (x0, y0, x1, y1, text)

    print >>out, ")"
    out.close()

It’s evidently very crude and makes lots of assumptions specific to the ocropus output. For it to work you need the optional but fairly standard PIL and ElementTree packages installed. The script is invoked like so:

python hocl2djvu.py hocr.html singlepages*.pbm

It will spit out a singlepages00N.txt file for every page it finds text information for.
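The core of the conversion is the coordinate flip: hOCR measures y from the top of the image, while the DjVu text layer measures it from the bottom. Here is a minimal Python 3 sketch of that one step. The function name is hypothetical. It also puts the smaller of the two flipped y values first, which is an assumption about what djvused prefers and not something verified here.

```python
def hocr_title_to_djvu_line(title, text, page_height):
    """Turn an hOCR title such as 'bbox 10 20 110 50' into a (line ...) entry.

    Assumes the bbox is the first property in the title attribute.
    Returns None when the title carries no bounding box.
    """
    bbox_part = title.split(";")[0].strip()
    if not bbox_part.startswith("bbox "):
        return None
    x0, top, x1, bottom = (int(value) for value in bbox_part[5:].split())

    # Flip the vertical axis: the bottom edge becomes the smaller y value.
    ymin = page_height - bottom
    ymax = page_height - top

    # Escape backslashes and double quotes inside the quoted string.
    escaped = text.strip().replace("\\", "\\\\").replace('"', '\\"')
    return '(line %d %d %d %d "%s")' % (x0, ymin, x1, ymax, escaped)


print(hocr_title_to_djvu_line("bbox 10 20 110 50", "Autumnal miniblog", 1000))
```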

Putting it all together

Finally the image files for the individual pages can be converted to individual DjVu files:

for i in singlepages*pbm; do
    cjb2 -clean $i `basename $i pbm`djvu
done

Before combining the pages to a compound document, the djvused tool can then be used to apply the text annotations:

for i in singlepages*txt; do
    djvused `basename $i txt`djvu -e "select 1; set-txt $i" -s
done

Lastly, the following command creates the resulting book file:

djvm -c book.djvu singlepages*.djvu

And here’s what the result looks like:

DjVu text search

Conclusion

It’s easily possible to digitize books using free tools. Some rough edges remain, however. For instance, the unpaper program isn’t completely reliable. I haven’t fiddled with the settings yet, though, so perhaps the output can be improved. The same goes for the OCR machinery, which still produces lots of erroneous words. Also, it’d be nice if the pixel annotations worked for individual words, too (like on Google Book Search). Perhaps a linear approximation could work; it certainly seems feasible for monospace fonts.
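To illustrate the linear approximation idea, here is a rough Python sketch that splits a line’s bounding box into per-word boxes, assuming every character (spaces included) is equally wide. The function name and the sample numbers are invented for the example. With proportional fonts the boxes would only be approximate.

```python
def approximate_word_boxes(line_bbox, text):
    """Estimate a bounding box for each word of a line of monospace text.

    line_bbox is (x0, y0, x1, y1) for the whole line. Every word keeps the
    line's vertical extent; only the horizontal extent is interpolated.
    """
    x0, y0, x1, y1 = line_bbox
    if not text:
        return []
    char_width = (x1 - x0) / len(text)

    boxes = []
    position = 0  # character offset of the current word within the line
    for word in text.split(" "):
        if word:
            start = x0 + round(position * char_width)
            end = x0 + round((position + len(word)) * char_width)
            boxes.append((word, (start, y0, end, y1)))
        position += len(word) + 1  # skip the word and the following space
    return boxes


for word, box in approximate_word_boxes((100, 50, 200, 70), "ab cde fgh"):
    print(word, box)
```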

I dig git

February 13, 2009

Last Christmas I investigated some distributed revision control systems (so that I could keep on working normally despite the spotty internet connection over the holidays). Since then I’ve been using git on a daily basis for my research work and just love it. Thanks to git-svn I can still keep the original SVN backend. This has a number of advantages (sorry for reiterating a couple of points here):

  • I access my SVN repositories via HTTPS. That means they’ll work even in the most ridiculously firewalled network as long as HTTPS connections are allowed. It also means that I can browse files with just a regular web browser (and nowadays even my phone has a web browser built-in!)
  • While git’s Windows support is said to be getting better (I haven’t tried), SVN still has the best interoperability. When you’re not 100% sure what platforms have to access and commit to the project, SVN still is the best common denominator.
  • Even with git you’ll sooner or later want to create a certain central repository that you or others can push to and pull from. Like with SVN, that too will require the server to have git installed, by the way. So that repo might just as well be in SVN, given the two points above.

So, I’m still using SVN for the backend and for the frontend where necessary. Otherwise I use git. I’m probably only using 10% of its features (which is fine by me), but boy I’m sold:

  • git’s really well documented, I particularly like the insightful tips from gitready.com but the man pages and the online user manual are just as good. Compared to that, the Mercurial book is close to being useless.
  • It simply makes sense. Lots of people say git is weird and compared to SVN, it sometimes really is. But the more time I spend using git, the more I feel that this is the way it should be done. Mercurial is conceptually very similar to git and both are increasingly popular which to me seems to indicate that both must be doing something right.

One thing that falls under the “the way it should be done” category is that you can just walk up to a directory and decide you want this stuff to be version controlled. The other day I realized I wanted to extend a piece of software. I couldn’t find out whether it was still maintained and where, so I just turned the directory into a git repository and started hacking away. Maybe it is still maintained, in which case I can always ask git to spit out my modifications as patches. Or the original author could pull my changes from the public repository I’ve set up. Come to think of it, I should probably tell the original author that :)

You may have noticed there, I’ve simply placed a bare git repository on a web server. I love how easy and low-tech this felt: just scp’ed it over. Just to make it a bit user friendly, I added an index.html explaining what it is. I wonder how other people deal with this. I suppose folks without SSH access to webspace prefer sites like github.com and bitbucket.org? What do you do?

What’s nice about Subversion is that it’s easy enough for most people to wrap their head around it and therefore it’s supported well. Admittedly it can make branching a pain. Actually, that isn’t quite true. It’s a bit like speed, which itself doesn’t kill you; only the sudden impact does. Likewise the problem is not with branching, it’s the merging that can make you pull your hair out. But it’s gotten better at that over the past releases.

Another issue with Subversion is the central repository. I however think the central repository idea fits many projects or even individuals in need of revision control very well. And if you want to mirror a repository for bandwidth or high-availability reasons, well that’s possible too since version 1.4 or so. I know, these mirrors have to be read-only, otherwise it’ll easily get messy. If you want to be able to commit to some local mirror and push back your changes, you should consider SVK. It mirrors foreign repositories locally, lets you create local branches and merge them back. SVK has a few problems of its own, but I’m not going to get into them here. Point is, if your project works best so that there’s a central repository, Subversion is a sensible choice. Thanks to solutions like SVK, people will also be able to work offline (I’ve used it this way) or be able to follow a project and add their modifications without leaving their cave.

Distribution is not for everyone

Lately people have been trying to convince us that central repositories are not the way of the future, and neither is Subversion because it’s not suitable at all in a distributed environment. But in all honesty, after many years of contributing to various Open Source projects I haven’t actually had the need for such an environment. I get Linus’ points about how the Linux Kernel is developed, how he receives patches from his lieutenants who in turn receive them from somebody else. Surely all that needs a distributed system. But I’m sorry, the projects I’ve worked on just don’t have the manpower to have people do nothing else but review patches sent to them, merge them with their private branches, push them along to other people, etc. In our central repositories we’re happy with having a few knowledgeable people watch the commit list (we call them the “checkin police” in Zope) and make sure that code and patches committed as well as the log messages match our quality standards. For everything else there’s buildbot.

All that’s not to say that distributed revision control systems aren’t nice because they are. I do understand why their users are smug. But I just won’t have their arrogance. Subversion has served many of us well over the years and now all of a sudden we’re idiots if we still like it?

That said, all distributed systems can do what Subversion can do (except partial checkouts of the repository apparently) so they seem worth a look. After all you get more features and no drawbacks, right?

Sorting out the contenders

There seem to be three serious options when it comes to distributed version control that all have about the same feature set: Bazaar, Mercurial and Git. Actually, that’s not quite correct. If you’re like me and have to or want to work with several Subversion repositories, Mercurial isn’t an option. As nice as Mercurial may seem (though a bit weird in its understanding of branches), you’ll have to realize that only Bazaar and Git have decent Subversion plugins that allow you to pull and push to a Subversion repository.

Both Bazaar and Git are installed easily on OS X if you have MacPorts. Just beware that if you want to build Git with SVN support, you should install it as follows:

sudo port install git +svn

For the following tests I took one of my private Subversion repositories (the smallest one in which I keep all the files related to my work at the university) and tried interacting with it from Bazaar and Git.

Bazaar

The first thing you’ll hear about Bazaar is that while the documentation is pretty good, it’s slow. And boy they aren’t lying. I haven’t done any measurements, but it felt even slower than Subversion on operations like printing the status of the working copy.

Like with Mercurial, Bazaar’s command set is quite easy to grasp for people who’ve been brainwashed by Subversion. In other words, people like me. However, I can’t help but think that’s because Bazaar isn’t actually too far away from Subversion, conceptually speaking. Sure, it has local repositories and all that, but in essence it seems to be geared towards a central repository when it comes to sharing your work. Not that that’s a bad thing, as I tried to point out above. I just somehow expected more. For instance, let’s say you have a checkout of something. This checkout can only be bound to one branch in a remote repository at a time. That means you can’t push some work to several repositories at the same time.

Like SVK, Bazaar has the concept of a working copy directly associated with a remote branch and working copies that represent local branches. If you have one of the former, the unbind/bind feature is quite neat. It tells Bazaar to temporarily stop sending every commit to the remote repository (e.g. while you’re hacking away on the train). Once you’re back with network connectivity, you rebind to the remote branch and can push your changes. Unfortunately, Bazaar wants you to push all these changes as one revision (to Subversion) even if you made several commits when offline. I’d rather have it reflect the individual commits.

Another downside in the communication with a Subversion repository is that it leaves turds in the repository, that is special directory properties that it uses to track which revision it has synced. In this respect it’s similar to SVK.

Git

Having read tutorials and guides a la “Git for SVN refugees”, I can’t help but get the impression that Git is surrounded by a cloud of fanboyism. Fanboyism per se is tolerable, but as I said above, I don’t like it when it’s mixed with arrogance. I know that Subversion isn’t the bee’s knees, that’s why I’m reading this tutorial. You don’t have to tell me how stupid I’ve been using Subversion all along and not helping Linus come up with Git.

That said, once you look past the fanboyism, you’ll realize Git is actually quite well composed. The commands are a bit weird at first but so far each one has done exactly what I expected it to do. Its concept of remote and local branches is absolutely easy to understand and since Linus designed it to do kernel development, you can easily manage a gazillion local and remote branches, do merges between them, etc. I’ll admit that it feels a bit weird in the beginning, but you’ll soon appreciate the niftiness.

Something that definitely takes getting used to is the way it represents branches. A checkout and a repository are inseparably the same thing, therefore switching between branches happens within the same checkout. I’m not entirely sure yet what to think of that; all I know is that you might easily forget which branch you’re currently in and do something to a branch that you meant to do in another one. That’s not a big problem with Git, though, because you can easily roll back commits. What is annoying, however, is that you can’t switch branches or rebase your changes on top of the latest changes from the repository you’re tracking (e.g. SVN) while having local modifications. I tend to keep local modifications in my working copy almost forever, for instance when I have a canonical version of a configuration file in the repository and I change it locally for a test installation. Git has ways around that annoyance, too: for instance I could use git stash to hide the local modifications temporarily, or I could make a local branch in which I can check in the modifications but never push them back to the tracked repository, just pull the latest changes.

Git’s Subversion integration is superb. There’s an excellent tutorial for people who’ve deserted from Subversion/SVK to Git. It also mentions how to interact with a Subversion repository. In fact, generally you read that Git’s supposedly not as well documented as, say, Bazaar. I can’t come to that conclusion. I’ve rarely needed the online user guide, the man pages are quite well sorted out. You’ll actually see them by either typing man git-cmd or git cmd --help.

The winner

So which is it going to be? Well, despite Bazaar’s Python bonus and Git’s initial weirdness, I’ve gone with Git for now. On the server side I’m keeping my Subversion repositories, at least for now, because on some university machines or on Windows I only have a Subversion client (I suppose I could compile Git myself, not that the uni sysadmins would like seeing such a thing). Also, I’ve set up my Subversion repository access via HTTP/HTTPS. That means I can view my repositories with a simple web browser or download the HEAD with wget if I don’t happen to have a Subversion client at hand at all. Finally, keeping Subversion around gives me the possibility to change my mind again and go for something else.

When I bought my 2nd gen MacBook Pro 15″ in late 2006, it was the top of the range with a Core2 Duo (Merom) processor clocked at 2.33 GHz, 2 GiB RAM and a 160 GB harddisk. Two years later it may look a bit shabby compared to the 5th gen Unibody MacBooks, but its inner values suggest that it can still take them on in a benchmark. Surely it wouldn’t be better but I suspect it would still put up a good fight, as a draw between the 4th gen and 5th gen MacBook Pros in the GeekBench results suggests.[1]

One thing that makes a huge difference in the real world is RAM, which is why I decided to spend about € 50.- on two 2 GiB DDR2 modules. I now have 4 GiB installed, though my chipset can only address 3 GiB (which I knew beforehand). I can report that this has made the machine a bit smoother when running many apps in parallel (which in my case is, uh, always). It’s not exactly warp drive, though.

If you want warp drive you’ll have to change, well, the drive! Most of the time when you’re waiting for your computer to do something (open an application, find a file, etc.), it’s not because it’s lacking processing power. It’s because it has to read files that are randomly scattered all over the harddisk. Harddisks are terrible at random access. They’re a bit like good old lazy V8s: huge capacity, but reluctant to change pace (and incidentally, not great in terms of power consumption).

Fortunately, there are alternatives to harddisks called Solid State Drives (SSDs) which are supposed to be much better at random access. But while most of the affordable ones merely provide impulse power, two models actually seem[2] to deserve to be called warp drives: the Samsung SSD and Intel’s X25-M.[3] These two are pretty much neck and neck in most of the benchmarks, and since the Intel has the same price per capacity ratio but a bit more capacity (80 GB vs 64 GB) than the Samsung, I chose to buy the X25-M. Due to the strong demand for this device, it took me a while to secure one for a reasonable amount of money and so it finally arrived earlier this week.

Unboxing Intel X25-M (3)


Now, 80 GB isn’t much these days and certainly a step down from my 160 gigabytes of V8 muscle. But I wanted the best of both worlds, speed and capacity, so I decided to ditch the optical drive and trade it in for some harddisk space. After all, software isn’t distributed on CDs or DVDs anymore (except for proprietary operating systems, perhaps), nor are music and movies. Should I ever require an optical drive (e.g. to watch a rented movie), chances are good I’m at home where I can use my external USB/Firewire thingy.

On the 5th gen MacBooks, replacing the optical drive with a harddrive is, at least in theory, trivial because the optical drive is SATA as well. On previous MacBooks, the optical drive has a PATA connector so you’ll need a small controller that translates from ATA to SATA. In either case you’ll want to fit the harddrive into a cage that has the same dimensions and mounting points as an optical drive.

MCE Technologies offers a solution for this called OptiBay, custom tailored for the MacBook or MacBook Pro. If you purchase the harddrive cage by itself, it’s $129. Add $43 to that for international shipping with FedEx. A cheaper solution comes from newmodeus: an HDD cage that’s intended to take the place of a removable optical drive that some laptops have. It’s a mere $42. Shipping with regular US postal service costs just $8 and it only took a few days to get to Germany. The only minor inconvenience was that unlike UPS or FedEx, the regular postal service doesn’t do the customs stuff for you, so I had to go to the local customs office and pick it up. Normally I would have had to pay German V.A.T. on it, but since this is a business expense, I didn’t.

Unboxing the newmodeus HDD optical bay (3)


Unfortunately, the MacBook Pro doesn’t have a regular size optical drive. It’s thinner, which means the cage won’t fit as is. I therefore had to “adapt” it with some cutting tools (a fine metal saw or a sharp side cutter will do; use sandpaper to smooth the edges). I also removed the top lid and the front cover since those are unnecessary in the MacBook Pro. With these adaptations, the cage fit rather nicely into the empty space that the optical drive had left.

Fitting the HDD cage to the size of the MBP optical drive


All this means I now have a fast SSD for the operating system, apps, personal data, etc. and my old big harddisk for large files such as my MP3 collection and movies (for which random access isn’t as crucial anyway). But has it worked?

Oh yes. The system is biblically fast. Even while I was copying all my data files over from the old harddrive to the SSD, every single application still opened in an instant. OpenOffice is up and running within 2 seconds. System upgrades now take longer to download than to perform. When automatic login is enabled, the system boots from power off to a fully functioning UI in less than 10 seconds… I could go on.

Admittedly there are a few disadvantages. The “adapted” cage isn’t the best soundproof location to install a harddisk. The CD/DVD slot right in front of the mounting position doesn’t help either. So the noise has slightly gone up, but it’s hardly noticeable, really. I also have no idea whether the motion sensor will put the harddrive to sleep in case the MacBook Pro falls (don’t care about that much, though). And then there’s power consumption. I have the feeling it’s a bit worse than it was before, but it’s hard to tell because I failed to do a proper test before the operation. One thing I’d quite like to find out is whether OS X can put the harddisk to sleep once in a while. It only has my MP3 collection and other large files, so it’s quite possible to completely avoid using the harddisk when on the road.

All these are minor issues, really. If you want to speed up your machine, forget everything else. Just get an SSD. And not just any, get one of the warp drives. The really good news is, however, you don’t have to compromise on space. If you’re like me and don’t need your optical drive much, you can have your cake and eat it, too. Warp drive and good old V8 muscle.

Both drives installed


P.S.: If you’d like to attempt this at home, don’t worry, it’s not difficult. Fitting the cage to the right size was the hardest part, but if you’re willing to spend a bit more money, you can avoid that altogether by buying the OptiBay. You need a few good tools (Torx T6, Phillips PH00 and PH0 screwdrivers, pair of tweezers). Then simply follow the excellent instructions on the iFixit website.

[1] I know that the 4th gen machines have a newer generation processor, but its clock speed is only marginally faster. And yes, they have a slightly faster chipset and graphics card, but how much of a difference is that going to make? As the benchmark shows, the factor 1.5 speed-up of the frontside bus (667 to 1033 MHz) has had almost no effect.

[2] Judging from the various test reports I’ve read on different SSD models.

[3] These are MLC models and therefore affordable (which is the criterion here). Certainly there are faster SLC models, but they’re much less affordable.

I ♥ US keyboards

December 6, 2008

Think about it. If you’re a developer and involved in Open Source projects (or alternatively, if you’re employed by an international organization), most of your natural language communication will be in English and, crucially, everything else will be, well, code or something similar to code (XML, LaTeX, Mathematica, etc.). As it happens, code needs characters such as @, {}, [], \, |, which are easily accessible on a US keyboard but not on localized keyboards because they trade the special characters for umlauts and other language-specific keys.

For instance, on a typical European keyboard, nearly all of those special characters that you need in code have to be typed using the right Alt key (which is called Alt Gr for “alternate graphics”) plus some other key. That not only means you’ll have to leave the home row and resort to finger origami, it also slows you down because you’ll no doubt have to look at the keyboard while typing. (It’s much worse on the Mac because Apple refuses to label the keys with the special characters that they’d produce, forcing people to guess where things are.)

So I ask you, if you’re a developer, what are you going to need more often? Umlauts or those characters that predominate in code? Why make typing the characters you need all the time more difficult than typing the characters you only need occasionally? Don’t worry, I’m not suggesting a radically new keyboard layout. But wouldn’t it make more sense to turn the tables on those umlauts? In other words, what I am suggesting is a hacker-friendly US keyboard, simply enhanced with language-specific characters accessible with Alt Gr.

The German Linux Magazine suggested this several years ago already and showed how to do it for German umlauts on X11. Ever since I’ve been on the Mac, I’ve been using a similar custom keyboard layout (I call it “U.S. German”). It’s a U.S. layout, but it allows me to type umlauts with Option+vowel, ß with Option+s and € with Option+e. I created it with Alex Eulenberg’s excellent keyboard layout generator, starting out with the U.S. layout and blank Option key mappings, and then adding my own mappings like so:

   Oa $00e4  :: auml
  OSa $00c4  :: Auml
   Oo $00f6  :: ouml
  OSo $00d6  :: Ouml
   Ou $00fc  :: uuml
  OSu $00dc  :: Uuml
   Os $00df  :: szlig
   On $00f1  :: ntilde
  OSn $00d1  :: Ntilde
   Oe $20ac  :: euro
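If you want to double-check which characters those hexadecimal code points stand for, a few lines of Python will do. The dictionary simply restates the mappings listed above (O for Option, S for Shift). The variable names are made up for this sketch.

```python
# Key combination -> Unicode code point, copied from the mappings above.
OPTION_MAPPINGS = {
    "Oa": 0x00E4,
    "OSa": 0x00C4,
    "Oo": 0x00F6,
    "OSo": 0x00D6,
    "Ou": 0x00FC,
    "OSu": 0x00DC,
    "Os": 0x00DF,
    "On": 0x00F1,
    "OSn": 0x00D1,
    "Oe": 0x20AC,
}

# chr() turns a code point into the character it represents.
characters = {combo: chr(code) for combo, code in OPTION_MAPPINGS.items()}

for combo, char in characters.items():
    print("%-3s -> %s (U+%04X)" % (combo, char, ord(char)))
```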

After saving the resulting file properly in OS X’s library path and logging out, I was able to choose the keyboard layout from System Preferences and have used nothing else since. Here’s a screenshot of the Keyboard Viewer app when pressing the Option key:

usgerman-keylayout

When I originally posted this on my old blog, I got an email from somebody at Aachen University’s Comp. Sci. department who had the same idea, but put a bit more effort into it and produced an even nicer version. So instead of providing my own layout here, I suggest you use the one from their website. Alternatively you could use the generator and come up with your own. I also hear good things about Ukulele, a visual keyboard layout editor. I have no idea about Windows, but I’d be thrilled to hear if people have managed to come up with something for it as well.

Whatever you do, don’t be a slave to your localized keyboard just because you need the occasional umlaut, accent or slashed vowel. There are better ways for us developers. I ♥ US keyboards.


P.S.: One should note that US keyboards are even physically different (the Return key has different proportions and it lacks a key that European keyboards have). Since you couldn’t buy a Mac in Europe with a physical US layout until recently, my laptop has a physical European layout (which sucks). The physical US layout makes much more sense on a Mac since you can jump through applications with Command+Tab and through windows of a particular application with Command+`, which on a physical US layout is right above the Tab key.

Google now officially supports the CalDAV interface for iCal. Thanks to a little tool they provide, it’s a breeze to set it up. However, it does have two rather inconvenient disadvantages:

  • So far I’ve been using iCal locally to manage my various calendars. If I wanted to use Google Calendar from now on (through iCal, of course), then I’d first like to import all my data into Google Calendar. It’s not obvious to me whether this is now possible out of the box and if so, how.
  • A quick test revealed that working with the calendars is possible offline. iCal will synchronize with the online calendar once you’re connected again. That’s nice and certainly the Right Thing ™ to do here. The Google-based calendars are also synchronized to my iPhone. Again, this is what I’d expect. However, the iPhone won’t be able to add or modify an event in any of the Google-based calendars. Since I add appointments and other things right away to my iPhone’s calendar when I’m on the road, I wouldn’t want to give up this feature just to have my calendar available through the Google web interface.

While experimenting with this, I grew rather fond of the idea of being able to access my calendar through the Google web interface and synchronize it with multiple machines. So I started investigating alternative solutions that wouldn’t cripple the iPhone’s possibilities:

  • Spanning Sync and NuevaSync are over-the-air synchronization services. While NuevaSync seems to be a general service for any platform, Spanning Sync is specifically designed to work with Macs. Both services have one thing in common, though, and that’s their means of synchronization. Your PIM data will go to their servers where it’ll be processed and what have you, and then be synchronized with, say, Google Calendar or the other devices you want it to synchronize. Given that I already have to trust one company with my personal data (Google), I’d rather not involve yet another one, thankyouverymuch. Oh, and the service isn’t free, obviously.
  • GCALDaemon is an open-source command line tool written in Java, thus cross-platform, that allows you to synchronize your Google Calendars with local ones, for instance ones that you have on the file system or ones that it publishes via HTTP, so that your CalDAV client can then subscribe to them. Setting it all up is quite straightforward, thanks to its graphical configuration editor. However, after half an hour of trying to synchronize a calendar between iCal and Google Calendar and finally giving up, I came across several reports saying that it doesn’t work with Leopard’s iCal (because the newer iCals save calendar data in a different manner, apparently) or that it only works if you use some Perl hackery. I think NOT.
  • gSync is mentioned positively in several blogs, but apparently it’s no longer being developed. The main site is down (hence no link). Avoid.
  • BusySync is a little piece of software that installs a new pane in the OS X System Preferences. It allows you to either subscribe to Google-based calendars locally or publish a local calendar to Google. The difference from CalDAV is that to iCal, the calendars will always look like local ones. That means the iPhone will work just as before. I’ve tried this now and it works like a charm. (BusySync will also do more, for instance publish your calendar over LAN, but I don’t need that.)

BusySync isn’t free, it’s $25 per computer. Right now I’m using a demo version, but I’m considering purchasing it. I only need it on one machine, the one that I use to synchronize my iPhone with. The other machines that I might want to access my calendar from could simply subscribe to the calendar via CalDAV.

One thing that BusySync doesn’t seem to do is synchronize contacts from Address Book. If it did that, I’d be getting out the checkbook right now…

I admittedly have little sympathy for those who lose weeks’ or months’ worth of work because their hard drive fails or their laptop is stolen. The way I look at it is if you can’t manage to make backups of such valuable work, you deserve the data loss. It’s not like backups are complicated to do these days. Simply copy your precious files to a USB stick or SD card periodically. Flash storage is insanely cheap these days, hardly ever breaks and can be placed in a safe location because it’s so small. It’s also available in adequate sizes nowadays, even though that’s probably not necessary. After all, your typical PhD thesis won’t occupy gigabytes.
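
To show how little it takes, here is a minimal sketch of such a periodic copy in Python. It only uses the standard library and copies a folder into a fresh, timestamped folder on the stick. The function name `backup` and the paths in the usage comment are made up for this illustration, so adjust them to wherever your work and your stick actually live.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup(source, destination_root, now=None):
    """Copy the directory `source` into a new timestamped folder.

    The copy lands in destination_root/<source name>_<timestamp>, so
    earlier backups are never overwritten. Returns the new folder.
    """
    source = Path(source)
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
    target = Path(destination_root) / (source.name + "_" + stamp)
    # copytree creates the target and refuses to overwrite an existing one.
    shutil.copytree(source, target)
    return target


# Hypothetical usage; both paths are placeholders:
# backup("/Users/me/thesis", "/Volumes/USBSTICK/backups")
```

Every run produces a complete copy, which is wasteful but simple. For something the size of a thesis that hardly matters.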

All this is jolly good, it just has one fault: you. Sure, if you’re anything like my dad (i.e. disciplined), it will work for you perfectly. But let’s face it, you’re not. And you’ll always remember to make a backup just as you’re getting ready to head out the door, trying to catch a train or plane (where you may lose your laptop, hence the necessity of a backup). A humane computer system would take care of backups for you. Indeed, that’s what solutions like Apple’s Time Machine are about. Coupled with a network-based storage (Time Capsule), backup is an absolute no-brainer. My mom has periodic backups and she doesn’t even know it. I think solutions like this should be part of every computer system. In fact, I wish there was some way to create incentives for software developers to not make their lives easier but the lives of their users.

For instance, here’s another idea that I think should be built-in: version control. Whenever I start a new project, be it a software project, a book, a business in software training or currently my Diploma thesis, the first thing I do is set up a Subversion repository (I could use something else, but Subversion is what I’m most familiar with). I even keep all the presentations I ever gave at software conferences in version control. That way I not only have instant off-site backup of my work (because the repository is on a separate server), I also have all the benefits of version control. Not that I need concurrency or merging because I work by myself. What I’m talking about is, for instance, the ability to revert your working copy to a working state when you’ve tinkered with something and broken it. And even if you haven’t broken it and decide to keep the modifications, you can still see what those were later in the process. That’s particularly useful when you’re modifying stuff created by somebody else.

You might argue that the casual user won’t need such a feature, but I disagree. One of the biggest improvements text editors have over the typewriter is the fact that you can work with the text before it’s set in stone (in other words, printed on paper). This gives you flexibility and makes you worry less about getting it right the first time. Why doesn’t that notion extend to a larger time scale? Many applications nowadays have an undo feature. Why doesn’t that feature work two months after the fact? And please don’t quote me some implementation detail… disk space is cheap! I’d rather trade in some UI glitz for a feature like this.

Just don’t make me think about this stuff, please.

I’m quite happy with my MacBook Pro 15″ (Core2 Duo, late 2006 model), but recently the left cooling fan started to make grinding noises at medium RPMs. It’s fine and quiet at 2000 RPM and it makes the normal noise when it’s maxed out, but in between you think you’re operating a coffee mill and not a laptop. I figured the fan’s ball bearing is shot and I’d better have it replaced before it grinds to a complete halt. However, Apple’s 1 year warranty is long over, as is the extended warranty (2 years) that dealers have to give for all electronic items in Germany. So this might turn out to be more expensive than expected. In addition to that, I’d have to take the laptop to the local Apple dealer and repair shop which takes time, not to mention that I’d be without my main computer for two or so days. All that just for a simple operation that I could just as well do myself.

(Also, the local Apple dealer in Dresden isn’t exactly my favourite shop anymore. They used to know my name when I walked in the store, now their sales staff almost makes you apologize for disturbing their coffee break. Typical German service for ya, but certainly nothing I want from a shop selling premium computers. Way to scare off a previously regular customer! But I digress.)

Having decided to do the operation myself, I wondered where I could get the spare part from. That’s when I found out about Apfelklinik, an operation by one Michael Kliehm, selling brand spanking new spare parts for Macs (with warranty!). I wrote him an email ordering the part, sent the money via PayPal and two days later I had a new fan for my MBP. I installed it this afternoon with help from the excellent iFixit website, which has detailed instructions and photos for pretty much all repairable and exchangeable parts. Here’s a (rather crappy) shot I took with my camera phone halfway through the operation:

Fan repair

DRM and the iTunes Store

November 1, 2008

Inspired by a great talk by Larry Lessig about how Copyright Law strangles creativity, here’s a little rant about DRM and the iTunes store that turns into a Happy End. (Feel free to skip the rant if you just want to hear the good news).

Back in the 90s, when I was a teenager, my allowance and the money I made from my paper route went pretty much into either computer or record stores. In the latter I would buy CDs. CDs were great. You could copy them onto a tape, thereby creating one of those legendary mixed tapes that allowed you to endure long car journeys or a day at the beach. Later, you could read audio CDs onto your computer and make your own remixes of your favourite songs or do other silly stuff. Maybe even just listen to the music when lugging around one of those earlier portables. I actually had one of those and the invention of MP3 meant I could bring my favourite CDs on vacation with me by packing just one CD! Life was good.

You could argue that life’s even better today. Now I can buy music off the internet, it’s just a click away. Yes, in a way I enjoyed hanging out in record stores with my mates. But things have changed and we now have fantastic offerings like last.fm. Feed it with enough information about your musical taste (by “scrobbling” songs while you listen to them) and it’ll happily tell you about artists similar to the ones you already like. It’s like the 2.0 version of your mate who used to take you to record stores. And he’s already made me spend a pretty penny on music. No regrets, though.

Not everything about all this is as great as it sounds. If you’re just a bit IT-literate, then you know what the Digital Millennium Copyright Act (DMCA) and other similar amendments to copyright law in EU countries have done to the way music has to be consumed these days. They have effectively criminalized circumventing mechanisms that protect music from being copied. With most music available from the iTunes store being protected with a Digital Rights Management (DRM) mechanism, this means no more mixed tapes, no more funky and embarrassing remixes. Yes, I know, you can still make mixed tapes because they’re analogue. What I’m talking about is the equivalent of the mixed tape for the 21st century: a mixed MP3 CD that my car stereo can play or a dirt-cheap MP3 player that I can take jogging with me without risking getting my iPhone/iPod wet and broken. None of those devices can play encrypted AAC files such as the ones you get from the iTunes store and I don’t see why they should have to. The tape deck in my dad’s car stereo didn’t have to either, right?

To give Apple credit, they do allow the mixed tape use case, albeit in a 20th century fashion: you can write DRM’ed tracks to an audio CD a limited number of times. So apparently it’s alright to build DRM-circumventing mechanisms into the software if it’s inconvenient enough for the user. Indeed, the EU has encouraged such “voluntary measures” on behalf of the industry in Directive 2001/29/EC (EU equivalent of the DMCA), paragraph (52). So we’re at the mercy of what the industry allows us to do with the stuff we’ve bought.

Or are we? In Germany at least, circumventing DRM is legal if it’s for personal use (cf. §108b(1) UrhG). Reading Section 1201 which the DMCA added to U.S. Copyright Law, it seems the situation in the U.S. isn’t as favourable, but then again, I’m not a lawyer. The best news really is, however, that despite the industry’s best efforts, we might not have to live with DRM for much longer. For a while now, there’s been music on iTunes that isn’t DRM-protected. MySpace, Amazon and various others are selling unprotected MP3s as well. So why don’t you buy there, you might ask. Well, for one thing, I actually couldn’t find out how to buy stuff from MySpace. And Amazon.com will only sell MP3 downloads to people located in the U.S. But nevertheless, there’s no denying that DRM hasn’t had the success that the lobbyists had hoped for. I just hope that Apple will see the light and remove it from the iTunes Store altogether.

In the meantime, if you’re in a country that allows the circumvention of DRM mechanisms for private use, you might enjoy Requiem (version 1.8.1 for iTunes 8). It strips the DRM encryption from AAC files, allowing you to convert them to MP3 and thereby use the music you’ve bought in mixed tape scenarios. Unfortunately, due to legal difficulties in the U.S., Requiem currently can’t be obtained from the author’s website. However, both binaries and source code are available from peer-to-peer networks.