On Searching

May 30, 2008

At trainings, sprints and other occasions I get to pair with a lot of bright people. What always fascinates me most is how they use tools to be efficient. What really saddens me, though, is when good developers waste time on a repetitive task. To me these observations were particularly interesting because I always thought I wasn’t using my tools very efficiently.

Take emacs, for example. I use it, but I hardly use as many features and shortcuts as most others do. For instance, I don’t use auto-completion (I like to think I can type fast enough), nor do I know the keyboard shortcut for everything (“what was it again, Elbow + left-eyebrow RETURN toenail?”). So yes, you’ll have to make trade-offs between using tools and doing things yourself. Nevertheless, I think it’s a good idea to encourage people to use some on-board tools. One of the best tools that a computer can offer you, I think, is searching. So what I’m trying to tell you in this article is:

Let the computer find things for you!

My mom and dad use the computer a lot. And because they grew up with traditional filing strategies such as cabinets, binders and folders, they’ll recreate the same structure on the file system. Naturally, when they want to find something, they’ll open folder after folder until they’ve found the document they were looking for.

When I, on the other hand, want to find a file I open Spotlight and type some text from that file. It doesn’t even take a second and it finds emails just as well as events from my calendar. And it always finds the stuff. If it doesn’t find stuff, it’s not there. So it’s much more reliable than I ever would be. And what’s nicer, I don’t have to be so anal about filing everything away in the right folder. I’ve stopped filing a good percentage of my emails into folders because Apple Mail’s search feature is so freaking fast (especially since the Leopard upgrade).

So searching is a great way to deal with unstructured data. Structure, in fact, becomes less important with good searching capabilities. Where was that PDF article on Supersymmetry that I read the other day? Was it sent to me via email? Did I download it from somewhere? Fact is, I couldn’t care less as long as I can find it right now. Not having to first go through my emails, then my browser’s download folder means I save valuable time when I need it again in a hurry.

Thus searching can make you quite a bit more efficient at work, especially when you’re a software developer, I think. Here are three examples of where it helps a programmer:

  • Use grep when looking for a particular piece of code (e.g. when looking for the definition of a method, grep for def foo). Do I need to explain this? I hope not. I have no idea what people do on Windows; I suppose they use something built into their editor. Either way, don't try to manually search a large codebase. It will cost you hours.
  • Use your editor's search feature to look for a particular piece of code in a file. Sounds trivial, I know, but do a little self-experiment and watch what you do when you already know exactly what you're looking up in a particular file, say, the Foo class. What I do is this: I don't bother taking a look around. I hit Ctrl+S in my emacs and start typing class Foo. Heck, I even use the search feature to navigate within the same file. Nothing can position the cursor as exactly as a quick search. Except the mouse, perhaps. So OK, using the search feature for navigation only works if you are a quick typist, and you'll also want an editor that does progressive searching (searching as you type) and none of that "Search Next" stuff. But you know what, when you have a web browser that can do this as well, like, say, Firefox, finding important stuff in web pages can be just as quick. Just start typing the text and it'll be highlighted instantly.
  • Use bash's ability to search the command history when repetitively using commands on the command line. Don't bother hitting the Up key a hundred times, trying to find that hugely complicated command you typed in two hours ago. Hit Ctrl+R, start typing parts of the command and it'll appear as you type. Didn't know this feature yet? Ain't it cool? Will it save you time? Maybe. Will it save you much aggravation? Quite probably.
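
To make the first tip concrete, here's a minimal sketch of grepping for a method definition. The throwaway directory, the two files in it and the find_def helper are all made up for the demonstration; only grep and its -r and -n options are the real thing.

```shell
# Build a tiny throwaway code base (hypothetical files, for illustration only).
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'class Foo:\n    def foo(self):\n        pass\n' > "$demo/src/a.py"
printf 'def bar():\n    pass\n' > "$demo/src/b.py"

# Hypothetical helper: look for "def NAME" below a directory.
# -r searches the directory recursively, -n prints the line number of each match.
find_def() {
    grep -rn "def $1" "$2"
}

# Reports the file and line number where foo is defined.
find_def foo "$demo"
```

The match comes back as file:line:text, which is usually all you need to jump straight to the right spot in your editor.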

I'm sure this list can be extended. My advice is, as soon as you find yourself repetitively looking for something, ask yourself if and how it could be sped up with an automated search.

There's also a nice corollary: If you want to make your own applications user-friendly, fit them with efficient and easy-to-use searches. I know that's highly non-trivial, but it's well worth it. Plone's live search for instance is awesome, and I wish many other large websites had it as well.

Lastly, here's a little anecdote that inspired me to write about this searching topic in the first place. Probably the best search business out there is Google. Their web, image, etc. searches are impressive, but I think one of their greatest search toys is Google Maps because it understands unstructured information as entered by humans and puts some structure in it.

Anyway, the anecdote:

The other day I needed the address of Sixt, a rental car company in Germany, in a city called Kassel. I could've opened the browser, gone to the Yellow Pages website, entered "sixt" into the form field for the business type or name and "kassel" for the location, and with some luck I would have gotten the address. With Google Maps, I open the browser to "maps.google.de" and just enter "sixt kassel" into the one and only text field. And voila, I get the address and telephone number of the Sixt office in Kassel (and as a byproduct, really, a map of the place).

It's the same with Google Maps Directions. Most other systems will make you enter the start and end location using several input fields each. You'll have to type the street into one input field, the postal code into yet another and the town into a third. And if you're crossing countries, by God, don't forget to select the right one from a huge drop-down list. Not so with Google Maps. Just two input boxes, one each for start and end location. When you type in "Rome" to "London", it knows you're not travelling from Rome, Indiana to London, Ohio. What's more, you can just copy'n'paste an address from anywhere into Google Maps and it will understand it.

Coming back to my original point of letting the computer do the searching for you: Having Google Maps find you a business address may not be that impressive (apart from the way it just "groks" your input). What's really impressive is that it'll find the things you're looking for close to some location. Let's say, for instance, I was stuck in the beautiful town of Torgau, Saxony, Germany and needed a rental car. "sixt torgau" is what I'd type into Google Maps, and voila, it finds the closest Sixt stations, which aren't in Torgau at all. The best thing is, I don't even have to work out myself which one's the closest. Just use Google Maps Directions from "Torgau" to "sixt torgau" and I get a list of the closest Sixt stations sorted by distance.

Of course, the same thing works with airports, Starbucks cafes, and just about anything that's on Google Maps. Personally I find the fact that I don't have to go to various different websites to look for, say, all airports in Saxony, then figure out their distances to Torgau just to see which one's the closest, incredibly cool. In fact, I think I'd like to see quite a few more DWIM (do what I mean) applications out there, and just for the fun of it, I think I'm going to write one myself.

When you work with Subversion a lot, the first thing you’ll notice is that you have to type the repository URL over and over (when switching, merging, tagging, etc.). This can, of course, be avoided by defining an environment variable. For instance, my .profile contains the following line:

export z=svn+ssh://philikon@svn.zope.org/repos/main

That way I can easily check out any Zope project with

$ svn co $z/PROJ/trunk PROJ

That saves a lot of typing already (and frankly, any Zope developer who doesn't have an environment variable like this is... well, it's their own damn fault anyway). But other operations, especially merging, are still a p.i.t.a. because you have to find out the revision number that created the branch and then type it all in. You also have to remember what your branch was called.

Now enter eazysvn. It's a set of tools that I wrote (back then it was just ezmerge.py) which were refined later by Marius Gedminas and myself. Here's how it works:

Let's say I'm experimenting with something in, say, Grok. So I'm messing around in a Grok trunk checkout and now I want to check it in. But since it's some goofy experiment, it can't go into the trunk. So I must now create a branch, switch my working copy to that, and only then can I check things in. With eazysvn, it's really simple:

$ ezswitch -c philikon-goofy-experiments
$ svn ci

As you might've guessed, ezswitch switches a working copy to an already existing branch. But if that branch doesn't exist yet, I can tell it to create the branch using the -c parameter.

So now I can happily check things into my experimental branch from that working copy. Quite a few weeks later I might remember that I had a branch like this. But what was the name of that branch? ezmerge can tell me:

$ ezmerge -l
0.10
0.11
0.12
neanderthal-startupspeed
philikon-goofy-experiments
snowsprint-viewlets2

Great. So let's say I'm now in a trunk checkout of Grok and I'd like to merge the philikon-goofy-experiments branch. Normally, I'd have to figure out the revision numbers and the URL. Not with ezmerge:

$ ezmerge philikon-goofy-experiments

This will figure out all the revision numbers automatically. Before merging, it will even produce a log output of the branch so you get an idea of what you did on it and can compose the check-in message for the merge from it.
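
For comparison, here's a rough sketch of the manual bookkeeping this saves. It assumes the classic approach of asking svn log --stop-on-copy for the branch's history and taking the oldest revision as the branch point. The branch_point helper and the sample log are made up for illustration; this is not a description of how ezmerge is implemented internally.

```shell
# Hypothetical helper: read `svn log -q --stop-on-copy` output on stdin and
# print the oldest revision, i.e. the one in which the branch was created.
branch_point() {
    awk '/^r[0-9]+ / { rev = $1 } END { sub(/^r/, "", rev); print rev }'
}

# Made-up sample in the shape svn prints it (newest revision first).
sample_log='------------------------------------------------------------------------
r11900 | philikon | 2008-05-20 10:00:00 +0200 (Tue, 20 May 2008)
------------------------------------------------------------------------
r11850 | philikon | 2008-05-12 09:30:00 +0200 (Mon, 12 May 2008)
------------------------------------------------------------------------
r11800 | philikon | 2008-05-02 14:15:00 +0200 (Fri, 02 May 2008)
------------------------------------------------------------------------'

# prints: 11800
printf '%s\n' "$sample_log" | branch_point
```

With that number in hand, the merge by hand would be something along the lines of svn merge -r 11800:HEAD followed by the full branch URL, typed into a trunk checkout.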

ezmerge is also useful in another situation, namely when merging bugfixes to release branches. Let's say I fixed a bug on the Grok 0.12 maintenance branch in r12345 and the fix now needs to be propagated to the trunk. No problem:

$ ezmerge 12345 0.12

(assuming that the Grok 0.12 maintenance branch is called 0.12). This will figure out all the revision number arithmetic that svn normally needs you to do. What a time saver!
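
The arithmetic in question is small but easy to get wrong: a single changeset rN is the difference between r(N-1) and rN. Here's a minimal sketch that only prints the plain svn command you'd otherwise type by hand. The single_rev_merge helper and the example.org URL are placeholders, not part of eazysvn.

```shell
# Hypothetical helper: print (not run) the plain svn command that merges one
# revision. $1 = revision number, $2 = URL of the branch that has the fix.
single_rev_merge() {
    rev=$1
    branch_url=$2
    # A single changeset rN is the range from r(N-1) to rN.
    echo "svn merge -r $((rev - 1)):$rev $branch_url"
}

# prints: svn merge -r 12344:12345 https://svn.example.org/repos/main/grok/branches/0.12
single_rev_merge 12345 https://svn.example.org/repos/main/grok/branches/0.12
```

Newer Subversion clients also accept -c 12345 as shorthand for the same range, but you still have to supply the full branch URL yourself.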

Update: I should note that eazysvn isn't specific to Zope's subversion repository (even though the z in the name may suggest this). ezmerge and ezswitch in fact work with any repository that adheres to the common trunk/branches/tags convention. They simply inspect the path of the working copy to figure out the paths of branches that you want to merge or switch to.
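
To give an idea of what inspecting the path can look like, here's a minimal sketch under the assumption of a strict PROJECT/trunk, PROJECT/branches/NAME, PROJECT/tags/NAME layout. The branch_url function and the example.org URLs are hypothetical; the actual eazysvn code may well go about it differently.

```shell
# Hypothetical helper: given the URL of a working copy and a branch name,
# print the URL of that branch (or of the trunk, if the name is "trunk").
branch_url() {
    wc_url=$1
    name=$2
    # Cut the URL at /trunk, /branches/NAME or /tags/NAME to get the project root.
    root=$(printf '%s\n' "$wc_url" | sed -E 's#/(trunk|branches/[^/]+|tags/[^/]+)(/.*)?$##')
    if [ "$name" = "trunk" ]; then
        echo "$root/trunk"
    else
        echo "$root/branches/$name"
    fi
}

# prints: https://svn.example.org/repos/main/grok/branches/philikon-goofy-experiments
branch_url https://svn.example.org/repos/main/grok/trunk philikon-goofy-experiments
```

In a real checkout the first argument would come from the URL: line that svn info prints for the working copy.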

New blog!

May 29, 2008

With my old blog at z3lab.org down, I decided to start a new one at WordPress. Enjoy!

Update: z3lab.org seems to be up again now (thanks Nuxeo!). I’m still sticking with WordPress, mostly because it’s much easier to use (both for me and for people with comments).
