Choosing a distributed revision control system
December 30, 2008
What’s nice about Subversion is that it’s easy enough for most people to wrap their head around it and therefore it’s supported well. Admittedly it can make branching a pain. Actually, that isn’t quite true. It’s a bit like speed which itself doesn’t kill you, just the sudden impact does. Likewise he problem is not with branching, it’s the merging that can make you pull your hair out. But it’s gotten better at that over the past releases.
Another issue with Subversion is the central repository. I however think the central repository idea fits many projects or even individuals in need of revision control very well. And if you want to mirror a repository for bandwidth or high-availability reasons, well that’s possible too since version 1.4 or so. I know, these mirrors have to be read-only, otherwise it’ll easily get messy. If you want to be able to commit to some local mirror and push back your changes, you should consider SVK. It mirrors foreign repositories locally, lets you create local branches and merge them back. SVK has a few problems of its own, but I’m not going to get into them here. Point is, if your project works best so that there’s a central repository, Subversion is a sensible choice. Thanks to solutions like SVK, people will also be able to work offline (I’ve used it this way) or be able to follow a project and add their modifications without leaving their cave.
Distribution is not for everyone
Since lately people have been trying to convince us that central repositories are not the way of the future, and neither is Subversion because it’s not suitable at all in a distributed environment. But in all honesty, after many years of contributing to various Open Source projects I haven’t actually had the need for such an environment. I get Linus’ points about how the Linux Kernel is developed, how he receives patches from his lieutenants who in turn receive them from somebody else. Surely all that needs a distributed system. But I’m sorry, the projects I’ve worked on just don’t have the man power to have people do nothing else but review patches sent to them, merge them with their private branches, push them along to other people, etc. In our central repositories we’re happy with having a few knowledgeable people watch the commit list (we call them the “checkin police” in Zope) and make sure that code and patches committed as well as the log messages match our quality standards. For everything else there’s buildbot.
All that’s not to say that distributed revision control systems aren’t nice because they are. I do understand why their users are smug. But I just won’t have their arrogance. Subversion has served many of us well over the years and now all of a sudden we’re idiots if we still like it?
That said, all distributed systems can do what Subversion can do (except partial checkouts of the repository apparently) so they seem worth a look. After all you get more features and no drawbacks, right?
Sorting out the contenders
There seem to be three serious options when it comes to distributed version control that all have about the same feature set: Bazaar, Mercurial and Git. Actually, that’s not quite correct. If you’re like me and have to or want to work with several Subversion repositories, Mercurial isn’t an option. As nice as Mercurial may seem (though a bit weird in its understanding of branches), you’ll have to realize that only Bazaar and Git have decent Subversion plugins that allow you to pull and push to a Subversion repository.
Both Bazaar and Git are installed easily on OS X if you have MacPorts. Just beware that if you want to build Git with SVN support, you should install it as follows:
sudo port install git +svn
For the following tests I took one of my private Subversion repositories (the smallest one in which I keep all the files related to my work at the university) and tried interacting with it from Bazaar and Git.
The first thing you’ll hear about Bazaar is that while the documentation is pretty good, it’s slow. And boy they aren’t lying. I haven’t done any measurements, but it felt even slower than Subversion on operations like printing the status of the working copy.
Like with Mercurial, Bazaar’s command set is quite easy to grasp for people who’ve been brainwashed by Subversion. In other words, people like me. However, I can’t help but think that’s because Bazaar isn’t actually too far away from Subversion, conceptually speaking. Sure, it has local repositories and all that, but in essence it seems to be geared towards a central repository when it comes to sharing your work. Not that that’s a bad thing, as I tried to point out above. I just somehow expected more. For instance, let’s say you have a checkout of something. This checkout can only be bound to one branch in a remote repository at a time. That means you can’t push some work to several repositories at the same time.
Like SVK, Bazaar has the concept of a working copy directly associated with a remote branch and working copies that represent local branches. If you have one of the former, the unbind/bind feature is quite neat. It tells Bazaar to temporarily stop sending every commit to the remote repository (e.g. while you’re hacking away on the train). Once you’re back with network connectivity, you rebind to the remote branch and can push your changes. Unfortunately, Bazaar wants you to push all these changes as one revision (to Subversion) even if you made several commits when offline. I’d rather have it reflect the individual commits.
Another downside in the communication with a Subversion repository is that it leaves turds in the repository, that is special directory properties that it uses to track which revision it has synced. In this respect it’s similar to SVK.
Having read tutorials and guides a la “Git for SVN refugees”, I must get the impression that Git is surrounded by a cloud of fanboyism. Fanboyism per se is tolerable, but as I said above, I don’t like when it’s mixed with arrogance. I know that Subversion isn’t the bee’s knees, that’s why I’m reading this tutorial. You don’t have to tell how stupid I’ve been using Subversion all along and not helping Linus come up with Git.
That said, once you look past the fanboyism, you’ll realize Git is actually quite well composed. The commands are a bit weird at first but so far each one has done exactly what I expected it to do. Its concept of remote and local branches is absolutely easy to understand and since Linus designed it to do kernel development, you can easily manage a gazillion local and remote branches, do merges between them, etc. I’ll admit that it feels a bit weird in the beginning, but you’ll soon appreciate the niftiness.
Something that definitely takes getting used to is the way it represents branches. A checkout and a repository are inseparably the same thing, therefore switching between branches happens within the same checkout. I’m not yet entirely sure yet what to think of that, all I know is that you might easily forget which branch you’re currently in and do something to a branch that you meant to do in another one. That’s not a big problem with Git, though, because you can easily roll back commits. What is annoying, however, is that you can’t switch branches or rebase your changes on top of the latest changes from the repository you’re tracking (e.g. SVN) while having local modifications. I tend to keep local modifications in my working copy almost forever, for instance when I have a canonical version of a configuration file in the repository and I change it locally for a test installation. Git has ways around that annoyance, too, for instance I could use
git stash to hide the local modifications temporarily, or I could make a local branch in which I can check in the modifications but never push them back to the tracked repository, just pull the latest changes.
Git’s Subversion integration is superb. There’s an excellent tutorial for people who’ve deserted from Subversion/SVK to Git. It also mentions how to interact with a Subversion repository. In fact, generally you read that Git’s supposedly not as well documented as, say, Bazaar. I can’t come to that conclusion. I’ve rarely needed the online user guide, the man pages are quite well sorted out. You’ll actually see them by either typing
man git-cmd or
git cmd --help.
So which is it going to be? Well, despite Bazaar’s Python bonus and Git’s initial weirdness, I’ve gone with Git for now. On the server side I’m keeping my Subversion repositories, at least for now. Because at some university machines or on Windows I only have a Subversion client (I suppose I could compile Git myself, not that the uni sysadmins like seeing such a thing). Also, I’ve set up my Subversion repository access via HTTP/HTTPS. That means I can view my repositories with a simple web browser or download the HEAD with wget if I don’t happen to have a Subversion client at hand at all. Finally, keeping Subversion around gives me the possibility to change my mind again and go for something else.