Choosing a distributed revision control system

December 30, 2008

What’s nice about Subversion is that it’s easy enough for most people to wrap their heads around, and therefore it’s well supported. Admittedly it can make branching a pain. Actually, that isn’t quite true. It’s a bit like speed, which itself doesn’t kill you; only the sudden impact does. Likewise, the problem is not with branching, it’s the merging that can make you pull your hair out. But it’s gotten better at that over the past releases.

Another issue with Subversion is the central repository. I think, however, that the central repository idea fits many projects, or even individuals in need of revision control, very well. And if you want to mirror a repository for bandwidth or high-availability reasons, well, that’s possible too since version 1.4 or so. I know, these mirrors have to be read-only, otherwise it’ll easily get messy. If you want to be able to commit to some local mirror and push back your changes, you should consider SVK. It mirrors foreign repositories locally, lets you create local branches and merge them back. SVK has a few problems of its own, but I’m not going to get into them here. Point is, if your project works best with a central repository, Subversion is a sensible choice. Thanks to solutions like SVK, people will also be able to work offline (I’ve used it this way) or be able to follow a project and add their modifications without leaving their cave.

Distribution is not for everyone

Lately people have been trying to convince us that central repositories are not the way of the future, and neither is Subversion, because it’s not suitable at all in a distributed environment. But in all honesty, after many years of contributing to various Open Source projects I haven’t actually had the need for such an environment. I get Linus’ points about how the Linux kernel is developed, how he receives patches from his lieutenants who in turn receive them from somebody else. Surely all that needs a distributed system. But I’m sorry, the projects I’ve worked on just don’t have the manpower to have people do nothing else but review patches sent to them, merge them with their private branches, push them along to other people, etc. In our central repositories we’re happy with having a few knowledgeable people watch the commit list (we call them the “checkin police” in Zope) and make sure that the code and patches committed, as well as the log messages, match our quality standards. For everything else there’s buildbot.

All that’s not to say that distributed revision control systems aren’t nice, because they are. I do understand why their users are smug. But I just won’t have their arrogance. Subversion has served many of us well over the years and now all of a sudden we’re idiots if we still like it?

That said, all distributed systems can do what Subversion can do (except partial checkouts of the repository apparently) so they seem worth a look. After all you get more features and no drawbacks, right?

Sorting out the contenders

There seem to be three serious options when it comes to distributed version control that all have about the same feature set: Bazaar, Mercurial and Git. Actually, that’s not quite correct. If you’re like me and have to or want to work with several Subversion repositories, Mercurial isn’t an option. As nice as Mercurial may seem (though a bit weird in its understanding of branches), you’ll have to realize that only Bazaar and Git have decent Subversion plugins that allow you to pull and push to a Subversion repository.

Both Bazaar and Git are installed easily on OS X if you have MacPorts. Just beware that if you want to build Git with SVN support, you should install it as follows:

sudo port install git +svn

For the following tests I took one of my private Subversion repositories (the smallest one in which I keep all the files related to my work at the university) and tried interacting with it from Bazaar and Git.

Bazaar

The first thing you’ll hear about Bazaar is that while the documentation is pretty good, it’s slow. And boy they aren’t lying. I haven’t done any measurements, but it felt even slower than Subversion on operations like printing the status of the working copy.

Like with Mercurial, Bazaar’s command set is quite easy to grasp for people who’ve been brainwashed by Subversion. In other words, people like me. However, I can’t help but think that’s because Bazaar isn’t actually too far away from Subversion, conceptually speaking. Sure, it has local repositories and all that, but in essence it seems to be geared towards a central repository when it comes to sharing your work. Not that that’s a bad thing, as I tried to point out above. I just somehow expected more. For instance, let’s say you have a checkout of something. This checkout can only be bound to one branch in a remote repository at a time. That means you can’t push some work to several repositories at the same time.

Like SVK, Bazaar has the concept of a working copy directly associated with a remote branch and working copies that represent local branches. If you have one of the former, the unbind/bind feature is quite neat. It tells Bazaar to temporarily stop sending every commit to the remote repository (e.g. while you’re hacking away on the train). Once you’re back with network connectivity, you rebind to the remote branch and can push your changes. Unfortunately, Bazaar wants you to push all these changes as one revision (to Subversion) even if you made several commits when offline. I’d rather have it reflect the individual commits.

Another downside in the communication with a Subversion repository is that it leaves turds in the repository, that is, special directory properties that it uses to track which revision it has synced. In this respect it’s similar to SVK.

Git

Having read tutorials and guides a la “Git for SVN refugees”, I can’t help but get the impression that Git is surrounded by a cloud of fanboyism. Fanboyism per se is tolerable, but as I said above, I don’t like it when it’s mixed with arrogance. I know that Subversion isn’t the bee’s knees, that’s why I’m reading this tutorial. You don’t have to tell me how stupid I’ve been using Subversion all along and not helping Linus come up with Git.

That said, once you look past the fanboyism, you’ll realize Git is actually quite well composed. The commands are a bit weird at first but so far each one has done exactly what I expected it to do. Its concept of remote and local branches is absolutely easy to understand and since Linus designed it to do kernel development, you can easily manage a gazillion local and remote branches, do merges between them, etc. I’ll admit that it feels a bit weird in the beginning, but you’ll soon appreciate the niftiness.

Something that definitely takes getting used to is the way it represents branches. A checkout and a repository are inseparably the same thing, therefore switching between branches happens within the same checkout. I’m not entirely sure yet what to think of that; all I know is that you might easily forget which branch you’re currently in and do something to a branch that you meant to do in another one. That’s not a big problem with Git, though, because you can easily roll back commits. What is annoying, however, is that you can’t switch branches or rebase your changes on top of the latest changes from the repository you’re tracking (e.g. SVN) while having local modifications. I tend to keep local modifications in my working copy almost forever, for instance when I have a canonical version of a configuration file in the repository and I change it locally for a test installation. Git has ways around that annoyance, too: for instance, I could use git stash to hide the local modifications temporarily, or I could make a local branch in which I can check in the modifications but never push them back to the tracked repository, just pull the latest changes.
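To make the git stash route a bit more concrete, here is a minimal sketch in a throwaway repository. The file names (app.conf, README) and the branch name upstream are made up for illustration; upstream merely stands in for whatever repository you’re tracking, and with git-svn the rebase step would be git svn rebase instead.

```shell
#!/bin/sh
# Sketch only: a throwaway repository with made-up file and branch names.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Canonical files as they would live in the tracked repository
echo "port = 8080" > app.conf
echo "hello" > README
git add app.conf README
git commit -q -m "Canonical configuration"
work=$(git rev-parse --abbrev-ref HEAD)

# "upstream" stands in for the tracked repository and gains a new commit
git checkout -q -b upstream
echo "hello again" >> README
git commit -q -a -m "Upstream change"
git checkout -q "$work"

# Local tweak for a test installation that should never be committed
echo "port = 9090" > app.conf

git stash -q              # hide the local modification
git rebase -q upstream    # the working tree is clean, so this goes through
git stash pop -q          # bring the local modification back

cat app.conf              # still the local version
git status --short        # app.conf shows up as modified, nothing else
```

The local change survives the update without ever being committed, which is about as close as it gets to the way I keep modifications around in a Subversion working copy.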

Git’s Subversion integration is superb. There’s an excellent tutorial for people who’ve deserted from Subversion/SVK to Git. It also mentions how to interact with a Subversion repository. In fact, generally you read that Git’s supposedly not as well documented as, say, Bazaar. I can’t come to that conclusion. I’ve rarely needed the online user guide, the man pages are quite well sorted out. You’ll actually see them by either typing man git-cmd or git cmd --help.
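For the record, the basic round trip with a Subversion repository looks roughly like the sketch below. The URL and the directory name are placeholders, and the function names are only there to label the three steps; the -s switch assumes the repository uses the conventional trunk/branches/tags layout.

```shell
#!/bin/sh
# Placeholder URL and directory name; substitute your own.
SVN_URL="https://svn.example.org/repos/project"

# One-time setup: import the Subversion history into a local Git repository.
# -s assumes the standard trunk/branches/tags layout.
svn_clone() {
    git svn clone -s "$SVN_URL" project
}

# Day to day: fetch new Subversion revisions and replay local commits on top.
svn_update() {
    git svn rebase
}

# Send the local commits back; each Git commit becomes its own revision.
svn_publish() {
    git svn dcommit
}
```

Note that git svn dcommit keeps the individual commits intact, which is exactly what I was missing when pushing offline work from Bazaar.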

The winner

So which is it going to be? Well, despite Bazaar’s Python bonus and Git’s initial weirdness, I’ve gone with Git for now. On the server side I’m keeping my Subversion repositories, at least for now, because on some university machines or on Windows I only have a Subversion client (I suppose I could compile Git myself, not that the uni sysadmins like seeing such a thing). Also, I’ve set up my Subversion repository access via HTTP/HTTPS. That means I can view my repositories with a simple web browser or download the HEAD with wget if I don’t happen to have a Subversion client at hand at all. Finally, keeping Subversion around gives me the possibility to change my mind again and go for something else.

17 Responses to “Choosing a distributed revision control system”

  1. ajung Says:

    A very well-written overview. Thanks.

    The problem is not choosing one RCS but having to deal with all of them in some way. Some years ago there was only CVS and we knew all its bugs and workarounds. Now we have SVN, Git, Bzr and Mercurial, and as an OSS developer you have to deal with all of them in some way. SVN has been adopted by most OSS projects. Now some customers are using Bzr and Mercurial and you have to deal with bugs and features. Especially Mercurial and Bzr support different development models and ways to merge branches. It is hard staying on top of each system and using them without reading documentation over and over again and checking back with the local release manager about how to use an RCS in their current context.

  2. Wichert Akkerman Says:

    Subversion mirrors no longer have to be read-only these days: they can use webdav proxying internally to push your commit to the master without you even noticing.


  3. I agree that Subversion does centralized revision control very well: it has a clean model and there are good tools for interacting with it.

    As for hg svn interaction, you should try out the hgsubversion extension for Mercurial: http://www.bitbucket.org/durin42/hgsubversion/

    I have tested it recently and it works quite nicely by allowing you to push/pull from Subversion repositories. It is still under heavy development, though, but give it a go.


  4. Hi. I’m the author of a new extension for Mercurial called hgsubversion (my “website” is a link to the project page) which is rapidly growing. I and several others I know of already have completely replaced using the svn command line client with Mercurial and hgsubversion. Pushing and pulling have been completely implemented for quite a while, and pull is typically quite a bit faster than with git-svn (I’ve not used bzr’s svn integration so I can’t compare there).

    Note that the code is still somewhat prerelease, but I (and others) use it in a production environment every day with no issues, and have been for months.

  5. John Says:

    I find Mercurial to be a great tool for one-off files. Like, I want to protect /etc/aliases, or /etc/importantstartupscript. Coming from Subversion, I require nearly no mental context switch: the commands are the same, just without having to checkout a pool. Presto, the file’s source-controlled.

    I don’t understand complaints about Subversion branching at all. I’ll agree that it’s possible to create a changeset, over enough files, with enough simultaneously existing other branches, to create a problem. But I suspect folks’ work styles are also partly responsible for the problems they encounter. I’ve never seen branching pain achieve a high level in any of my work. Subversion merging works great, and GUI tools exist to assist conflict resolutions.

    Maybe I unwittingly adjust my work style to accommodate the branching support in Subversion. Then again, every tool has limitations; e.g., whether they know it or not, Git users adjust to Git’s limitations too. (Whatever they are.)

    Anyway…whatever works, works. I’m happy with Subversion because it works, it’s been successful in large non-trivial projects, I know it, and it’s solid.

  6. M F Says:

    “However, I can’t help but think that’s because Bazaar isn’t actually too far away from Subversion, conceptually speaking. […] For instance, let’s say you have a checkout of something. This checkout can only be bound to one branch in a remote repository at a time.”

    That’s because the whole point of the ‘checkout’ command IS for acting in a centralized manner, like SVN. For other workflows, you’d use an independent ‘branch’ or some other setup.

  7. David Says:

    Regarding not knowing what git branch you’re on, I saw this: http://log.damog.net/2008/12/two-git-tips/

  8. Brendan Says:

    Have you tried hgsubversion (http://www.selenic.com/mercurial/wiki/index.cgi/HgSubversion)? It’s pretty new, but it’s gotten the stamp of approval from Ben Sussman (one of the subversion authors): http://blog.red-bean.com/sussman/?p=116

  9. paddy3118 Says:

    http://www.selenic.com/mercurial/wiki/index.cgi/WorkingWithSubversion has some info on the extent of MercurialSubversion interoperability.

    – Paddy.

  10. kemayo Says:

    I’m quite fond of git as well, though I don’t care at all about the distributed aspect of it. Might be nice if I was running a massive open source project and was really devoted to the idea of making forks easy, but I’m not.

    I saw one approach to the “what git branch am I on?” issue, which relies on you using the bash shell for your interactions with version control.

    There’s also tortoisegit for Windows, which at least has your current branch mentioned on its commit dialog.

  11. philikon Says:

    Thanks everybody for pointing me to hgsubversion. I did come across it while researching, but I was put off by the various warning flags a la “under heavy development, do not use in production”. I promise I’ll check it out though.

    Oh, and thanks for making my choice even harder!🙂

  12. Andrew Bennetts Says:

    bzr-svn doesn’t require that you leave “droppings” in the SVN repository; I’m pretty sure it can work the same way that git’s SVN support works if that’s what you want, see “bzr help dpush”. (Btw, those droppings are what track the individual commits usually.)

    And as other commenters have said, Bazaar doesn’t have to be used the same way you use SVN. If you use its checkouts feature, then you get the familiar centralized workflow. But you can always make decentralized branches at any time.

  13. philikon Says:

    M F and Andrew, thanks for pointing that out. Apparently I’ve not grasped that aspect of Bazaar correctly. That said, I wonder why Bazaar treats the two differently under the hood. My analysis thus remains, git (and git-svn) seems to be better composed.

    (Btw, my initial good impression of Bazaar was also battered by the fact that it spewed tracebacks at me when trying out the ‘bzr unbind/bind’ feature. Forgot to mention that in the article.)

  14. Balazs Ree Says:

    I’ve been using bazaar recently, with good experience. True, it’s still slower (to start with, it’s python), but there are plenty of network speed optimizations going on. Altogether the speed is acceptable for me at this moment.

    Re bzr-svn: bazaar has additional metadata to store in svn, this is why it uses svn properties for this. In fact bazaar maps its storage model 100% to svn with the help of this additional metadata. Without this bzr could not use svn for storage, as svn has no built-in notion of the related changesets in different branches, which is a central feature of all modern revision control systems.

    Oh… and there are plenty of advantages that the python-based plugin architecture of bzr offers. As a nice example: there is a plugin that allows you to share your private branches from your laptop and make them discoverable on the local network. Could be awesome on a sprint. Comparing bzr with git and others solely based on performance is like comparing FF with IE while completely ignoring the wide range of plugins you can use with FF.


  15. Thanks for the hint about svk, I’m using it now for working locally on a repository that I don’t have write access to, and still get versioning. Much nicer than just having a read-only checkout.

  16. philikon Says:

    Lennart: SVK is nice, but for tracking a read-only SVN repository while making local changes Git (or even Bazaar) is much better IMHO. I’ve ditched all my SVK depots in favour of Git repositories.


  17. […] a comment » Last Christmas I investigated some distributed revision control systems (so that I could keep on working normally despite the spotty internet connection over the […]

