Don’t make me think … about backups and version control

November 25, 2008

I admittedly have little sympathy for those who lose weeks’ or months’ worth of work because their hard drive fails or their laptop is stolen. The way I look at it, if you can’t manage to make backups of such valuable work, you deserve the data loss. It’s not like backups are complicated to do these days. Simply copy your precious files to a USB stick or SD card periodically. Flash storage is insanely cheap, hardly ever breaks and is small enough to be kept in a safe location. It’s also available in more than adequate sizes, not that you’ll need much space. After all, your typical PhD thesis won’t occupy gigabytes.
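For the record, the "simply copy your files" routine could be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `backup` and the folder naming scheme are made up for this example, and the paths in the usage comment are placeholders for wherever your work and your USB stick actually live.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup(source: Path, destination_root: Path) -> Path:
    """Copy the folder `source` into a new timestamped folder under `destination_root`.

    Hypothetical helper for illustration; each call creates a fresh full copy.
    """
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    target = destination_root / f"{source.name}_{stamp}"
    # copytree creates `target` itself and raises if it already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    return target


# Example call (placeholder paths, adjust to your own setup):
# backup(Path.home() / "thesis", Path("/Volumes/USBSTICK/backups"))
```

Every run produces a complete copy, which wastes space but keeps the sketch trivial. It also still depends on you remembering to run it.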

All this is jolly good, it just has one fault: you. Sure, if you’re anything like my dad (i.e. disciplined), it will work for you perfectly. But let’s face it, you’re not. And you’ll always remember to make a backup just as you’re getting ready to head out the door, trying to catch a train or plane (where you may lose your laptop, hence the necessity of a backup). A humane computer system would take care of backups for you. Indeed, that’s what solutions like Apple’s Time Machine are about. Coupled with network-based storage (Time Capsule), backup is an absolute no-brainer. My mom has periodic backups and she doesn’t even know it. I think solutions like this should be part of every computer system. In fact, I wish there was some way to create incentives for software developers to make not their own lives easier, but the lives of their users.

For instance, here’s another idea that I think should be built-in: version control. Whenever I start a new project, be it a software project, a book, a business in software training or currently my Diploma thesis, the first thing I do is set up a Subversion repository (I could use something else, but Subversion is what I’m most familiar with). I even keep all the presentations I ever gave at software conferences in version control. That way I not only have instant off-site backup of my work (because the repository is on a separate server), I also have all the benefits of version control. Not that I need concurrency or merging, because I work by myself. What I’m talking about is, for instance, the ability to revert your working copy to a working state when you’ve tinkered with something and broken it. And even if you haven’t broken it and decide to keep the modifications, you can still see what they were later in the process. That’s particularly useful when you’re modifying stuff created by somebody else.

You might argue that the casual user won’t need such a feature, but I disagree. One of the biggest improvements text editors have over the typewriter is the fact that you can work with the text before it’s set in stone (in other words, printed on paper). This gives you flexibility and makes you worry less about getting it right the first time. Why doesn’t that notion extend to a larger time scale? Many applications nowadays have an undo feature. Why doesn’t that feature work two months after the fact? And please don’t quote me some implementation detail… disk space is cheap! I’d rather trade in some UI glitz for a feature like this.
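To make the "undo after two months" idea concrete, here is a rough Python sketch of one way an application could do it: keep a copy of the file on every save and restore whichever copy was current at a given moment. It is not how any particular product works. The names `snapshot` and `revert`, the flat history folder and the timestamp-prefixed file names are all assumptions made up for this example.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def snapshot(path: Path, history_dir: Path, timestamp: Optional[int] = None) -> Path:
    """Store a copy of `path` in `history_dir`, prefixed with its save time.

    `timestamp` is in nanoseconds since the epoch; it defaults to "now".
    """
    history_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.time_ns() if timestamp is None else timestamp
    copy = history_dir / f"{stamp}_{path.name}"
    shutil.copy2(path, copy)
    return copy


def revert(path: Path, history_dir: Path, as_of: int) -> Path:
    """Restore `path` from the newest snapshot taken at or before `as_of`."""
    candidates = []
    for entry in history_dir.iterdir():
        stamp, _, name = entry.name.partition("_")
        # Only consider snapshots of this file that are old enough.
        if name == path.name and stamp.isdigit() and int(stamp) <= as_of:
            candidates.append((int(stamp), entry))
    if not candidates:
        raise LookupError(f"no snapshot of {path.name} at or before {as_of}")
    _, newest = max(candidates)
    shutil.copy2(newest, path)
    return newest
```

The sketch stores full copies and tells files apart by name only, so two files with the same name in different folders would collide. A real implementation would want something smarter, but the point stands: the mechanism is simple, and it should happen without the user asking for it.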

Just don’t make me think about this stuff, please.

12 Responses to “Don’t make me think … about backups and version control”

  1. One word (well, three ;): ZFS.

  2. philikon Says:

    Limi: Sure, ZFS allows snapshots which might make what I want elegant and efficient in implementation. That still means some system has to implement it and provide a useful UI. More than that, though, it needs to be automatic and idiot-proof.

  3. Not that I’m a big Microsoft fan, but I recall some feature (Volume Shadowing Service?) that creates a diff based filesystem towards this effect.

  4. Brandon Corfman Says:

    I love Google Docs for its automatic versioning of documents. I’m always revisiting and editing my writing, and it’s such a lifesaver to have all the old versions available, since I don’t always make the best decisions on what to cut the first time. 🙂

  5. Malthe Says:

    One buzzword: DropBox.

  6. philikon Says:

    Malthe: I’ve heard about DropBox before and it certainly seems like a slick tool. But it essentially has the same problem that Google Docs (mentioned by Brandon) has: it’s online. Call me old-fashioned, but I’d like to be in control of my data.

    That said, doing stuff online has an advantage. It’s an off-site backup by definition. It’s accessible anywhere and collaboration is easy. But I don’t need all that. I just want undelete and revert to work on my local files.

  7. Lee Joramo Says:

    For server backups I have been using rdiff-backup. There is a cool FUSE adapter, ArchFS, that lets you navigate the backups via file system mounts. Unfortunately, I haven’t been able to get ArchFS to compile on my Debian-based servers.

    For system backups I have been evaluating CrashPlan, which lets me continuously back up my notebook to a remote server (my Mac Mini or CrashPlan’s own server). I think that I will be deploying CrashPlan’s “business” server products for my sysadmin clients in the coming year.

    Most of my systems are Macs, and I use SuperDuper to make bootable disk images, so that I can restore a system in a matter of seconds.

    And then there is SVN for programming projects and other text-based files.

    Ouch, that is a lot of different tools to take care of a group of related problems:

    Continuous Backup
    Off-site Backup
    Bootable Disk Copies
    Restore Old Files
    Version Control

    This should be much easier.

  8. Philip Says:

    Time Machine –
    pros: searchable, easy, works as long as my external hard disk is connected, bootable backup
    cons: Mac only, not open source, obviously only one user so it’s no use for development

  9. Christophe Combelles Says:

    A DVCS is a very efficient and simple way to get both version control and backups. Just ‘hg init’ in a directory, and then you can push everywhere, share the whole repository across several machines, work and commit offline and merge easily. I really like the fact that the workspace and the repository are kept in the same location.

  10. In fact I’d love to be at home like at the office: working in a server environment where all data is stored on a server.

    With the Internet available everywhere thanks to Wi-Fi and 3G+ connections, you could remotely access your data from everywhere in the world…

    That will mean reasonable prices for, say, 1 TB per family with a monthly subscription, and laptops that can easily reach the network.

    Such a solution may somehow seem strange but it will free us from constant/regular backups, painful restorations and virus attacks.

    This is my dream.

  11. At SpiderOak we’ve been working for about 2 years on a solution to this set of problems, mostly in Python.

    Our software is a cross platform, multi computer data backup, unifying, and sharing service. It preserves deleted files and historical versions until you intentionally remove them, deduplicates across your full data set, and takes a zero-knowledge approach to encryption.
