Wednesday, November 10, 2010

Local Version Control

My company uses a centralized version control system. It works reasonably well, but of course it has all the downsides of a centralized VCS. So I want to explain how I use a local version control system to supplement it, even when there isn't a real "bridge" like git-svn.

Why

Before I start let me sum up why you might want to.

  1. It's not uncommon to work on task A only to have it interrupted by super critical task B. Local VCS makes it easy to put aside the work in progress for task A, go back to the state of the system before task A started, get task B done, and then pick up task A again.
  2. Centralized version control almost never manages an entire configuration. In every development environment I've ever worked in there are some local settings that you can't check in because they are for you only. Examples include editor preferences, personal credentials for a service, and machine-specific configurations. A local VCS can hold your personal configuration just as well as shared configuration.
  3. It can be hard to get the manager of a centralized version control system to create branches for your experiments. And checking in half-assed code to a central VCS's main line is usually frowned upon. A local VCS means you can take short steps forward, roll back, and branch in different directions at a whim all while maintaining as much history as you want.
  4. You can do all the above even while offline.
  5. If your central VCS uses a lock/edit/unlock (ick!) model then a local VCS makes it possible to do an edit/merge model locally and only lock the moment you're ready to commit.
  6. If your local VCS is a distributed VCS (DVCS) then you can share your ongoing work with immediate teammates without polluting the main code line with something experimental or half-baked.

Which One?

Which local VCS should you use? Other than for the last use case, it doesn't much matter. Just beware that Subversion isn't so good at merging branches multiple times. Also, if your central VCS likes to play with read/write permissions then it's a bad idea to use a local VCS that also wants to do so. More generally, the local and centralized VCS can't keep metadata in the same place, so Subversion can't be used both locally and remotely. I'd heartily recommend git, Mercurial, Darcs, or Bazaar, whichever you're most comfortable with.

The Basic Model

It's very simple to use a local VCS for the most common workflow. The basic idea is to have a local repository that manages the same directories managed by your centralized VCS. Then a typical work session might look like this:

  1. Synch with central VCS
  2. Immediately commit to local VCS with a comment like "synch" so you know it wasn't your work
  3. Do some work
  4. Commit to local VCS
  5. Do some more work
  6. Commit to local VCS
  7. Finish your work
  8. Commit to local VCS
  9. Synch from central VCS, merging as necessary
  10. Make sure your stuff still works
  11. Commit to local VCS with a comment like "merge"
  12. Commit to centralized VCS

That's as simple as it can be. Other than the occasional commit to local VCS, it's exactly what you do now. The process works because your local VCS and your centralized VCS each track what has changed independently. The centralized VCS doesn't know anything about all the intermediate commits to local VCS. It just thinks you've made a bunch of changes and committed them. Similarly, the local VCS is oblivious to the central VCS. At step 2 it thinks you've just checked in some massive change.
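To make the twelve steps above concrete, here is a minimal sketch that only builds and prints the command sequence for one session. It assumes git as the local VCS with a repository already initialized in the working directory. The central commands (`svn update`, `svn commit`), the `make test` check, and the function name `session_plan` are placeholders I've picked for illustration; substitute whatever your central VCS and build actually use.

```python
import shlex


def local_commit(message):
    """Commands that snapshot the whole working tree in the local VCS (git assumed)."""
    return [
        ["git", "add", "--all"],  # stage additions, edits, and deletions
        # --allow-empty keeps the marker commit even if nothing changed
        ["git", "commit", "--allow-empty", "--message", message],
    ]


def session_plan(central_sync, central_commit, work_messages, test_command):
    """Return the basic work session as an ordered list of argv lists."""
    plan = [central_sync]                # step 1: synch with central VCS
    plan += local_commit("synch")        # step 2: mark it as not your work
    for message in work_messages:        # steps 3-8: work, then commit locally
        plan += local_commit(message)
    plan.append(central_sync)            # step 9: synch again, merging as necessary
    plan.append(test_command)            # step 10: make sure your stuff still works
    plan += local_commit("merge")        # step 11
    plan.append(central_commit)          # step 12: commit to centralized VCS
    return plan


if __name__ == "__main__":
    # Hypothetical central VCS and test commands, for illustration only.
    plan = session_plan(
        central_sync=["svn", "update"],
        central_commit=["svn", "commit", "-m", "Finish task A"],
        work_messages=["task A: first cut", "task A: handle edge cases"],
        test_command=["make", "test"],
    )
    for argv in plan:
        print(shlex.join(argv))
```

The sketch prints the commands rather than running them because the real work happens by hand between the commits. Remember to tell the local VCS to ignore the central VCS's metadata directories so the two don't trip over each other.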

There are more advanced uses of a local VCS but the above model seems to cover about 95% of what I use mine for.

Cloning/Branching

A slightly more advanced model when using DVCS is to clone the local "master" repository and work in the clone. This model can be very useful when working on something very experimental which is likely to take a long time with frequent interruptions for higher priority tasks. Upstream commits and synchs become a bit more complicated, but "shelving" is automatic: to jump on a quick high priority task just go work in the master repository or make another clone from the master.

Alternatively, if your local VCS supports another branching model then the same idea can be done with a branch.
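As a rough sketch of the branch variant, assuming git as the local VCS: the function below lists the commands for parking experimental task A on its own branch, handling an interruption on the main line, and then picking task A back up. The branch names `experiment` and `master` and the function name `interruption_plan` are assumptions for illustration only.

```python
import shlex


def interruption_plan(experiment_branch="experiment", main_branch="master"):
    """Commands for shelving task A on a branch while task B is done on the main line."""
    return [
        # Start task A on its own local branch.
        ["git", "checkout", "-b", experiment_branch],
        # Task B arrives: save task A's work in progress on the branch.
        ["git", "add", "--all"],
        ["git", "commit", "--message", "task A: work in progress"],
        # Go back to the state before task A started and do task B there,
        # following the basic model (synch, work, commit, synch, commit).
        ["git", "checkout", main_branch],
        # Task B is done: pick up task A again and bring in the latest main line.
        ["git", "checkout", experiment_branch],
        ["git", "merge", main_branch],
    ]


if __name__ == "__main__":
    for argv in interruption_plan():
        print(shlex.join(argv))
```

Only the main branch ever talks to the centralized VCS in this arrangement, which keeps the synch and commit steps of the basic model unchanged.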

Sharing

If you're using a DVCS then work-in-progress can be shared across a small team within the larger organization. However, it's probably a good idea to have one person (or one repository) responsible for synching from the centralized VCS to avoid spurious merge conflicts since the local VCS has no way to know that different people synching from the central VCS are pulling files with a common history.

The Fine Print

If you use an IDE to control your central VCS you may have to stick to command line for your local VCS or, if you prefer, use something like one of the Tortoise* style plugins for a graphical file manager. I've never experimented with getting an IDE to try to understand that one set of directories is managed by two version control systems, but I can't imagine that it would be pretty.

Some centralized VCSs really, really prefer to know what you're working on. They may have a lock/edit/unlock cycle (ick!) or they may just use knowledge of your "working" files as an optimization (Perforce does this). If so, cloning can be a problem because the checkout bit won't be tracked in the clone. If you modify files in the clone and then push back to the master, the centralized VCS will disagree with you about what state the file is in. The same problem goes for a shared DVCS model. It's not an insurmountable problem, but it is one to be aware of. Basically, before committing to the centralized VCS, all the changed files will need to be "checked out" and merged with your work.

Conclusion

Stick to the basic model at first and you'll see it's very easy to get most of the benefits of local VCS. You'll quickly realize that the local VCS is an incredible safety net even with just the simple model. Later you can branch out *ahem* into more advanced uses.

Given the simplicity and safety of having a local VCS it's insane to work with only a centralized VCS. And who knows, maybe your little experiment will be the catalyst for moving your company to a DVCS.

3 comments:

  1. I've been using IntelliJ IDEA as my IDE (integrated development environment) for a while now, and it has an excellent version of this process baked into it. IntelliJ uses a local version control to track all changes. You can review the changes to a given file (or to everything or to a subtree) and see changes you entered in IntelliJ, changes that were picked up when you updated your checkout, and changes made when you edited using some other program.

    This turns out to be a fantastic tool -- the ability to freely go see older versions and revert at will is a handy way to fix all kinds of problems. All applications should adopt this!

  2. Thanks for this info. I seem to be stuck using the IDE revision tracking because svn has the nasty propensity to completely take over the source tree with its .svn directories that make it very inconvenient to use a private VCS on top.

  3. In my company, we have central cvs and my local git gets along with it just fine. The key was simply to add 'CVS/' to the repository's root .gitignore. Now cvs versioning stuff is ignored in all subdirectories, recursively. So I presume that '.svn/' would work in your case. Though I have never used svn.