Tuesday, June 23, 2009

Patch-oriented development made sane with git-svn

One of the drawbacks to working on Cassandra is that unlike every other OSS project I have ever worked on, we are using a patch-oriented development process rather than post-commit review. It's really quite painfully slow. Somehow this became sort of the default for ASF projects, but there is precedent for switching to post-commit review eventually.

In the meantime, there is git-svn.

(The ASF does have a git mirror set up, but I'm going to ignore that because (a) its reliability has been questionable and (b) sticking with git-svn hopefully makes this more useful for non-ASF projects.)

Disclaimer: I am not a git expert, and probably some of this will make you cringe if you are. Still, I hope it will be useful for some others fumbling their way towards enlightenment. As background, I suggest the git crash course for svn users. Just the parts up to the Remote section.


  1. git-svn init https://svn.apache.org/repos/asf/cassandra/trunk cassandra
  2. cd cassandra, then git-svn fetch (init only sets up the repository; fetch pulls down the svn history. git-svn clone does both in one step.)
Once that's done, the only git-svn commands you need to know about are dcommit, to push the commits on the current git branch back to svn, and rebase, to pull changes from svn and re-apply your not-yet-dcommitted commits on top of them (basically like svn up).

Creating new code:

  1. git checkout -b [ticket number]
  2. [edit stuff, maybe git add or git rm new or obsolete files]
  3. git commit -a -m 'commit'
  4. repeat 2-3 as necessary
  5. git-jira-attacher [revision] (usually some variant of HEAD^^^)
[after review]
  1. git log (just to make sure I'm about to commit what I think I'm about to commit)
  2. git-svn dcommit
  3. git checkout master
  4. git-svn rebase -l (this will put the changes you just committed into master)
  5. git branch -d [ticket number]
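
The local half of the steps above can be sketched in a throwaway repository. This is only an illustration under a few assumptions: TICKET-1 is a made-up ticket number, the temporary repository stands in for a real git-svn checkout, and git format-patch stands in for git-jira-attacher (which is my own script and not shown here) to show what the [revision] argument selects.

```shell
#!/bin/sh
# Sketch only: a scratch repository stands in for a git-svn checkout.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo "base" > file.txt
git add file.txt
git commit -q -m "initial import"

# Step 1: one branch per ticket (TICKET-1 is hypothetical).
git checkout -q -b TICKET-1

# Steps 2-4: edit and commit, twice.
echo "first change" >> file.txt
git commit -q -a -m "first change"
echo "second change" >> file.txt
git commit -q -a -m "second change"

# Step 5 stand-in: write one patch file per commit made since HEAD^^.
patch_dir=$(mktemp -d)
git format-patch -q -o "$patch_dir" HEAD^^
patch_count=$(ls "$patch_dir" | wc -l | tr -d ' ')
echo "generated $patch_count patches in $patch_dir"
```

With two commits on the branch, HEAD^^ is the point where the branch left master, so each commit becomes its own patch file.
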
When I'm reviewing code it looks similar:
  1. git checkout -b [ticket number]
  2. wget patches and git-apply, or jira-apply CASSANDRA-[ticket-number]
  3. review in gitk/qgit and/or IDE (the intellij git plugin is quite decent)
  4. git log through git branch -d, as in the after-review steps above
The last operation is "see who I need to bug to get reviews moving." This is just a list of the branches I haven't merged into master and deleted yet:
  1. git branch
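
If you are slower than I am about deleting finished branches, git branch --no-merged narrows the list to branches whose commits are not yet in master. A minimal sketch in a scratch repository, assuming made-up branch names TICKET-1 (already in master) and TICKET-2 (still waiting on review):

```shell
#!/bin/sh
# Sketch only: a scratch repository with one finished and one pending branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo "base" > file.txt
git add file.txt
git commit -q -m "initial import"

# TICKET-1 points at a commit master already contains.
git branch TICKET-1

# TICKET-2 has a commit that master does not have yet.
git checkout -q -b TICKET-2
echo "pending" >> file.txt
git commit -q -a -m "work awaiting review"
git checkout -q master

# List only the branches that still carry unmerged work.
pending=$(git branch --no-merged master | tr -d ' *')
echo "$pending"
```
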
Git-svn takes a lot of the pain out of the ASF's patch-and-jira workflow. In particular, you can break the changes for a ticket into multiple patches that are each easy to review, and the latency of waiting for patch review doesn't kill your throughput so badly, since you can just leave that branch alone and start a new one for your next piece of functionality. And of course you get git commit --amend and git rebase -i for massaging patches during the review process.

One fairly common complication is if you finish a ticket A, then start on ticket B (that depends on A) while waiting for A to be reviewed. So you checkout -b from your branch A rather than master and build some patches on that. As sometimes happens, the reviewer finds something you need to improve in your patch set for A, so you make those changes. Now you need to rebase your patches to B on top of the changes you made to A. The best way to do this is to branch A to B-2, then git cherry-pick from B and resolve conflicts as necessary.
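
Here is a sketch of that A / B / B-2 dance in a scratch repository. The branch contents and commit messages are invented for illustration, and the example is arranged so the cherry-pick applies cleanly; in real life this is where you would resolve conflicts. It also assumes a git new enough to accept a commit range in cherry-pick; with an older git, cherry-pick the commits one at a time.

```shell
#!/bin/sh
# Sketch only: ticket B is built on ticket A, then A changes during review.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo "base" > base.txt
git add base.txt
git commit -q -m "initial import"

# Ticket A: one patch, sent off for review.
git checkout -q -b A
echo "a v1" > a.txt
git add a.txt
git commit -q -m "A: first version"

# Ticket B: branched from A rather than master, two patches.
git checkout -q -b B
echo "b1" > b.txt
git add b.txt
git commit -q -m "B: part one"
echo "b2" >> b.txt
git commit -q -a -m "B: part two"

# The reviewer wants a change to A, so A's patch gets revised.
git checkout -q A
echo "a v2" > a.txt
git commit -q -a --amend -m "A: revised after review"

# Branch the revised A to B-2 and replay B's two commits on top of it.
git checkout -q -b B-2 A
git cherry-pick B~2..B > /dev/null

# B-2 now holds the revised A plus both B patches.
git log --pretty=format:%s master..B-2
```

Once B-2 looks right, the old B branch can be deleted with git branch -D B.
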

Final note: I often like to create lots of small commits as I am exploring a solution and combine them into larger units with git rebase -i for patch submission. (It's easier to combine small patches than to pull apart large ones.) So my early commit messages are often terse and need editing. You can change commit messages by using edit mode in rebase, followed by commit --amend and rebase --continue, but that is tedious. I complained about this to my friend Zach Wily and he made this git amend-message command (place in [alias] in your .gitconfig):

   amend-message = "!bash -c ' \
       c=$0; \
       if [ $c == \"bash\" ]; then echo \"Usage: git amend-message <commit>\"; exit 1; fi; \
       saved_head=$(git rev-parse HEAD); \
       commit=$(git rev-parse $c); \
       commits=$(git log --reverse --pretty=format:%H $commit..HEAD); \
       echo \"Rewinding to $commit...\"; \
       git reset --hard $commit; \
       git commit --amend; \
       for X in $commits; do \
           echo \"Applying $X...\"; \
           git cherry-pick $X >> /dev/null; \
           if [ $? -ne 0 ]; then \
               echo \"  apply failed (is this a merge?), rolling back all changes\"; \
               git reset --hard $saved_head; \
               echo \" ** AMEND-MESSAGE FAILED, sorry\"; \
               exit 1; \
           fi; \
       done; \
       echo \"Done\"'"
(Zach would like the record to show that he knows this is pretty hacky. "For instance, it won't work if one of the commits after the one you're changing is a merge, since cherry-pick can't handle those." But it's quite useful, all the same.)

For what it's worth, the rest of my aliases are:

 st = status
 ci = commit
 co = checkout
 br = branch
 cp = cherry-pick
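
If you would rather not edit .gitconfig by hand, the same simple aliases can be set with git config. The sketch below writes them to a temporary file (the cfg variable is just for illustration) so it can be tried without touching your real settings; swapping --file "$cfg" for --global should write them to your own .gitconfig instead.

```shell
#!/bin/sh
# Sketch only: define the short aliases in a scratch config file.
set -e
cfg=$(mktemp)
for pair in st=status ci=commit co=checkout br=branch cp=cherry-pick; do
    name=${pair%%=*}     # text before the "="
    value=${pair#*=}     # text after the "="
    git config --file "$cfg" "alias.$name" "$value"
done
cat "$cfg"
```
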