Thursday, April 30, 2009

Automatic project structure inference

David MacIver has an interesting blog entry up about determining logical project structure via commit logs. I was very interested because one of Cassandra's oldest issues is creating categories for our JIRA instance. (I've never been a big fan of JIRA, but you work with the tools you have. Or the ones the ASF inflicts on you, in this case.)

The desire to add extra work to issue reporting for a young project like Cassandra strikes me as slightly misguided in the first place. I have what may be an excessive aversion to overengineering, and I like to see a very clear benefit before adding complexity to anything, even an issue tracker. Still, I was curious to see what David's clustering algorithm made of things. And after pestering him to show me how to run his code I figure I owe it to him to show my results.

In general it did a pretty good job, particularly with the mid-sized groups of files. The large groups are just noise; the small groups, well, it's not exactly a revelation that Filter and FilterTest go together. I'd be tempted to play with it more but with only about two months and 250 commits in the apache repo there's not really all that much data there. (Cassandra's first two years were in an internal Facebook repository.) Working with data that exists as a side effect of natural activity is fascinating.

Saturday, April 11, 2009

The best PyCon talk you didn't see

There were a lot of good talks at PyCon but I humbly submit that the best one you haven't seen yet is Robert Brewer's talk on DejaVu. Robert describes how his Geniusql layer disassembles and parses python bytecode to let his ORM turn python lambdas into SQL. Microsoft got a lot of press for doing something similar for .NET with LINQ, but Bob was there first.
  box = store.new_sandbox()
print [c.Title for c in box.recall(
  Comic, lambda c: 'Hob' in c.Title or c.Views > 0)]
This is cool as hell. The Geniusql part start about 15 minutes in.

Tuesday, April 07, 2009

Credit where credit is due

I'm starting to conclude that git just doesn't fit my brain. Several months in, I'm still confused when things don't work the way they "should." My co-worker says I should start a wiki for weird-ass things to do with git: "You keep coming up with use cases that would never occur to me." But, I have to give the git community credit: I've never gone in to #git on freenode and gotten less than fantastic help. Even with git-svn.