Saturday, November 19, 2011

On applying for jobs

A friend asks,
If [I see a] job I could do, even though I don't meet the stated requirements, should I apply anyway?
Short answer: yes.

Longer answer: companies are all over the map here, although in general the fewer layers of bureaucracy there are between the team that the candidate will work with and the hiring process, the more likely the list of requirements is to be actual requirements.

How can you tell?

HR paper pushers like to think in terms of checklists because that lets them go through hundreds of resumes without any real understanding of the position, so they write ads like this one -- lots of really specific "5+ years of X," not much about what the position actually involves.

But if it's the team lead himself writing the description, which you will see at smaller companies, then you get much more about what the position involves and fewer checklist items, because the lead is comfortable determining competence based on skill instead of pattern matching. For a software development position, I don't care if you have a degree in CS if you can code. (Open-source contributions are a better signal for ability and passion than a degree, anyway.) My team ranges from people with no degree to people with PhDs.

Even when dealing with large companies, you have to factor in that people are terrible at distinguishing "want" from "need." A lot of "requirements" are really "nice-to-haves." It can be tough to tell the difference, but the better idea you have of what the job actually involves, the better you can tell which are hard requirements.

For instance: without knowing anything else about a position, my guess is that "native French speaker" really would be a hard requirement. That's not the sort of thing people tend to put down on a whim. But even then, there are shades of grey. For instance, if I were looking for a job and found a "distributed databases developer position, must know Java, be familiar with open source and be a native French speaker" then I might see if they'd give me a pass on the last part because I'm a really good fit for the rest -- and I know they're unlikely to find a lot of candidates with an exact match.

In short, you have little to lose by trying, but don't just shotgun out resumes; include a cover letter that highlights the best matches from your experience to what they are looking for. Follow up with the hiring manager if possible to ask (a) "I sent in my resume a few days ago, and I wanted to see where you are in the hiring process for this position," and if they reply that they got it but you're not a good fit, ask (b) what specifically they were looking for, so you can flesh out your intuition that much more for next time.

Good luck!

Tuesday, January 04, 2011

Apache Cassandra: 2010 in review

In 2010, Apache Cassandra increased its momentum as the leading scalable database. Here is a summary of the notable activity in three areas: code, community and controversy. As always, comments are welcome.


Code

2010 started with the release of Cassandra 0.5, followed by 0.6 and graduation from the ASF incubator a few months later. Seven more stable releases of 0.6 came after that, adding many features to improve operations in response to feedback from production users.

0.7 adds highly anticipated features like column value indexes, live schema updates, more efficient cluster expansion, and more control over replication. It didn't quite make it into 2010, though: rc4 was released on New Year's 2011.

We also committed the distributed counters patchset, begun at Digg and enhanced by Twitter for their real-time analytics product. Notable as the most-involved feature discussion to date, distributed counters started with a vector clock approach, but switched to a new design by Kelvin Kakugawa after we realized vector clocks were a dead end for anything but the trivial case of monotonic-increments-by-one.

One of the biggest trends was increasing activity around Cassandra as well as in the core database itself. 2010 saw Hadoop map/reduce integration, as well as Pig support and a patch for Hive.

We also saw Lucandra, which implements a Cassandra back end for Lucene and is used in several high volume production sites, grow up into Solandra, embedding Solr and Cassandra in the same JVM for even more performance.


Community

Cassandra hit its stride in 2010, starting with graduation from the ASF incubator in April. 2010 saw 1025 tickets resolved, nearly twice as many as in 2009 (565).

Like many Apache projects, Cassandra has a relatively small set of committers, but a much larger group of contributors. In 2010 Cassandra passed 100 people who have contributed at least one patch. Release manager Eric Evans put together a great way to visualize this with a Code Swarm video of Cassandra development.

I started Riptano with Matt Pfeil in April to provide professional products and services around Cassandra. In October, we announced funding from Lightspeed and Sequoia. From May to December, we conducted eleven Cassandra training events, and twice that many private classes on-site with customers.

Riptano is now up to 25 employees, with offices in the San Francisco bay area, Austin, and New York, and engineers working remotely in San Antonio, France, and Belarus.

In August, Riptano and Rackspace organized a very successful inaugural Cassandra Summit, with about 200 attendees (videos available), followed by almost a full track at ApacheCon in November. Cassandra was also represented at many other conferences, covering multiple subjects and several languages, on several continents.


Controversy

Cassandra got a lot of negative publicity when Kevin Rose blamed Cassandra for Digg v4's teething problems. However, there was no deluge of bug reports coming out of Digg's Cassandra team, and Digg engineers Arin Sarkissian and Chris Goffinet (now working on Cassandra for Twitter) got on Quora to refute the idea that Cassandra was at fault:

The whole "Cassandra to blame" thing is 100% a result of folks clinging on to the NoSQL vs SQL thing. It's a red herring.

The new version of Digg has a whole new architecture with a bunch of technologies involved. Problem is, over the last few months or so the only technological change we mentioned (blogged about etc) was Cassandra. That made it pretty easy for folks to cling on to it as the "problem".

Meanwhile, Digg competitor Reddit has continued migrating to Cassandra, crediting it with enabling their 3x traffic growth in 2010.

More importantly, 2010 saw dozens of new Cassandra deployments, including a new contender for the largest-cluster crown when Digital Reasoning announced a 400-node cluster for the US government.

We look forward to another great year in 2011!