Monday, September 06, 2021

A review of Lambda School from the father of a recent graduate

Background

I’ve been a professional developer for twenty years.  I exposed my son N to programming a couple of times while he was growing up -- Scratch when he was around 8, Khan Academy JavaScript when he was 12.  He learned it easily enough, but it didn’t grab him.

But his junior year in high school he had a hole in his schedule and I convinced him to try AP CS to fill it.  And this time, he got hooked.  He started programming for fun in the evenings.  You know how it goes.

Then in March 2020, Covid hit and his high school went virtual.  It was a terrible experience, to the point that instead of going back for more his senior year, he took the last classes he needed to graduate over the summer, and decided to apply to programming boot camps in the fall.  I think the American college system is broken, so I was happy to help evaluate his options for something different.

Evaluating boot camps

N and I came up with three criteria for evaluating boot camps.  If they didn’t meet these three, we weren’t interested:

  1. Income-sharing agreement, or similar.  The incentives for the school to, in effect, take your money and run are very, very strong in for-profit education.  An ISA means they only get paid if you get a job.  This creates a simple but powerful alignment.
  2. Modern curriculum.  If they’re still teaching Ruby on Rails, we’ll pass.
  3. Pre-work.  If a school will admit anyone who applies, taking a shotgun-style approach to student success, that’s not a good model.  It’s a much better sign if they have a rigorous set of pre-admission work to demonstrate some level of interest and aptitude.  I spent a little over a year teaching college-level CS classes, so I know that, for whatever reason, programming just doesn’t fit everyone’s brain.

Our shortlist was App Academy, Hack Reactor, Rithm, and Lambda School.  App Academy, Rithm, and Hack Reactor hit all three of the above criteria.  Lambda School did not have pre-work as rigorous as the others, but they offered the longest curriculum, so we thought that could make up for it.  (However, Lambda changed from a 9-month course to 6 months just before N applied.)

While applying, we found that App Academy had some fine print that they would not do an ISA for students under 20, so we took them off the list. 

N was accepted to Rithm, Hack Reactor, and Lambda.  He decided on Lambda primarily because they have been online-only from the beginning, so we thought they probably had an edge over Hack Reactor and Rithm, which had been primarily (HR) or entirely (Rithm) in-person before the pandemic.

Lambda School

Lambda did a pretty competent job preparing N to be an entry level web developer.  Enough HTML, CSS, Node.js, React, and PostgreSQL to be dangerous, plus the basics of Git and Bash.  A good foundation that he can build on.

Lambda School’s curriculum is a series of six units, each a month long.  After the first unit, most of the coursework was set up to prepare the students to tackle fairly meaty projects done in teams of five or six, divided into presentation layer, front end code (Node), and back end code (SQL).  These divisions correspond to experience level at Lambda.  So students X, Y, and Z would work on a project together; the next month Z would graduate, X and Y would each rotate up a position, and W would join as the new guy.  Multiply this by two to get the six-person teams.

The quality of instruction was solid overall, but when something went wrong, like a version incompatibility with Node, the quality of troubleshooting help depended on who you asked.

After the first four units came a month of CS subjects like binary trees and recursion, and then for the final month, they got back into teams for a capstone project.  N’s team revamped a web app for a tiny nonprofit.  It was good experience; he learned a lot about understanding what an existing system did and how to rebuild it while keeping the good parts.

So on balance: while I totally get the standpoint that there is high quality instructional and reference material across the Internet for free or at much lower cost than Lambda School, I think that between the actual instruction, the accountability of a formal curriculum, the project work, and the real-world capstone with the nonprofit, Lambda delivered value for what N is paying.

Post graduation

So the coursework and instruction at Lambda was well done.  Unfortunately, Lambda came up short in several areas of helping N find a job.

  1. Before N finished the coursework, there was little communication about how the job application process would go, what to expect, or how to prioritize your time.
  2. Resume building was a mixed bag.  They did help N create a plaintext resume, and they coached the students on how to present their prior, non-programming experience, but they did not help create a rich text version for human reviewers.
  3. N doesn’t know how you get picked for one of the sexy new programs like Lambda Fellows; nobody talked about that, and nobody he knows was included.
  4. N graduated right in the middle of traditional summer intern season, which Lambda ignored completely.  Seems like a missed opportunity at the very least -- surely most of the Lambda grads would prefer a paid internship to months of applying to developer positions while working another job.
  5. Most importantly: after graduation, Lambda emailed N just five job openings over a period of two months, saying contact them here if you’re interested.  That’s it; that’s the extent of their post-graduation job search support.  (None of the five replied to N’s application.)

On the positive side, N thinks Lambda had a great program for interview coaching. They did multiple rounds of one-on-one mock interviews to help the students get used to the kinds of questions they could expect in the interview process.

N ended up finding a job through my network; several people there were willing to interview a new boot camp grad.  (Thank you!)  He just finished his first week working full time as a software developer.

Commentary and educated guesses

Given that N thought (and I think, and the people who interviewed him thought) that Lambda’s actual instruction was good, why are there so many reviews online complaining about it?  I think there are two big factors:

First, Lambda is doing something new, and consciously declines to follow traditional instructional design just because "that’s how we’ve always done it."  Remember, Lambda was online-only before the pandemic.  That alone means things are going to be different from other schools.  And they’re not trying to help you get a well rounded classical liberal arts education, or even necessarily to “learn to learn” -- their goal is to teach you enough practical programming to get a job as an entry level developer.  This means, for instance, that they do a lot more project work in teams than your nearest college CS department would.  I think that’s a good thing.

It also means that nothing is sacred and things can change quickly.  So they shortened the program from 9 months to 6 (which I believe did not affect already-enrolled students) and eliminated paid team leads from their project work (which did).  If you try new things, some of them aren’t going to work out.  I understand how this would suck as a student, but running a school is expensive, and running a school that does something nobody has done before (successfully, and at scale) is even more expensive, so the faster they can iterate on what works and stop what doesn’t, the better.  I don’t fault Lambda for this.

The other factor is students who didn’t have the necessary background to be successful.  It makes me sad to see Lambda students writing about “flexing” (repeating) a unit.  I think there’s a high likelihood that they weren’t ready to be admitted.  This is something Lambda could fix by increasing the rigor of their relatively short pre-admission work.

It’s a balance -- you don’t want to admit only students who are 100% guaranteed to succeed, but on the other hand it’s not really doing people a favor to admit them if they only have a 10% chance.  I’m not sure exactly where the balance is, but it seems likely (based on what I see other boot camps with high-quality outcomes doing) that Lambda’s filter should be a bit tighter.  (On the other hand, I see other people criticizing Lambda for making money off students so well prepared that they could get a programming job no matter what they did.  This is definitely not the case for the typical Lambda student, but if people are saying that, then maybe it’s an indicator that Lambda has about the right balance after all.)

Based on Lambda’s relative lack of help sourcing job opportunities for N, I also wonder if they’ve scaled too far, too fast.  Lambda advertises two things: relevant skills, and help finding a job.  It seems to me that it’s a lot easier to scale the instructional part than a pipeline of companies who want to hire graduates from a new and relatively unproven school.  This would explain the relative lack of referrals that N saw, and it would also explain why Lambda hasn’t released student success metrics since 1H 2019, more than two years ago.  (And for that, I do fault Lambda.)

TLDR

Lambda did a good job with curriculum and instructional design, maybe even a great job.  But their job-search program was significantly weaker, or perhaps it just hasn’t been able to scale to meet an increased volume of admissions.  I am cheering for Lambda and I hope they can fix it.

Sunday, March 18, 2012

Speaking to a technical conference

I just got back from PyCon, and as with all conferences where the talks are delivered by engineers instead of professional speakers, we had a mixed bag. Some talks were great; others made me get my laptop out.

The most important axiom is: a talk is not just an essay without random access. It's a different medium. Respect its strengths instead of wishing it were something it's not.

Here are some concrete principles that can help:

Don't read your slides

Advice often repeated, too seldom followed. This is sometimes phrased as "make eye contact with your audience," but I've seen that version interpreted to mean "make eye contact while reading your slides, so your head pops up and down like a gopher poking out of its hole." So just don't read your slides, no matter what else you're doing.

Some good presenters go to extremes with this, with just one or two words per slide. This is fine as a stylistic embellishment, but not necessary for a good talk. You don't need to be that minimalistic. Just remember that with every transition, your audience will read the new slide before returning its attention to whatever you are saying. (Watch for this at the next talk you attend; you will absolutely catch yourself doing it.)

Other presenters use "builds" to combat this. This can be useful in moderation, but it's more often used as a crutch, especially when presenting a list of related material. Personally, if I have an information-dense topic, like this one from my Strata talk, I'll put the whole list up at once but I'll leave the details off the slide and speak them instead.

I'm also not a fan of "presenter notes" displayed on a secondary monitor. Too often this leads to the gopher effect or to underpracticing, or both.

The one time you do want to explicitly direct attention to your slides is to explain part of one. For example, on this slide I explained that the upper right was an example of DataStax's Opscenter product interfacing with Cassandra over JMX; the upper left was jvisualvm, and so forth. Since it was a large room, I really did say things like, "in the upper right, ..." In a smaller room I like to stand close enough to the screen to just point.

Use visual aids

One of the best uses of builds is to explain a complicated diagram or sequence a piece at a time. This is difficult to impossible to do as effectively in prose alone. Sylvain's talk on the Cassandra storage engine at FOSDEM 2012 is a good example. Starting at about 22:00, he explains how Cassandra uses log-structured merge trees to turn random writes into sequential i/o. Compare that with the treatment in the Bigtable paper, or the original LSM-tree paper. Sylvain's explanation is much clearer by virtue of how it's presented.
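(As an aside, the heart of the trick Sylvain describes fits in a toy sketch. This is my own illustration of the general log-structured idea, not Cassandra's storage code: random writes are absorbed in memory, and only whole sorted runs are ever written to disk, sequentially.)

    class ToyMemtable:
        """Toy LSM write path: absorb random writes in memory, flush sorted
        runs sequentially.  (Commitlog, reads, and compaction omitted.)"""
        def __init__(self, flush_threshold=4):
            self.data = {}
            self.flush_threshold = flush_threshold
            self.sstables = []  # each flush appends one immutable sorted run

        def put(self, key, value):
            self.data[key] = value  # random writes land in memory
            if len(self.data) >= self.flush_threshold:
                self.flush()

        def flush(self):
            # One sequential write of a sorted, immutable file ("sstable"),
            # instead of many random in-place updates.
            self.sstables.append(sorted(self.data.items()))
            self.data = {}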

I avoid audio or video during my presentations since using it effectively is a skill I don't yet have, but I've seen it done well by others. I can't imagine my favorite PyCon talk being as effective without the recorded demonstration at the end.

Finally, pictures can also be more effective than the spoken word at communicating humor. I'm not sure who came up with this first, but the juxtaposition here is worth well over 1000 words.

Leave them wanting more

Your goal in most public speaking is to get people interested enough to learn more on their own, not to make them experts.

One thing I struggled with early on was, how do you explain code without reading your slides? I realized that the answer was, if you're trying to explain code, you're getting too deep into the weeds. Sometimes I'll use a snippet of code to give the "flavor" of an API, but wall-of-text slides mean you're Doing It Wrong.

Another common mistake is to start your talk with an outline. (Worse: outline "progress reports" during the talk that tell the audience how far along you are.)

A much better way to get the audience engaged is to tell a story: How did you come across the problem you are solving? What makes it challenging? What promising approaches didn't actually work out, and why? This is a classic story arc that will get people interested much more than if you dive into the nuts and bolts of your solution.

Practice

Paul Graham gets this one wrong: while ad-libbing is indeed the polar opposite of reading your slides, it's also sub-optimal. You need practice to get timings right, to try out different phrasings of your thoughts, and to make transitions smooth. Don't fall for the false dichotomy that either you ad-lib or you practice all the spontaneity out; there's a happy medium in between.

Mechanics

Finally (last and least?), a brief note on mechanics. Stand where you can gesture freely and naturally; ideally (in a small room) next to the screen. Don't stand behind a podium. Don't speak sitting down. Pacing a little bit is good.

All these things mean: you need a slide remote. Even if you are right next to your laptop, reaching down to hit the spacebar or arrow key is distracting. But if you are doing it right you are probably not right next to your laptop. The remote included with Macs is unfortunately not enough, since it relies on infrared line-of-sight. If the conference doesn't provide one, borrow one from another presenter. If you speak frequently, it's worth the approximately $40 cost to get your own so you don't have to wrestle with unfamiliar hardware when you go live.

Good luck!

Thursday, February 02, 2012

Thinkpad T420s review

In the last three years my primary machines have run OS X, Linux, Windows, OS X, and now Windows again, in that order. The observant reader may note, "That's a lot of machines in three years." It is, but I also changed jobs twice in that time frame, so that's part of it. Another part is that I'm a bit rough with laptops; the two mac machines broke badly enough that AppleCare told me they weren't going to help. The Dell and Lenovo machines, though, outlasted my use of them. For this most recent machine, I had several requirements and several nice-to-haves, some of which were in tension. Requirements:
  • Able to drive a 30" external monitor
  • At least 8GB of RAM
  • At least 1440x900 native resolution
Nice to have:
  • Smaller than my 15" macbook pro, which is too large to use comfortably in coach on an airplane
  • Larger screen than my wife's 13.3" mbp
  • A "real" cpu, not the underclocked ones in the Macbook Airs
  • A graphics card that can do justice to Starcraft II
I wasn't picky about my operating system. Linux is by far the best experience for software development, but support for multiple monitors is still dicey, which is bad when you're relying on it to give presentations on unfamiliar projectors. OS X is superficially unix-y, but the lack of package management means in practice it's not really any better than Windows. Windows is ... Windows, although I'm pretty fond of the new-in-Windows-7 window management keyboard shortcuts. But fundamentally I spend 99% of my time in an IDE, a web browser, hipchat, and IRC, all of which are cross-platform. MSYS gives me about as much of the unix experience on Windows as I got on OS X. (And ninite gives me more of a package manager than I had on OS X -- granted, that isn't saying much.)
I ended up buying a 14" Thinkpad T420s. I think the S stands for "slim," and it is. My 15" mbp looks and feels like a ton of bricks next to it, even after I swapped out the Thinkpad's dvd drive for the supplementary battery module, which weighs a little more. The Thinkpad's legendary keyboard lives up to its reputation, and I'm a huge fan of the trackpoint living right there on the home row of the keyboard, for when keyboard shortcuts aren't easily available. The cooling is excellent, without the fans ever getting loud.
For the most part, I'm extremely happy with the hardware. There are two exceptions:
  • The built-in microphone is terrible. Almost without exception, people have trouble hearing me over Skype. Adding insult to injury, there is no mic input. I thought at first the headphone jack was a phone-style out-plus-in jack, but no. I'll have to get a USB mic.
  • Optimus doesn't work in one important respect: in optimus mode it won't drive my 30" monitor at full resolution; it picks something weird like 2048x1560 instead. Lenovo said they were going to fix this but haven't yet. To drive this monitor correctly I have to lock it to discrete graphics in the BIOS. In discrete mode it gets about 2h 45m of battery life, even with the CPU downclocked and the display dim. So when I travel, I reboot to integrated graphics.
The main alternative I considered was the Sony Vaio Z. Ironically, I ended up going with the Thinkpad mostly because Vaio reviewers consistently called out how terrible its built-in speakers were... so I ended up with a system with a terrible mic instead.
On the software side, I'm more than happy with Windows, especially after the Steam holiday sale. I hadn't realized how many fantastic indie games are available these days. (Most recently, I highly recommend Bastion.)
The one fly in my soup is that I'd anticipated being able to run OS X in a VM for the sake of Keynote. None of Google Docs presentations, Open/Libre Office Impress, or Powerpoint is an adequate replacement. Unfortunately, the Core Image (?) APIs used by Keynote don't work under virtualization, so for now I'm still using my old mbp to create presentations, and taking them on the road with me as pdf.

Saturday, November 19, 2011

On applying for jobs

A friend asks,
If [I see a] job I could do, even though I don't meet the stated requirements, should I apply anyway?
Short answer: yes.

Longer answer: companies are all over the map here, although in general, the fewer layers of bureaucracy there are between the team the candidate will work with and the hiring process, the more likely the list of requirements is to be actual requirements.

How can you tell?

HR paper pushers like to think in terms of checklists because that lets them go through hundreds of resumes without any real understanding of the position, so they write ads like this one -- lots of really specific "5+ years of X," not much about what the position actually involves.

But if it's the team lead himself writing the description, which you will see at smaller companies, then you get much more about what the position involves and fewer checklist items, because the lead is comfortable determining competence based on skill instead of pattern matching. For a software development position, I don't care if you have a degree in CS if you can code. (Open-source contributions are a better signal of ability and passion than a degree, anyway.) My team ranges from people with no degree to people with PhDs.

Even when dealing with large companies, you have to factor in that people are terrible at distinguishing "want" from "need." A lot of "requirements" are really "nice-to-haves." It can be tough to tell the difference, but the better idea you have of what the job actually involves, the better you can tell which are hard requirements.

For instance: without knowing anything else about a position, my guess is that "native French speaker" really would be a hard requirement. That's not the sort of thing people tend to put down on a whim. But even then, there are shades of grey. For instance, if I were looking for a job and found a "distributed databases developer position, must know Java, be familiar with open source and be a native French speaker" then I might see if they'd give me a pass on the last part because I'm a really good fit for the rest -- and I know they're unlikely to find a lot of candidates with an exact match.

In short, you have little to lose by trying, but don't just shotgun out resumes; include a cover letter that highlights the best matches from your experience to what they are looking for. Follow up with the hiring manager if possible to ask (a) "I sent in my resume a few days ago, and I wanted to see where you are in the hiring process for this position," and if they reply that they got it but you're not a good fit, ask (b) what specifically they were looking for, so you can flesh out your intuition that much more for next time.

Good luck!

Tuesday, January 04, 2011

Apache Cassandra: 2010 in review

In 2010, Apache Cassandra increased its momentum as the leading scalable database. Here is a summary of the notable activity in three areas: code, community and controversy. As always, comments are welcome.

Code

2010 started with the release of Cassandra 0.5, followed by 0.6 and graduation from the ASF incubator a few months later. Seven more stable releases of 0.6 arrived over the year, adding many features to improve operations in response to feedback from production users.

0.7 brings highly anticipated features like column value indexes, live schema updates, more efficient cluster expansion, and more control over replication, but didn't quite ship in 2010; rc4 was released on New Year's Day 2011.

We also committed the distributed counters patchset, begun at Digg and enhanced by Twitter for their real-time analytics product. Notable as the most-involved feature discussion to date, distributed counters started with a vector clock approach, but switched to a new design by Kelvin Kakugawa after we realized vector clocks were a dead end for anything but the trivial case of monotonic-increments-by-one.
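To illustrate why a commutative design succeeds where vector clocks gave up, here is a toy Python sketch of the shard-per-replica idea -- my illustration of the general shape, not the actual patchset. Each replica increments only its own shard, merging takes the per-shard max, and the counter's value is the sum of the shards.

    class ShardedCounter:
        """Toy replicated counter: one shard per replica, value = sum of shards.
        Assumes increments are positive and applied at the owning replica."""
        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.shards = {}  # replica id -> that replica's running total

        def increment(self, amount=1):
            # Only the local shard is ever modified, so shards grow
            # monotonically and replicas never overwrite each other.
            self.shards[self.replica_id] = self.shards.get(self.replica_id, 0) + amount

        def merge(self, other):
            # Per-shard max is commutative, associative, and idempotent,
            # so replicas can exchange state in any order, any number of times.
            for rid, count in other.shards.items():
                self.shards[rid] = max(self.shards.get(rid, 0), count)

        def value(self):
            return sum(self.shards.values())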

One of the biggest trends was increasing activity in the ecosystem around Cassandra, as well as in the core database itself. 2010 saw Hadoop map/reduce integration, as well as Pig support and a patch for Hive.

We also saw Lucandra, which implements a Cassandra back end for Lucene and is used in several high volume production sites, grow up into Solandra, embedding Solr and Cassandra in the same JVM for even more performance.

Community

Cassandra hit its stride in 2010, starting with graduation from the ASF incubator in April. 2010 saw 1025 tickets resolved, nearly twice as many as in 2009 (565).

Like many Apache projects, Cassandra has a relatively small set of committers, but a much larger group of contributors. In 2010, Cassandra passed 100 people who have contributed at least one patch. Release manager Eric Evans put together a great way to visualize this with a Code Swarm video of Cassandra development.

I started Riptano with Matt Pfeil in April to provide professional products and services around Cassandra. In October, we announced funding from Lightspeed and Sequoia. From May to December, we conducted eleven Cassandra training events, and twice that many private classes on-site with customers.

Riptano is now up to 25 employees, with offices in the San Francisco bay area, Austin, and New York, and engineers working remotely in San Antonio, France, and Belarus.

In August, Riptano and Rackspace organized a very successful inaugural Cassandra Summit, with about 200 attendees (videos available), followed by almost a full track at ApacheCon in November. Cassandra was also represented at many other conferences, across multiple subjects, language communities, and continents.

Controversy

Cassandra got a lot of negative publicity when Kevin Rose blamed Cassandra for Digg v4's teething problems. However, there was no deluge of bug reports coming out of Digg's Cassandra team, and Digg engineers Arin Sarkissian and Chris Goffinet (now working on Cassandra for Twitter) got on Quora to refute the idea that Cassandra was at fault:

The whole "Cassandra to blame" thing is 100% a result of folks clinging on to the NoSQL vs SQL thing. It's a red herring.

The new version of Digg has a whole new architecture with a bunch of technologies involved. Problem is, over the last few months or so the only technological change we mentioned (blogged about etc) was Cassandra. That made it pretty easy for folks to cling on to it as the "problem".

Meanwhile, Digg competitor Reddit has continued migrating to Cassandra, crediting it with enabling their 3x traffic growth in 2010.

More importantly, 2010 saw dozens of new Cassandra deployments, including a new contender for the largest-cluster crown when Digital Reasoning announced a 400-node cluster for the US government.

We look forward to another great year in 2011!

Monday, April 26, 2010

And now for something completely different

A month ago I left Rackspace to start Riptano, a Cassandra support and services company.

I was in the unusual position of being a technical person looking for a business-savvy co-founder. For whatever reason, the converse seems a lot more common. Maybe technical people tend to stereotype softer skills as being easy.

But despite some examples to the contrary (notably for me, Josh Coates at Mozy), I found that starting a company is too hard for just one person. Unfortunately, everyone in my fairly slim portfolio of business guys I'd like to co-found with was unavailable. So progress was slow, until Matt Pfeil heard that I was leaving Rackspace and drove to San Antonio from Austin to talk me out of it. Not only was he unsuccessful in talking me out of leaving, but he ended up co-founding Riptano. And here we are, with a Riptano mini-faq.

Isn't Cassandra mostly just a web 2.0 thing for ex-mysql shops?

Although most of the early adopters fit this stereotype, we're seeing interest from a lot of Oracle users and a lot of industries. Unlike many "NoSQL" databases, Cassandra doesn't drop durability (the D in ACID), and besides scalability, enterprises are very interested in our support for multiple data centers and Hadoop analytics.

Are you going to fork Cassandra?

No. Although the ASF license allows doing basically anything with the code, including creating proprietary forks, we think the track record of this strategy in the open source database world is mixed at best.

We might create a (still open-source) Cassandra distribution similar to Cloudera's Distribution for Hadoop, but the mainline Cassandra development is responsive enough that there isn't as much need for a third party to do this as there is with Hadoop.

What does Rackspace think?

Rackspace has been the primary driver of Cassandra development recently, employing (until I left) the three most active committers on the project. For the same reasons Rackspace supported Cassandra to begin with, Rackspace is excited to see Riptano help take the Cassandra ecosystem to the next level. Rackspace has invested in Riptano and has been completely supportive in every way.

Where did you get the name "Riptano?" Does it mean anything?

We took a sophisticated, augmented AI approach. By which I mean, we took a program that generated random, pronounceable strings, and put together a couple of fragments that sounded good together. (This is basically the same approach we took at Mozy, only there Josh insisted on a four letter domain name, which narrowed it down a lot.)
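For the curious, the generator needs nothing fancier than alternating consonants and vowels; a few hypothetical lines of Python in the same spirit (not the actual program) would do it:

    import random

    CONSONANTS = "bcdfgklmnprstv"
    VOWELS = "aeiou"

    def pronounceable(syllables=3):
        # Alternating consonant-vowel pairs tend to come out pronounceable;
        # run it a few hundred times and keep the fragments that sound good.
        return "".join(random.choice(CONSONANTS) + random.choice(VOWELS)
                       for _ in range(syllables))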

I hope it doesn't mean "your dog has bad breath" somewhere.

And yes, Riptano is on twitter.

Are you hiring?

Yes. We'll have a jobs page on the site soon. In the meantime you can email me a resume if you can't wait. Prior participation in the Apache Cassandra project is of course a huge plus.

Wednesday, April 07, 2010

Cassandra: Fact vs fiction

Cassandra has seen some impressive adoption success over the past several months, leading some to conclude that Cassandra is the frontrunner in the highly scalable databases space (a subset of the hot NoSQL category). Among all the attention, some misunderstandings have been propagated, which I'd like to clear up.

Fiction: "Cassandra relies on high-speed fiber between datacenters" and can't reliably replicate between datacenters with more than a few ms of latency between them.

Fact: Cassandra's multi-datacenter replication is one of its earliest features and is by far the most battle-tested in the NoSQL space. Facebook has had Cassandra deployed across east and west coast datacenters since before open sourcing it. SimpleGeo's Cassandra cluster spans 3 EC2 availability zones, and Digg is also deployed on both coasts. Claims that this can't possibly work are an excellent sign that you're reading an article by someone who doesn't know what he's talking about.

Fiction: "It’s impossible to tell when [Cassandra] replicas will be up-to-date."

Fact: Cassandra provides consistency when R + W > N (read replica count + write replica count > replication factor), to use the Dynamo vocabulary. If you do writes and reads both with QUORUM, for one example, you can expect data consistency as soon as there are enough reachable nodes for a quorum. Cassandra also provides read repair and anti-entropy, so that even reads at ConsistencyLevel.ONE will be consistent after either of these events.
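To make the arithmetic concrete, a few lines of Python (my illustration, not a Cassandra API): with a replication factor of 3, QUORUM means 2 replicas, and 2 + 2 > 3, so every quorum read overlaps every quorum write.

    def read_sees_latest_write(r, w, n):
        # R + W > N guarantees the read set and write set share at least
        # one replica, so some replica in the read saw the latest write.
        return r + w > n

    assert read_sees_latest_write(2, 2, 3)      # QUORUM + QUORUM at RF=3
    assert not read_sees_latest_write(1, 1, 3)  # ONE + ONE at RF=3: reads can
                                                # lag until read repair or
                                                # anti-entropy catches up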

Fiction: Cassandra has a small community

Fact: Although popularity has never been a good metric for determining correctness, it's true that when using bleeding edge technology, it's good to have company. As I write this late at night (in the USA), there are 175 people in the Cassandra irc channel, 60 in the HBase one, 32 in Riak's, and 15 in Voldemort's. (Six months ago, the numbers were 90, 45, and 12 for Cassandra, HBase, and Voldemort; I wasn't hanging out in #riak yet.) Mailing list participation tells a similar story.

It's also interesting that the creators of Thrudb and dynomite are both using Cassandra now, indicating that the predicted NoSQL consolidation is beginning.

Fiction: "Cassandra only supports one [keyspace] per install."

Fact: This has not been true for almost a year (June of 2009).

Fiction: Cassandra cannot support Hadoop, or supporting tools such as Pig.

Fact: It has always been straightforward to send the output of Hadoop jobs to Cassandra, and Facebook, Digg, and others have been using Hadoop like this as a Cassandra bulk-loader for over a year. For 0.6, I contributed a Hadoop InputFormat and related code to let Hadoop jobs process data from Cassandra as well, while cooperating with Hadoop to keep processing on the nodes that actually hold the data. Stu Hood then contributed a Pig LoadFunc, also in 0.6.

Fiction: Cassandra achieves its high performance by sacrificing reliability (alternately phrased: Cassandra is only good for data you can afford to lose)

Fact: unlike some NoSQL databases (notably MongoDB and HBase), Cassandra offers full single-server durability. Relying on replication is not sufficient for can't-afford-to-lose-data scenarios: if your data center loses power, you are highly likely to lose data if you are not syncing to disk, no matter how many replicas you have. And if you run large systems in production long enough, you will learn that power outages, through some combination of equipment failure and human error, are not occurrences you can ignore. But with its fsync'd commitlog design, Cassandra can protect you against that scenario too.
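The underlying discipline is easy to show in any language; here is a minimal Python sketch of a durable append, an illustration of the concept rather than Cassandra's actual commitlog code:

    import os

    def durable_append(path, record):
        # Don't acknowledge the write until the bytes are flushed to stable
        # storage; without the fsync, a power loss can silently drop writes
        # still sitting in the OS page cache -- on every replica at once.
        with open(path, "ab") as f:
            f.write(record)           # record is bytes
            f.flush()                 # drain the userspace buffer
            os.fsync(f.fileno())      # force the OS to write to disk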

What to do after your data is saved, e.g. backups and snapshots, is outside of my scope here but covered in the operations wiki page.