<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-11683713</id><updated>2012-01-22T18:05:02.970-08:00</updated><title type='text'>Spyced</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default?start-index=101&amp;max-results=100'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>216</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-11683713.post-4842317383598100222</id><published>2011-11-19T09:24:00.001-08:00</published><updated>2011-11-19T09:27:39.134-08:00</updated><title type='text'>On applying for jobs</title><content type='html'>A friend &lt;a href="http://yamanin.livejournal.com/239406.html"&gt;asks&lt;/a&gt;,

&lt;blockquote&gt;If [I see a] job I could do, even though I don't meet the stated requirements, should I apply anyway? &lt;/blockquote&gt;Short answer: yes.

&lt;p&gt;
Longer answer: companies are all over the map here, although in general the fewer layers of bureaucracy there are between the team that the candidate will work with and the hiring process, the more likely the list of requirements is to be actual requirements.

&lt;/p&gt;&lt;p&gt;
How can you tell?

&lt;/p&gt;&lt;p&gt;
HR  paper pushers like to think in terms of checklists because that lets  them go through hundreds of resumes without any real understanding of  the position, so they write ads like &lt;a href="http://jobview.monster.com/Lead-Software-Engineer-Java-J2EE-Oracle-MQ-Job-MetroWest-MA-US-103927164.aspx"&gt;this one&lt;/a&gt; -- lots of really specific "5+ years of X," not much about what the position actually involves.

&lt;/p&gt;&lt;p&gt;
But if it's the team lead himself writing the description, which you will see at smaller companies, then you get much &lt;a href="http://www.linkedin.com/jobs?viewJob=&amp;amp;jobId=2066287"&gt;more about what the position involves&lt;/a&gt; and fewer checklist items, because the lead is comfortable determining competence based on skill instead of pattern matching.  For a software development position, I don't care if you have a degree in CS if you can code.  (Open-source contributions are a better signal for ability and passion than a degree, anyway.)  My team ranges from people with no degree to people with PhDs.

&lt;/p&gt;&lt;p&gt;
Even when dealing with large companies, you  have to factor in that people are terrible at distinguishing "want" from  "need."  A lot of "requirements" are really "nice-to-haves." It can be  tough to tell the difference, but the better idea you have of what the  job actually involves, the better you can tell which are hard  requirements.

&lt;/p&gt;&lt;p&gt;
For instance: without knowing anything else about a position, my guess  is that "native French speaker" really would be a hard requirement.   That's not the sort of thing people tend to put down on a whim.  But  even then, there are shades of grey.  For instance, if I were looking  for a job and found a "distributed databases developer position, must  know Java, be familiar with open source and be a native French speaker"  then I might see if they'd give me a pass on the last part because I'm a  &lt;span style="font-style: italic;"&gt;really&lt;/span&gt; good fit for the rest -- and I know they're unlikely to find a lot of candidates with an &lt;span style="font-style: italic;"&gt;exact&lt;/span&gt; match.

&lt;/p&gt;&lt;p&gt;
In  short, you have little to lose by trying, but don't just shotgun out  resumes; include a cover letter that  highlights the best matches from your experience to what they are  looking for.   Follow up with the hiring manager if possible to ask (a)  "I sent in my resume a few days ago, and I wanted to see where you are  in the hiring process for this position," and if they reply that they  got it but you're not a good fit, ask (b) what specifically they were  looking for, so you can flesh out your intuition that much more for next  time.

&lt;/p&gt;&lt;p&gt;
Good luck!&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-4842317383598100222?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/4842317383598100222/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=4842317383598100222' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4842317383598100222'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4842317383598100222'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2011/11/on-applying-for-jobs.html' title='On applying for jobs'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6143927286870223608</id><published>2011-01-04T11:53:00.000-08:00</published><updated>2011-10-13T15:30:01.428-07:00</updated><title type='text'>Apache Cassandra: 2010 in review</title><content type='html'>&lt;p&gt;
In 2010, Apache Cassandra increased its momentum as the leading scalable database.  Here is a summary of the notable activity in three areas: code, community and controversy. As always, comments are welcome.

&lt;/p&gt;&lt;p&gt;
&lt;/p&gt;&lt;h3&gt;Code&lt;/h3&gt;

&lt;p&gt;
2010 started with the release of &lt;a href="http://spyced.blogspot.com/2010/01/cassandra-05.html"&gt;Cassandra 0.5&lt;/a&gt;, followed by &lt;a href="http://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces3"&gt;0.6 and graduation from the ASF incubator&lt;/a&gt; a few months later.  Seven more stable releases of 0.6 came after that, adding &lt;a href="http://www.riptano.com/docs/0.6/appendix/appendix_a_whats_new"&gt;many features&lt;/a&gt; to improve operations in response to feedback from production users.

&lt;/p&gt;&lt;p&gt;
0.7 adds highly anticipated features like &lt;a href="http://www.riptano.com/blog/www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes"&gt;column value indexes&lt;/a&gt;, &lt;a href="http://www.riptano.com/blog/whats-new-cassandra-07-live-schema-updates"&gt;live schema updates&lt;/a&gt;, more efficient cluster expansion, and more control over replication, but didn't quite make it into 2010, with rc4 &lt;a href="http://twitter.com/#%21/cassandra/status/21268489612296192"&gt;released on new year's 2011&lt;/a&gt;.

&lt;/p&gt;&lt;p&gt;
We also committed the &lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-1072"&gt;distributed counters&lt;/a&gt; patchset, begun at Digg and enhanced by Twitter for their &lt;a href="http://mashable.com/2010/09/23/twitter-real-time-analytics/"&gt;real-time analytics product&lt;/a&gt;.  Notable as the most-involved feature discussion to date, distributed counters started with a &lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-580"&gt;vector clock approach&lt;/a&gt;, but switched to a &lt;a href="https://issues.apache.org/jira/secure/attachment/12459754/Partitionedcountersdesigndoc.pdf"&gt;new design&lt;/a&gt; by &lt;a href="http://twitter.com/#%21/kelvin"&gt;Kelvin Kakugawa&lt;/a&gt; after we realized vector clocks were a &lt;a href="http://pl.atyp.us/wordpress/?p=2601"&gt;dead end&lt;/a&gt; for anything but the trivial case of monotonic-increments-by-one.

&lt;/p&gt;&lt;p&gt;
One of the biggest trends was increasing activity &lt;i&gt;around&lt;/i&gt; Cassandra as well as in the core database itself.  2010 saw &lt;a href="http://wiki.apache.org/cassandra/HadoopSupport"&gt;Hadoop map/reduce integration&lt;/a&gt;, as well as Pig support and a &lt;a href="https://issues.apache.org/jira/browse/HIVE-1434"&gt;patch for Hive&lt;/a&gt;.

&lt;/p&gt;&lt;p&gt;
We also saw &lt;a href="http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/"&gt;Lucandra&lt;/a&gt;, which implements a Cassandra back end for Lucene and is used in several high volume production sites, grow up into &lt;a href="https://github.com/tjake/Lucandra"&gt;Solandra&lt;/a&gt;, embedding Solr and Cassandra in the same JVM for even more performance.

&lt;/p&gt;&lt;p&gt;
&lt;/p&gt;&lt;h3&gt;Community&lt;/h3&gt;

&lt;p&gt;
Cassandra hit its stride in 2010, starting with &lt;a href="http://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces3"&gt;graduation from the ASF incubator&lt;/a&gt; in April.  2010 saw 1025 &lt;a href="https://issues.apache.org/jira/browse/CASSANDRA"&gt;tickets&lt;/a&gt; resolved, nearly twice as many as in 2009 (565).

&lt;/p&gt;&lt;p&gt;
Like many Apache projects, Cassandra has a relatively small set of &lt;a href="http://wiki.apache.org/cassandra/Committers"&gt;committers&lt;/a&gt;, but a much larger group of contributors.  In 2010 Cassandra passed &lt;a href="http://pastebin.com/DjXhVn2g"&gt;100 people&lt;/a&gt; who have contributed at least one patch.  Release manager &lt;a href="http://blog.sym-link.com/"&gt;Eric Evans&lt;/a&gt; put together a great way to visualize this with a &lt;a href="http://www.youtube.com/watch?v=FWSyoXnWsTQ"&gt;Code Swarm video of Cassandra development&lt;/a&gt;.

&lt;/p&gt;&lt;p&gt;
I &lt;a href="http://spyced.blogspot.com/2010/04/and-now-for-something-completely.html"&gt;started Riptano&lt;/a&gt; with Matt Pfeil in April to provide professional products and services around Cassandra.  In October, we announced &lt;a href="http://www.riptano.com/blog/cassandra-investment-lightspeed-sequoia"&gt;funding from Lightspeed and Sequoia&lt;/a&gt;.  From May to December, we conducted eleven &lt;a href="http://www.eventbrite.com/org/474011012"&gt;Cassandra training&lt;/a&gt; events in eight months, and twice that many private classes on-site with customers.

&lt;/p&gt;&lt;p&gt;
Riptano is now up to 25 employees, with offices in the San Francisco bay area, Austin, and New York, and engineers working remotely in San Antonio, France, and Belarus.

&lt;/p&gt;&lt;p&gt;
In August, Riptano and Rackspace organized a very successful inaugural &lt;a href="http://www.riptano.com/blog/cassandra-summit-recap"&gt;Cassandra Summit&lt;/a&gt;, with about 200 attendees (&lt;a href="http://www.riptano.com/blog/slides-and-videos-cassandra-summit-2010"&gt;videos available&lt;/a&gt;), followed by &lt;a href="http://us.apachecon.com/c/acna2010/schedule/grid"&gt;almost a full track at ApacheCon&lt;/a&gt; in November. Cassandra was also represented at many other conferences on &lt;a href="http://en.oreilly.com/rails2010/public/schedule/detail/14740"&gt;multiple&lt;/a&gt; &lt;a href="http://www.inf.unibz.it/krdb/school/2010/program.html"&gt;subjects&lt;/a&gt;, &lt;a href="http://my.javaonedevelop.com/events/a2z/JAVAONE"&gt;for&lt;/a&gt; &lt;a href="http://www.slideshare.net/aaronmorton/b-5857745"&gt;several&lt;/a&gt; &lt;a href="http://www.slideshare.net/supertom/using-cassandra-with-your-web-application"&gt;languages&lt;/a&gt;, &lt;a href="http://www.devoxx.com/display/Devoxx2K10/Introduction+to+Cassandra"&gt;and&lt;/a&gt; &lt;a href="http://www.gemini-bigdata.com/2010/11/brief-reviews-of-nosql-afternoon-in.html"&gt;continents&lt;/a&gt;.

&lt;/p&gt;&lt;p&gt;
&lt;/p&gt;&lt;h3&gt;Controversy&lt;/h3&gt;

&lt;p&gt;
Cassandra got a lot of negative publicity when Kevin Rose &lt;a href="http://gilhildebrand.com/afterthought/2010/09/kevin-rose-spreads-fud-blames-cassandra-for-digg-v4-woes/"&gt;blamed&lt;/a&gt; Cassandra for Digg v4's teething problems.  However, there was no deluge of &lt;a href="https://issues.apache.org/jira/browse/CASSANDRA"&gt;bug reports&lt;/a&gt; coming out of Digg's Cassandra team, and Digg engineers Arin Sarkissian and Chris Goffinet (now working on Cassandra for Twitter) got on Quora &lt;a href="http://www.quora.com/Is-Cassandra-to-blame-for-Digg-v4s-technical-failures"&gt;to refute the idea that Cassandra was at fault&lt;/a&gt;:

&lt;/p&gt;&lt;blockquote&gt;
The whole "Cassandra to blame" thing is 100% a result of folks clinging on to the NoSQL vs SQL thing. It's a red herring.

&lt;p&gt;
The new version of Digg has a whole new architecture with a bunch of technologies involved. Problem is, over the last few months or so the only technological change we mentioned (blogged about etc) was Cassandra. That made it pretty easy for folks to cling on to it as the "problem".
&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;
Meanwhile, Digg competitor Reddit &lt;a href="http://twitter.com/#%21/ketralnis/status/658776965255168"&gt;has&lt;/a&gt; &lt;a href="http://twitter.com/#%21/ketralnis/status/10098518563758080"&gt;continued&lt;/a&gt; &lt;a href="http://twitter.com/#%21/jedberg/status/10401811135463424"&gt;migrating&lt;/a&gt; to Cassandra, crediting it with &lt;a href="http://www.reddit.com/r/blog/comments/evmek/2010_we_hardly_knew_ye/c1bbmrq"&gt;enabling their 3x traffic growth in 2010&lt;/a&gt;.

&lt;/p&gt;&lt;p&gt;
More importantly, 2010 saw dozens of new Cassandra deployments, including a new contender for the largest-cluster crown when Digital Reasoning announced a &lt;a href="http://www.businesswire.com/news/home/20101006005485/en/Digital-Reasoning-Riptano-Advance-Cassandra-Based-Analytic-Solutions"&gt;400-node cluster for the US government.&lt;/a&gt;

&lt;/p&gt;&lt;p&gt;
We look forward to another great year in 2011!&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6143927286870223608?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6143927286870223608/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6143927286870223608' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6143927286870223608'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6143927286870223608'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2011/01/apache-cassandra-2010-in-review.html' title='Apache Cassandra: 2010 in review'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-4980279696596031243</id><published>2010-04-26T12:29:00.000-07:00</published><updated>2010-05-23T18:51:15.223-07:00</updated><title type='text'>And now for something completely different</title><content type='html'>&lt;p&gt;
A month ago I left Rackspace to start &lt;a href="http://riptano.com/"&gt;Riptano&lt;/a&gt;, a &lt;a href="http://riptano.com/services.php"&gt;Cassandra support and services&lt;/a&gt; company.

&lt;/p&gt;&lt;p&gt;
I was in the unusual position of being a technical person looking for a business-savvy co-founder.  For whatever reason, the converse seems a lot &lt;a href="http://answers.onstartups.com/questions/35/how-do-i-find-a-technical-co-founder"&gt;more&lt;/a&gt; &lt;a href="http://www.mattcollins.net/2008/04/how-to-find-a-technical-co-founder"&gt;common&lt;/a&gt;.  Maybe technical people tend to stereotype softer skills as being easy.

&lt;/p&gt;&lt;p&gt;
But despite some examples to the contrary (notably for me, &lt;a href="http://www.jcoates.org/"&gt;Josh Coates&lt;/a&gt; at &lt;a href="http://spyced.blogspot.com/2005/09/python-at-mozycom.html"&gt;Mozy&lt;/a&gt;), I found that &lt;a href="http://www.paulgraham.com/startupmistakes.html"&gt;starting a company is too hard for just one person&lt;/a&gt;.  Unfortunately, all of my fairly slim portfolio of business guys I'd like to co-found with were unavailable.  So progress was slow, until &lt;a href="http://www.linkedin.com/pub/matt-pfeil/19/71/201"&gt;Matt Pfeil&lt;/a&gt; heard that I was leaving Rackspace and drove to San Antonio from Austin to talk me out of it.  Not only was he not successful in talking me out of leaving, but he ended up co-founding Riptano.  And here we are, with a Riptano mini-faq.

&lt;/p&gt;&lt;p&gt;
&lt;b&gt;Isn't Cassandra mostly just a web 2.0 thing for ex-mysql shops?&lt;/b&gt;

&lt;/p&gt;&lt;p&gt;
Although most of the &lt;a href="http://spyced.blogspot.com/2010/03/cassandra-in-action.html"&gt;early adopters&lt;/a&gt; fit this stereotype, we're seeing interest from a lot of Oracle users and a lot of industries.  Unlike many "NoSQL" databases, Cassandra &lt;a href="http://spyced.blogspot.com/2010/04/cassandra-fact-vs-fiction.html"&gt;doesn't drop durability&lt;/a&gt; (the D in &lt;a href="http://en.wikipedia.org/wiki/ACID"&gt;ACID&lt;/a&gt;), and besides scalability, enterprises are very interested in our support for multiple data centers and &lt;a href="http://wiki.apache.org/cassandra/HadoopSupport"&gt;Hadoop analytics&lt;/a&gt;.

&lt;/p&gt;&lt;p&gt;
&lt;b&gt;Are you going to fork Cassandra?&lt;/b&gt;

&lt;/p&gt;&lt;p&gt;
No.  Although the ASF license allows doing basically anything with the code, including creating proprietary forks, we think the track record of this strategy in the open source database world is &lt;a href="http://jcole.us/blog/archives/2007/08/09/mysql-community-split-officially-a-failure/"&gt;mixed&lt;/a&gt; at best.

&lt;/p&gt;&lt;p&gt;
We might create a (still open-source) Cassandra distribution similar to &lt;a href="http://www.cloudera.com/hadoop/"&gt;Cloudera's Distribution for Hadoop&lt;/a&gt;, but the mainline Cassandra development is responsive enough that there isn't as much need for a third party to do this as there is with Hadoop.

&lt;/p&gt;&lt;p&gt;
&lt;b&gt;What does Rackspace think?&lt;/b&gt;

&lt;/p&gt;&lt;p&gt;
&lt;a href="http://rackspace.com/"&gt;Rackspace&lt;/a&gt; has been the primary driver of Cassandra development recently, employing (until I left) the three most active &lt;a href="http://wiki.apache.org/cassandra/Committers"&gt;committers on the project&lt;/a&gt;.  For the same reasons &lt;a href="http://www.rackspacecloud.com/blog/2009/09/23/the-cassandra-project/"&gt;Rackspace supported Cassandra&lt;/a&gt; to begin with, Rackspace is excited to see Riptano help take the Cassandra ecosystem to the next level.  &lt;a href="http://www.informationweek.com/news/hardware/virtual/showArticle.jhtml?articleID=224600336"&gt;Rackspace has invested in Riptano&lt;/a&gt; and has been completely supportive in every way.

&lt;/p&gt;&lt;p&gt;
&lt;b&gt;Where did you get the name "Riptano?"  Does it mean anything?&lt;/b&gt;

&lt;/p&gt;&lt;p&gt;
We took a sophisticated, augmented AI approach.  By which I mean, we took a &lt;a href="http://www.multicians.org/thvv/gpw.html"&gt;program that generated random, pronounceable strings&lt;/a&gt;, and put together a couple of fragments that sounded good together.  (This is basically the same approach we took at Mozy, only there Josh insisted on a four letter domain name which narrowed it down a &lt;i&gt;lot&lt;/i&gt;.)

&lt;/p&gt;&lt;p&gt;
I hope it doesn't mean "your dog has bad breath" somewhere.

&lt;/p&gt;&lt;p&gt;
And yes, &lt;a href="http://twitter.com/riptano"&gt;Riptano is on twitter&lt;/a&gt;.

&lt;/p&gt;&lt;p&gt;
&lt;b&gt;Are you hiring?&lt;/b&gt;

&lt;/p&gt;&lt;p&gt;
Yes.  We'll have a jobs page on the site soon.  In the meantime you can email me a resume if you can't wait.  Prior participation in the Apache Cassandra project is of course a huge plus.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-4980279696596031243?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/4980279696596031243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=4980279696596031243' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4980279696596031243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4980279696596031243'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2010/04/and-now-for-something-completely.html' title='And now for something completely different'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-7444281061692094261</id><published>2010-04-07T09:37:00.000-07:00</published><updated>2010-04-20T06:17:45.056-07:00</updated><title type='text'>Cassandra: Fact vs fiction</title><content type='html'>&lt;p&gt;
&lt;a href="http://cassandra.apache.org/"&gt;Cassandra&lt;/a&gt; has seen some &lt;a href="http://spyced.blogspot.com/2010/03/cassandra-in-action.html"&gt;impressive adoption success&lt;/a&gt; over the past months, leading some to conclude that &lt;a href="http://blog.tonybain.com/tony_bain/2009/12/is-cassandra-winning-the-nosql-race.html"&gt;Cassandra is the frontrunner&lt;/a&gt; in the highly scalable databases space (a subset of the hot &lt;a href="http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosystem/"&gt;NoSQL category&lt;/a&gt;).  Among all the attention, some misunderstandings have been propagated, which I'd like to clear up.

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fiction&lt;/span&gt;: "Cassandra relies on high-speed fiber between datacenters" and can't reliably replicate between datacenters with more than a few ms of latency between them.

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fact&lt;/span&gt;: Cassandra's multi-datacenter replication is one of its earliest features and is by far the most battle-tested in the NoSQL space.  Facebook had Cassandra deployed on east and west coast datacenters since before open sourcing it.  SimpleGeo's Cassandra cluster &lt;a href="http://permalink.gmane.org/gmane.comp.db.cassandra.user/3462"&gt;spans 3 EC2 availability zones&lt;/a&gt;, and Digg is also deployed on both coasts.  Claims that this can't possibly work are an excellent sign that you're reading an article by someone who doesn't know what he's talking about.

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fiction&lt;/span&gt;: "It’s impossible to tell when [Cassandra] replicas will be up-to-date."

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fact&lt;/span&gt;: Cassandra provides consistency when R + W &gt; N (read replica count + write replica count &gt; replication factor), to use the &lt;a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html"&gt;Dynamo vocabulary&lt;/a&gt;.  If you do writes and reads both with QUORUM, for example, you can expect data consistency as soon as there are enough reachable nodes for a quorum.  Cassandra also provides &lt;a href="http://wiki.apache.org/cassandra/ReadRepair"&gt;read repair&lt;/a&gt; and &lt;a href="http://wiki.apache.org/cassandra/AntiEntropy"&gt;anti-entropy&lt;/a&gt;, so that even reads at &lt;a href="http://wiki.apache.org/cassandra/API"&gt;ConsistencyLevel.ONE&lt;/a&gt; will be consistent after either of these events.
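&lt;/p&gt;&lt;p&gt;
The quorum arithmetic is easy to check for yourself.  Here is a minimal Python sketch, not Cassandra code: the function names are made up for illustration, and replicas are simply numbered 0 to N-1.  It computes how many replicas any read set and any write set are forced to share, and then brute-forces every combination to confirm that the two always overlap exactly when R + W &gt; N.
&lt;/p&gt;&lt;pre&gt;
```python
from itertools import combinations


def guaranteed_overlap(n, r, w):
    """Smallest number of replicas shared by any read set and any write set.

    n: replication factor, r: replicas consulted per read,
    w: replicas that must acknowledge each write.
    """
    return max(0, r + w - n)


def is_consistent(n, r, w):
    # Consistent exactly when R + W exceeds N, i.e. the overlap is non-zero.
    return guaranteed_overlap(n, r, w) != 0


def quorum(n):
    # QUORUM is a majority of the replicas: floor(n / 2) + 1.
    return n // 2 + 1


def every_read_sees_latest_write(n, r, w):
    # Brute force: every possible read set must share at least one
    # replica with every possible write set.
    replicas = range(n)
    return all(
        not set(reads).isdisjoint(writes)
        for reads in combinations(replicas, r)
        for writes in combinations(replicas, w)
    )
```
&lt;/pre&gt;&lt;p&gt;
With a replication factor of 3, QUORUM is 2, so is_consistent(3, quorum(3), quorum(3)) holds, while reading and writing at ONE, is_consistent(3, 1, 1), does not guarantee that a read sees the latest write until read repair or anti-entropy catches up.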

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fiction&lt;/span&gt;: Cassandra has a small community

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fact&lt;/span&gt;: Although popularity has never been a good metric for determining correctness, it's true that when using bleeding edge technology, it's good to have company.  As I write this late at night (in the USA), there are 175 people in the Cassandra irc channel, 60 in the HBase one, 32 in Riak's, and 15 in Voldemort's.  (Six months ago, the numbers were 90, 45, and 12 for Cassandra, HBase, and Voldemort.  I did not hang out in #riak yet then.)  Mailing list participation tells a similar story.

&lt;/p&gt;&lt;p&gt;
It's also interesting that the creators of &lt;a href="http://code.google.com/p/thrudb/"&gt;Thrudb&lt;/a&gt; and &lt;a href="http://github.com/cliffmoon/dynomite"&gt;dynomite&lt;/a&gt; are both using Cassandra now, indicating that the predicted NoSQL consolidation is beginning.

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fiction&lt;/span&gt;: "Cassandra only supports one [keyspace] per install."

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fact&lt;/span&gt;: This has not been true for almost a year (&lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-79"&gt;June of 2009&lt;/a&gt;).

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fiction&lt;/span&gt;: Cassandra cannot support Hadoop, or supporting tools such as Pig.

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fact&lt;/span&gt;: It has always been straightforward to send the output of Hadoop jobs to Cassandra, and Facebook, Digg, and others have been using Hadoop like this as a Cassandra bulk-loader for over a year.  For 0.6, I contributed a Hadoop InputFormat and related code to let Hadoop jobs &lt;a href="http://wiki.apache.org/cassandra/HadoopSupport"&gt;process data &lt;span style="font-style: italic;"&gt;from&lt;/span&gt; Cassandra&lt;/a&gt; as well, while cooperating with Hadoop to keep processing on the nodes that actually hold the data.  Stu Hood then contributed a Pig LoadFunc, also in 0.6.

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fiction&lt;/span&gt;: Cassandra achieves its high performance by sacrificing reliability (alternately phrased: Cassandra is only good for data you can afford to lose)

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Fact&lt;/span&gt;: Unlike some NoSQL databases (notably &lt;a href="http://blog.mongodb.org/post/381927266/what-about-durability"&gt;MongoDB&lt;/a&gt; and &lt;a href="http://www.slideshare.net/cloudera/hbase-user-group-9-hbase-and-hdfs"&gt;HBase&lt;/a&gt;), Cassandra offers &lt;a href="http://wiki.apache.org/cassandra/Durability"&gt;full single-server durability&lt;/a&gt;.  Relying on replication is not sufficient for can't-afford-to-lose-data scenarios: if your data center loses power and you are not syncing to disk, you are highly likely to lose data no matter how many replicas you have.  And if you run large systems in production long enough, you will realize that power outages, through some combination of equipment failure and human error, are not occurrences you can ignore.  But with its &lt;a href="http://linux.die.net/man/2/fsync"&gt;fsync&lt;/a&gt;'d &lt;a href="http://wiki.apache.org/cassandra/ArchitectureCommitLog"&gt;commitlog&lt;/a&gt; design, Cassandra can protect you against that scenario too.
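&lt;/p&gt;&lt;p&gt;
To make the fsync point concrete, here is a toy append-only log in Python.  It only illustrates the idea of syncing to disk before acknowledging a write; it is not Cassandra's commitlog, and the function names and the newline-delimited record format are assumptions made for the example.
&lt;/p&gt;&lt;pre&gt;
```python
import os


def append_durably(path, record):
    """Append one record (bytes, no newlines) and return once it is on disk."""
    with open(path, "ab") as log:
        log.write(record + b"\n")
        # flush() empties Python's userspace buffer into the OS page cache...
        log.flush()
        # ...and fsync() asks the OS to push that data to the device,
        # which is the step that matters if the power goes out.
        os.fsync(log.fileno())


def replay(path):
    """Read back every record, as a restart after a crash would."""
    with open(path, "rb") as log:
        return log.read().splitlines()
```
&lt;/pre&gt;&lt;p&gt;
This sketch pays for an fsync on every call, which is the simplest thing that is safe; a real commitlog has to amortize that cost across many writes.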

&lt;/p&gt;&lt;p&gt;
What to do after your data is saved, e.g. backups and snapshots, is outside of my scope here but covered in the &lt;a href="http://wiki.apache.org/cassandra/Operations"&gt;operations wiki page&lt;/a&gt;.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-7444281061692094261?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/7444281061692094261/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=7444281061692094261' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7444281061692094261'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7444281061692094261'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2010/04/cassandra-fact-vs-fiction.html' title='Cassandra: Fact vs fiction'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-2135107228606048521</id><published>2010-03-30T07:48:00.001-07:00</published><updated>2011-11-01T09:19:19.137-07:00</updated><title type='text'>Cassandra in Google Summer of Code 2010</title><content type='html'>&lt;p&gt;
Cassandra is participating in the Google Summer of Code, which &lt;a href="http://google-opensource.blogspot.com/2010/03/students-apply-now-for-google-summer-of.html"&gt;opened for proposal submission today&lt;/a&gt;.  Cassandra is part of the Apache Software Foundation, which has &lt;a href="http://community.apache.org/gsoc.html"&gt;its own page of guidelines up&lt;/a&gt; for students and mentors.

&lt;/p&gt;&lt;p&gt;
We have a good mix of project ideas involving both core and non-core areas, from straightforward code bashing to some pretty tricky stuff, depending on your appetite.  Core tickets aren't necessarily harder than non-core, but they will require reading and understanding more existing code.

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;font-size:130%;" &gt;Non-core&lt;/span&gt;
&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-918"&gt;Create a web ui for cassandra&lt;/a&gt;: we have a (fairly minimal) command line interface, but a web gui is more user-friendly.  There are the beginnings of such a beast in the Cassandra source tree at contrib/cassandra_browser [pretty ugly Python code] and a gtk-based one at http://github.com/driftx/chiton [also Python, less ugly].&lt;/li&gt;&lt;li&gt;&lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-912"&gt;First-class commandline interface&lt;/a&gt;: if you prefer to kick things old-school, improving the cli itself would also be welcome.&lt;/li&gt;&lt;li&gt;&lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-873"&gt;Create a Cassandra demo application&lt;/a&gt;: we have &lt;a href="http://twissandra.com/"&gt;Twissandra&lt;/a&gt;, but we can always use more examples to introduce people to "thinking in Cassandra," which is the hardest part of using it.  This one seems to be the most popular with students so far.  (So stand out from the crowd, and submit something else too. :)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;
&lt;span style="font-weight: bold;font-size:130%;" &gt;Almost-core
&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-875"&gt;Performance regression tests&lt;/a&gt;: pretty self-explanatory?&lt;/li&gt;&lt;li&gt;&lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-874"&gt;System tests against multiple nodes&lt;/a&gt;: If GSOC were a wish-granting fairy I would probably choose this with my first wish.  There's a couple different ways you can approach this; scripting VMs is one, or you could explore the &lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-561"&gt;Cassandra simulator&lt;/a&gt; that was contributed a while ago (some TLC required).&lt;/li&gt;&lt;li&gt;&lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-913"&gt;Hive support&lt;/a&gt;: &lt;a href="http://hadoop.apache.org/hive/"&gt;Hive&lt;/a&gt; is a project that runs SQL queries against Hadoop map/reduce clusters.   (For analytics; it is too high-latency to run applications against Hive  directly).  &lt;a href="https://issues.apache.org/jira/browse/HIVE-705" title="Let Hive can analyse hbase's tables"&gt;&lt;strike&gt;HIVE-705&lt;/strike&gt;&lt;/a&gt;  added support for backends other than HDFS, with HBase as the first.   Cassandra support should be doable too now.  The Hive storage backends are described in &lt;a href="http://wiki.apache.org/hadoop/Hive/StorageHandlers"&gt;http://wiki.apache.org/hadoop/Hive/StorageHandlers&lt;/a&gt;  and the HBase backend specifically in &lt;a href="http://wiki.apache.org/hadoop/Hive/HBaseIntegration"&gt;http://wiki.apache.org/hadoop/Hive/HBaseIntegration&lt;/a&gt;.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;
&lt;span style="font-weight: bold;font-size:130%;" &gt;Core&lt;/span&gt;
&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-926"&gt;Avro RPC support&lt;/a&gt;: currently Cassandra's client layer is the Thrift RPC framework, which sucks for reasons outside our scope here.  We're moving to Avro, the new hotness from Doug Cutting (creator of Lucene and Hadoop, you may have heard of those).  Basically this means porting org.apache.cassandra.thrift.CassandraServer to org.apache.cassandra.avro.CassandraServer; some examples are already done by Eric Evans.&lt;/li&gt;&lt;li&gt;&lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-876"&gt;Session-level consistency&lt;/a&gt;: In &lt;a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html"&gt;one&lt;/a&gt; and &lt;a href="http://www.allthingsdistributed.com/2008/12/eventually_consistent.html"&gt;two&lt;/a&gt; Amazon discusses the concept of "eventual consistency." Cassandra uses eventual consistency in a design similar to Dynamo.  Supporting session consistency would be useful and relatively easy to add: we already have the concept of a &lt;a href="http://wiki.apache.org/cassandra/MemtableSSTable"&gt;Memtable&lt;/a&gt; to "stage" updates in before flushing to disk; if we applied mutations to a session-level memtable on the coordinator machine (that is, the machine the client is connected to), and then did a final merge from that table against query results before handing them to the client, we'd get it almost for free.&lt;/li&gt;&lt;li&gt;&lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-622"&gt;Optimize commitlog performance&lt;/a&gt;: this is about as low-level as you'll find in Cassandra's code base.  fsync, CAS, it's all here.  &lt;a href="http://wiki.apache.org/cassandra/ArchitectureCommitLog"&gt;http://wiki.apache.org/cassandra/ArchitectureCommitLog&lt;/a&gt;  describes the current CommitLog design. &lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;
You can comment directly on the JIRA tickets after creating an account (it's open to the public) if you're interested or have other questions.  And of course feel free to propose other ideas!&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-2135107228606048521?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/2135107228606048521/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=2135107228606048521' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2135107228606048521'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2135107228606048521'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2010/03/cassandra-in-google-summer-of-code-2010.html' title='Cassandra in Google Summer of Code 2010'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6434289544016263640</id><published>2010-03-24T08:41:00.001-07:00</published><updated>2011-03-14T06:25:21.361-07:00</updated><title type='text'>Cassandra in action</title><content type='html'>&lt;p&gt;
There have been a lot of new articles about &lt;a href="http://cassandra.apache.org/"&gt;Cassandra&lt;/a&gt; deployments in the past month, enough that I thought it would be useful to summarize them in a post.
&lt;/p&gt;

&lt;p&gt;
Ryan King explained in &lt;a href="http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king"&gt;an interview with Alex Popescu&lt;/a&gt; why Twitter is moving to Cassandra for tweet storage, and why they selected Cassandra over the alternatives.  My experience is that the more someone understands large systems and the problems you can run into with them from an operational standpoint, the more likely they are to choose Cassandra when doing this kind of evaluation.  Ryan's list of criteria is worth checking out.
&lt;/p&gt;

&lt;p&gt;
Digg followed up their &lt;a href="http://about.digg.com/blog/looking-future-cassandra"&gt;earlier announcement&lt;/a&gt; that they had taken part of their site live on Cassandra with &lt;a href="http://about.digg.com/node/564"&gt;another&lt;/a&gt; saying that they've now "reimplemented most of Digg's functionality using Cassandra as our primary datastore."   Digg engineer Ian Eure also gave &lt;a href="http://news.ycombinator.com/item?id=1184603"&gt;some more details on Digg's cassandra data model&lt;/a&gt; in a Hacker News thread.&lt;/p&gt;

&lt;p&gt;Om Malik &lt;a href="http://gigaom.com/2010/03/11/digg-cassandara/"&gt;quoted extensively&lt;/a&gt; from the Digg announcement and from Rackspace engineer Stu Hood, who explained Cassandra's appeal: "Over the Bigtable clones, Cassandra has huge high-availability advantages, and no single point of failure.  When compared to the Dynamo adherents, Cassandra has the advantage of a more advanced datamodel, allowing for a single row to contain billions of column/value pairs: enough to fill a machine. You also get efficient range queries for the top level key, and even within your values."&lt;/p&gt;

&lt;p&gt;
The Twitter and Digg news kicked off &lt;a href="http://blogsearch.google.com/blogsearch?q=twitter+cassandra"&gt;a lot  of publicity&lt;/a&gt;, including a lot of "me too" articles but some  interesting ones, including a highscalability post wondering if this was  &lt;a href="http://highscalability.com/blog/2010/2/26/mysql-and-memcached-end-of-an-era.html"&gt;the  end of the mysql + memcached era&lt;/a&gt;.  If not quite yet the end, then  the beginning of it.  As Ian Eure from Digg &lt;a href="http://www.rackspacecloud.com/blog/2010/02/25/should-you-switch-to-nosql-too/"&gt;said&lt;/a&gt;,  "If you're deploying memcache on top of your database, you're inventing your own ad-hoc, difficult to maintain NoSQL system."  Possibly the best commentary on this idea is &lt;a href="http://www.25hoursaday.com/weblog/2010/03/10/BuildingScalableDatabasesAreRelationalDatabasesCompatibleWithLargeScaleWebsites.aspx"&gt;Dare Obasanjo's&lt;/a&gt;, who explained "Digg's usage of Cassandra actually serves as a rebuttal to [an article claiming SQL scales just fine] since they couldn't feasibly get what they want with either horizontal or vertical scaling of their relational database-based solution."
&lt;/p&gt;

&lt;p&gt;
&lt;a href="http://blog.reddit.com/2010/03/she-who-entangles-men.html"&gt;Reddit also migrated to Cassandra&lt;/a&gt; from memcachedb, in only 10 days, the fastest migration to Cassandra I've seen.  More comments from the engineer doing the migration, ketralnis, in the &lt;a href="http://www.reddit.com/r/programming/comments/bcqhi/reddits_now_running_on_cassandra/"&gt;reddit discussion thread&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
CloudKick &lt;a href="https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/"&gt;blogged about how they use Cassandra for time series data&lt;/a&gt;, including a sketch of their data model.  CloudKick migrated from PostgreSQL, skewering the theory you will sometimes see proffered that "only MySQL users are migrating to NoSQL, not people who use [my favorite vendor's relational database]."
&lt;/p&gt;

&lt;p&gt;
Jake Luciani wrote about &lt;a href="http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/"&gt;how Lucandra, the Cassandra Lucene back-end works&lt;/a&gt;, and how he's using it to power &lt;a href="http://sparse.ly"&gt;the Twitter search app sparse.ly&lt;/a&gt;.  IMO, &lt;a href="http://github.com/tjake/Lucandra"&gt;Lucandra&lt;/a&gt; is one of Cassandra's killer apps.
&lt;/p&gt;

&lt;p&gt;
The FightMyMonster team &lt;a href="http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/"&gt;switched from HBase to Cassandra&lt;/a&gt; after concluding that "HBase is more suitable for data warehousing, and large scale data processing and analysis... and Cassandra is more suitable for real time transaction processing and the serving of interactive data."  Dominic covers CAP, architecture considerations, benchmarks, map/reduce, and durability in explaining his conclusion.
&lt;/p&gt;

&lt;p&gt;
&lt;a href="http://www.startupmonkeys.com/2010/03/cassandra-frugal-mechanic/"&gt;Eric Peters gave a talk on Cassandra&lt;/a&gt; use at his company, Frugal Mechanic, at the Seattle Tech Startups Meetup.  This was interesting not because Frugal Mechanic is a big name but because it's not.  I haven't seen Eric's name on the Cassandra mailing lists at all, but there he was deploying it and giving a talk on it, showing that Cassandra is starting to move beyond early adopters.  (And, just maybe, that our documentation is improving. :)&lt;/p&gt;

&lt;p&gt;
Finally, &lt;a href="http://www.eflorenzano.com/"&gt;Eric Florenzano&lt;/a&gt; has a live demo up now of Cassandra running a Twitter clone at &lt;a href="http://twissandra.com/"&gt;twissandra.com&lt;/a&gt;, with &lt;a href="http://github.com/ericflo/twissandra"&gt;source&lt;/a&gt; at github, as an example of how to use Cassandra's data model.  If you're interested in the nuts and bolts of how to build an app on Cassandra, you should check it out.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6434289544016263640?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6434289544016263640/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6434289544016263640' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6434289544016263640'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6434289544016263640'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2010/03/cassandra-in-action.html' title='Cassandra in action'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-3844309162894670811</id><published>2010-03-15T15:19:00.000-07:00</published><updated>2010-03-17T08:43:49.079-07:00</updated><title type='text'>Why your data may not belong in the cloud</title><content type='html'>&lt;p&gt;
&lt;a href="http://pl.atyp.us/wordpress/?p=2742"&gt;Several&lt;/a&gt; of the &lt;a href="http://groups.csail.mit.edu/haystack/blog/2010/03/12/notes-from-nosql-live-boston/"&gt;reports&lt;/a&gt; of the recently-concluded NoSQL Live event  mentioned that I took a contrarian position on the "NoSQL in the Cloud"  panel, arguing that traditional, bare metal servers usually make more  sense.  Here's why.

&lt;p&gt;
There are two reasons to use cloud infrastructure (and by cloud I mean here "commodity VMs" such as those provided by Rackspace Cloud Servers or Amazon EC2):
&lt;ol&gt;&lt;li&gt;You only  need a fraction of the capacity of a single machine&lt;/li&gt;&lt;li&gt;Your demand  is highly elastic; you want to be able to quickly spin up many new  instances, then drop them when you are done&lt;/li&gt;&lt;/ol&gt;Most people looking  at NoSQL solutions are doing it because their data is larger than a  traditional solution can handle, or will be, so (1) is not a very strong  motivation.  But what about (2)?  At first glance, cloud is a great fit  for adding capacity to a database cluster painlessly.  But there's an  important difference between load like web traffic that bounces up and down frequently, and databases: with few exceptions, databases only get larger with time.  You won't have 20 TB of data this week, and 2 next.

&lt;p&gt;
When capacity only grows in one direction it makes less sense to pay a premium for the flexibility of being able to reduce your capacity nearly instantly, especially when you also get reduced I/O performance (the most common bottleneck for databases) in the bargain because of the virtualization layer.  That's why, despite working for a &lt;a href="http://rackspacecloud.com/"&gt;cloud provider&lt;/a&gt;, I don't think it's always a good fit for databases.  (It doesn't hurt that Rackspace also offers classic bare metal hosting in the same data centers, so you can have the best of both worlds.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-3844309162894670811?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/3844309162894670811/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=3844309162894670811' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3844309162894670811'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3844309162894670811'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2010/03/why-your-data-may-not-belong-in-cloud.html' title='Why your data may not belong in the cloud'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-1928982814843756737</id><published>2010-02-08T10:06:00.000-08:00</published><updated>2010-02-25T20:39:33.258-08:00</updated><title type='text'>Distributed deletes in the Cassandra database</title><content type='html'>&lt;p&gt;
Handling deletes in a distributed, &lt;a href="http://www.allthingsdistributed.com/2008/12/eventually_consistent.html"&gt;eventually consistent&lt;/a&gt; system is a little tricky, as demonstrated by the fairly frequent recurrence of the question, "&lt;a href="http://wiki.apache.org/cassandra/FAQ#i_deleted_what_gives"&gt;Why doesn't disk usage immediately decrease when I remove data in Cassandra&lt;/a&gt;?"

&lt;p&gt;
As background, recall that a &lt;a href="http://incubator.apache.org/cassandra/"&gt;Cassandra &lt;/a&gt;cluster defines a ReplicationFactor that determines how many nodes each key and associated columns are written to.  In Cassandra (as in &lt;a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html"&gt;Dynamo&lt;/a&gt;), the client controls how many replicas to block for on writes, which includes deletions.  In particular, the client may (and typically will) specify a ConsistencyLevel of less than the cluster's ReplicationFactor, that is, the coordinating server node should report the write successful even if some replicas are down or otherwise not responsive to the write.
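
&lt;p&gt;
To make the relationship between ReplicationFactor and ConsistencyLevel concrete, here is a toy Python sketch.  The function name and the list of booleans standing in for replicas are made up for illustration; this is not Cassandra's coordinator code, only the counting rule described above.
&lt;/p&gt;

```python
def write_succeeds(replicas_up, consistency_level):
    """Report success once enough replicas acknowledge the write.

    replicas_up is a hypothetical stand-in for real nodes: one boolean per
    replica, True if that replica is reachable and acknowledges the write.
    """
    acks = sum(1 for up in replicas_up if up)
    return acks >= consistency_level


replication_factor = 3
quorum = replication_factor // 2 + 1   # a majority of the replicas

one_replica_down = [True, True, False]
print(write_succeeds(one_replica_down, 1))                   # True
print(write_succeeds(one_replica_down, quorum))              # True
print(write_succeeds(one_replica_down, replication_factor))  # False
```

&lt;p&gt;
With one of three replicas down, a write (or delete) still succeeds at any level short of the full ReplicationFactor, which is exactly how a replica can end up missing a delete.
&lt;/p&gt;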

&lt;p&gt;
(Thus, the "eventual" in eventual consistency: if a client reads with a low enough ConsistencyLevel from a replica that did not get the update, it will potentially see old data.  Cassandra uses &lt;a href="http://wiki.apache.org/cassandra/HintedHandoff"&gt;Hinted Handoff&lt;/a&gt;, &lt;a href="http://wiki.apache.org/cassandra/ReadRepair"&gt;Read Repair&lt;/a&gt;, and &lt;a href="http://wiki.apache.org/cassandra/AntiEntropy"&gt;Anti Entropy&lt;/a&gt; to reduce the inconsistency window, as well as offering higher consistency levels such as ConsistencyLevel.QUORUM, but it's still something we have to be aware of.)

&lt;p&gt;
Thus, a delete operation can't just wipe out all traces of the data being removed immediately: if we did, and a replica did not receive the delete operation, when it becomes available again it will treat the replicas that &lt;span style="font-style: italic;"&gt;did&lt;/span&gt; receive the delete as having missed a write update, and repair them!  So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone.  The tombstone can then be propagated to replicas that missed the initial remove request.
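
&lt;p&gt;
The tombstone idea can be sketched in a few lines of Python.  Everything here (the Replica class, the repair function, the last-write-wins timestamps) is a hypothetical toy model, not Cassandra's real classes or storage format; it only shows why a delete has to be recorded as a value that can win a repair.
&lt;/p&gt;

```python
# Hypothetical toy model, not Cassandra's real classes or storage format.
TOMBSTONE = object()  # marker value meaning "this data was deleted"


class Replica:
    """One node's view of the data: key maps to (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, timestamp, value):
        # Last write wins: keep whichever version has the newer timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        return self.data.get(key)


def repair(replicas, key):
    """Push the newest version of key (tombstone or not) to every replica."""
    versions = [r.read(key) for r in replicas if r.read(key) is not None]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[0])
    for r in replicas:
        r.apply(key, newest[0], newest[1])
    return newest


a, b = Replica(), Replica()
for node in (a, b):
    node.apply("user:1", 1, "alice")   # both replicas see the original write

a.apply("user:1", 2, TOMBSTONE)        # b is down and misses the delete

repair([a, b], "user:1")               # the tombstone is newer, so it wins
print(b.read("user:1")[1] is TOMBSTONE)  # True
```

&lt;p&gt;
If replica a had simply dropped the key instead of storing a tombstone, the same repair would have found only b's old value and copied it back to a, resurrecting the deleted data.
&lt;/p&gt;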

&lt;p&gt;
There's one more piece to the problem: how do we know when it's safe to remove tombstones?  In a fully distributed system, we can't.  We could add a coordinator like &lt;a href="http://hadoop.apache.org/zookeeper/"&gt;ZooKeeper&lt;/a&gt;, but that would pollute the simplicity of the design, as well as complicating ops -- then you'd essentially have two systems to monitor, instead of one.  (This is not to say ZK is bad software -- I believe it is best in class at what it does -- only that it solves a problem that we do not wish to add to our system.)

&lt;p&gt;
So, Cassandra does what distributed systems designers frequently do when confronted with a problem we don't know how to solve: define some additional constraints that turn it into one that we do. Here, we defined a constant, &lt;em&gt;GCGraceSeconds&lt;/em&gt;, and had each node track tombstone age locally.  Once it has aged past the constant, it can be GC'd.  This means that if you have a node down for longer than &lt;em&gt;GCGraceSeconds&lt;/em&gt;, you should treat it as a failed node and replace it as described in &lt;a href="http://wiki.apache.org/cassandra/Operations"&gt;Cassandra Operations&lt;/a&gt;.  The default setting is very conservative, at 10 days; you can reduce that once you have Anti Entropy configured to your satisfaction.  And of course if you are only running a single Cassandra node, you can reduce it to zero, and tombstones will be GC'd at the first &lt;a href="http://wiki.apache.org/cassandra/MemtableSSTable"&gt;compaction&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-1928982814843756737?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/1928982814843756737/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=1928982814843756737' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1928982814843756737'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1928982814843756737'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html' title='Distributed deletes in the Cassandra database'/><author><name>Jonathan 
Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-540813462044138543</id><published>2010-01-25T14:23:00.000-08:00</published><updated>2010-10-13T06:38:26.199-07:00</updated><title type='text'>Cassandra 0.5.0 released</title><content type='html'>&lt;p&gt;
&lt;a href="http://cassandra.apache.org/"&gt;Apache Cassandra&lt;/a&gt; 0.5.0 was released over the weekend, four months after 0.4.  (&lt;a href="https://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.5.0/NEWS.txt"&gt;Upgrade notes&lt;/a&gt;; &lt;a href="https://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.5.0/CHANGES.txt"&gt;full changelog&lt;/a&gt;.)  We're excited about releasing 0.5 because it makes life even better for people using Cassandra as their primary data source -- as opposed to a replica, possibly denormalized, of data that exists somewhere else.
&lt;/p&gt;&lt;p&gt;
The Cassandra distributed database has always had a commitlog to provide durable writes, and in 0.4 we added an option to wait for commitlog sync before acknowledging writes, for cases where even a few seconds of potential data loss was not an option.  But what if a node goes down temporarily?  0.5 adds proactive repair, what Dynamo calls "anti-entropy," to synchronize any updates &lt;a href="http://wiki.apache.org/cassandra/HintedHandoff"&gt;Hinted Handoff&lt;/a&gt; or read repair didn't catch across all replicas for a given piece of data.
&lt;/p&gt;&lt;p&gt;
0.5 also adds load balancing and significantly improves bootstrap (adding nodes to a running cluster).  We've also been busy adding documentation on &lt;a href="http://wiki.apache.org/cassandra/Operations"&gt;operations in production&lt;/a&gt; and &lt;a href="http://wiki.apache.org/cassandra/ArchitectureInternals"&gt;system internals&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
Finally, in 0.5 we've improved concurrency across the board, improving insert speed by over 50% on the stress.py benchmark (from contrib/) on a relatively modest 4-core system with 2GB of ram.  We've also added a [row] key cache, enabling similar relative improvements in reads:
&lt;/p&gt;&lt;p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_bwSkwFkEnF0/S14o4J02X7I/AAAAAAAAAIA/Uhra2UAmP5g/s1600-h/cassandra+04+vs+05+single+machine.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 160px;" src="http://3.bp.blogspot.com/_bwSkwFkEnF0/S14o4J02X7I/AAAAAAAAAIA/Uhra2UAmP5g/s400/cassandra+04+vs+05+single+machine.png" alt="" id="BLOGGER_PHOTO_ID_5430823145830768562" border="0" /&gt;&lt;/a&gt;(You will note that unlike most systems, Cassandra &lt;a href="http://wiki.apache.org/cassandra/FAQ#reads_slower_writes"&gt;reads are usually slower than writes&lt;/a&gt;.  0.6 will narrow this gap with full row caching and mmap'd I/O, but fundamentally we think optimizing for writes is the right thing to do since writes have always been harder to scale.)
&lt;/p&gt;&lt;p&gt;
Log replay, flush, compaction, and range queries are also faster.
&lt;/p&gt;&lt;p&gt;
0.5 also brings new tools, including JSON-based data export and import, an improved command-line interface, and new JMX metrics.
&lt;/p&gt;&lt;p&gt;
One final note: like all distributed systems, Cassandra is designed to maximize throughput when under load from many clients.  Benchmarking with a single thread or a small handful will not give you numbers representative of production (unless you only ever have four or five users at a time in production, I suppose).  Please don't ask "why is Cassandra so slow" and offer up a single-threaded benchmark as evidence; that makes me sad inside. Here's 1000 words:

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_bwSkwFkEnF0/S14pF3hzsUI/AAAAAAAAAII/nuA7KSoLPCg/s1600-h/cassandra-inserts-vs-threads.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 160px;" src="http://3.bp.blogspot.com/_bwSkwFkEnF0/S14pF3hzsUI/AAAAAAAAAII/nuA7KSoLPCg/s400/cassandra-inserts-vs-threads.png" alt="" id="BLOGGER_PHOTO_ID_5430823381437231426" border="0" /&gt;&lt;/a&gt;

&lt;/p&gt;&lt;p&gt;
(Thanks to &lt;a href="http://twitter.com/faltering"&gt;Brandon Williams&lt;/a&gt; for the graphs.)&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-540813462044138543?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/540813462044138543/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=540813462044138543' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/540813462044138543'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/540813462044138543'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2010/01/cassandra-05.html' title='Cassandra 0.5.0 released'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_bwSkwFkEnF0/S14o4J02X7I/AAAAAAAAAIA/Uhra2UAmP5g/s72-c/cassandra+04+vs+05+single+machine.png' height='72' width='72'/><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-3952048496501613799</id><published>2010-01-21T08:22:00.000-08:00</published><updated>2010-05-17T05:17:31.953-07:00</updated><title type='text'>Linux performance basics</title><content type='html'>&lt;p&gt;
I want to write about &lt;a href="http://incubator.apache.org/cassandra/"&gt;Cassandra&lt;/a&gt; performance tuning, but first I need to cover some basics: how to use vmstat, iostat, and top to understand what part of your system is the bottleneck -- not just for Cassandra but for any system.

&lt;/p&gt;&lt;p&gt;
&lt;/p&gt;&lt;div style="font-size: 130%;"&gt;&lt;b&gt;vmstat&lt;/b&gt;&lt;/div&gt;
You will typically run vmstat with "vmstat sampling-period", e.g., "vmstat 5."  The output looks like this:

&lt;p&gt;&lt;/p&gt;&lt;pre class="code"&gt;
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
20  0 195540  32772   6952 576752    0    0    11    12   38   43  1  0 99  0
22  2 195536  35988   6680 575132    6    0  2952    14  959 16375 72 21  4  3
&lt;/pre&gt;

The first line is your total system average since boot; typically this will not be very useful, since you are interested in what is causing problems NOW.  Then you will get one line per sample period; most of the output is self explanatory.  The reason to start with vmstat is the "swap" section: si and so are swap in (memory read from disk) and swap out (memory written to disk).  Remember that a little swapping is normal, particularly during application startup: by default, Linux will swap infrequently used pages of application memory to disk to free up more room for disk caching, &lt;a href="http://lwn.net/Articles/83588/"&gt;even if there is enough ram&lt;/a&gt; to accommodate all applications.
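
&lt;p&gt;
If you want to watch for swapping from a script, a minimal Python sketch like the following pulls si and so out of captured vmstat output.  It assumes the column layout shown above and a capture short enough that vmstat has not repeated its header; the function name is made up for illustration.
&lt;/p&gt;

```python
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
20  0 195540  32772   6952 576752    0    0    11    12   38   43  1  0 99  0
22  2 195536  35988   6680 575132    6    0  2952    14  959 16375 72 21  4  3
"""


def swap_activity(vmstat_text):
    """Return one (si, so) pair per sample, skipping the since-boot line."""
    lines = vmstat_text.strip().splitlines()
    header = lines[1].split()
    si, so = header.index("si"), header.index("so")
    # lines[0] is the banner, lines[1] the column names, lines[2] the
    # average since boot; real samples start at lines[3].
    samples = [line.split() for line in lines[3:]]
    return [(int(row[si]), int(row[so])) for row in samples]


print(swap_activity(SAMPLE))  # [(6, 0)]
```

&lt;p&gt;
Looking the columns up by name rather than by position keeps the sketch working if your vmstat version adds a column such as "st".
&lt;/p&gt;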

&lt;p&gt;
&lt;/p&gt;&lt;div style="font-size: 130%;"&gt;&lt;b&gt;iostat&lt;/b&gt;&lt;/div&gt;
To get more details of io, use iostat -x.  Again, you want to give it a sampling interval, and ignore the first set of output.  iostat also gives you some cpu information but top does that better; let's focus on the Device section:

&lt;p&gt;&lt;/p&gt;&lt;pre class="code"&gt;
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               9.80     0.20   36.60    0.40  5326.40     4.80   144.09     0.06    1.62   1.41   5.20
&lt;/pre&gt;

There are 3 easy ways to tell if a disk is a probable bottleneck here, and none of them show up without the -x flag, so get in the habit of using that.  "avgqu-sz" is the size of the io request queue; if it is large, there are lots of requests waiting in line.  "await" is how long (in ms) the average request took to be satisfied (including time enqueued); recall that on non-SSDs, a single seek is between 5 and 10ms.  Finally, "%util" is Linux's guess at how fully saturated the device is.
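
&lt;p&gt;
As a rough sketch, the three checks can be automated in Python against a captured device line.  The thresholds below are hypothetical placeholders I picked for illustration (the await limit is loosely based on the seek time mentioned above); tune them for your own hardware.
&lt;/p&gt;

```python
HEADER = ("Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s "
          "avgrq-sz avgqu-sz   await  svctm  %util")
ROW = ("sda               9.80     0.20   36.60    0.40  5326.40     4.80   "
       "144.09     0.06    1.62   1.41   5.20")

# Hypothetical thresholds, for illustration only.
LIMITS = {"avgqu-sz": 1.0, "await": 10.0, "%util": 80.0}


def disk_warnings(header, row, limits=LIMITS):
    """Return the names of the iostat -x columns that exceed their limit."""
    names = header.split()[1:]                      # drop "Device:"
    values = [float(v) for v in row.split()[1:]]    # drop the device name
    fields = dict(zip(names, values))
    return [name for name, limit in limits.items() if fields[name] > limit]


print(disk_warnings(HEADER, ROW))  # []
```

&lt;p&gt;
The sample line above comes back clean: a queue of 0.06, 1.62ms per request and 5.2% utilization describe a disk that is nowhere near being the bottleneck.
&lt;/p&gt;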

&lt;p&gt;
&lt;/p&gt;&lt;div style="font-size: 130%;"&gt;&lt;b&gt;top&lt;/b&gt;&lt;/div&gt;
To learn more about per-process CPU and memory usage, use "top."  I won't paste top output here because everyone is so familiar with it, but I will mention a few useful things to know:
&lt;p&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;"P" and "M" toggle between sorting by cpu usage and sorting by memory usage&lt;/li&gt;&lt;li&gt;"1" toggles breaking down the CPU summary by CPU core&lt;/li&gt;&lt;li&gt;SHR (shared memory) is included in RES (resident memory)&lt;/li&gt;&lt;li&gt;The amount of memory belonging to a process that has been swapped out is at most VIRT - RES (VIRT also counts memory that is mapped but has never been touched)&lt;/li&gt;&lt;li&gt;a state (S column) of D means the process (or thread, see below) is in uninterruptible sleep, usually waiting for disk i/o (network filesystems such as NFS included)
&lt;/li&gt;&lt;li&gt;"steal" is how much CPU the hypervisor is giving to another VM in a virtual environment; as virtual provisioning becomes more common, &lt;a href="http://alan.blog-city.com/has_amazon_ec2_become_over_subscribed.htm"&gt;avoiding noisy neighbors&lt;/a&gt; is increasingly important&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;
"top -H" will split out individual threads into their own lines; both per-process and per-thread views are useful.  The per-thread view is particularly useful when dealing with Java applications since you can easily &lt;a href="http://publib.boulder.ibm.com/infocenter/javasdk/tools/index.jsp?topic=/com.ibm.java.doc.igaa/_1vg0001475cb4a-1190e2e0f74-8000_1007.html"&gt;correlate them with thread names from the JVM&lt;/a&gt; to see which threads are consuming your CPU. Briefly, you take the PID (thread ID) from top, convert it to hex -- e.g., "python -c 'print hex(12345)'" -- and match it with the corresponding thread ID from jstack.&lt;/p&gt;&lt;p&gt;Now you can troubleshoot with a process like: "Am I swapping? If so, what processes are using all the memory?  If my application makes a lot of disk read requests, are my reads being cached or are they actually hitting the disk?  If I am hitting the disk, is it saturated?  How much 'hot data' can I have before I run out of cache room?  Are any/all of my cpu cores maxed?  Which threads are actually using the CPU?  Which threads spend most of their time waiting for i/o?"  Then if you go to ask for help tuning something, you can &lt;a href="http://www.catb.org/%7Eesr/faqs/smart-questions.html"&gt;show that you've done your homework&lt;/a&gt;.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-3952048496501613799?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/3952048496501613799/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=3952048496501613799' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3952048496501613799'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3952048496501613799'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2010/01/linux-performance-basics.html' title='Linux performance basics'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-3155695943836005497</id><published>2009-12-15T14:01:00.001-08:00</published><updated>2010-01-19T05:49:59.849-08:00</updated><title type='text'>Cassandra reading list</title><content type='html'>I put together this list for a co-worker who wants to learn more about Cassandra: (&lt;a href="https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5/CHANGES.txt"&gt;0.5&lt;/a&gt; beta 2 out now!)&lt;ul&gt;&lt;li&gt;&lt;a href="http://wiki.apache.org/cassandra/GettingStarted"&gt;Getting Started&lt;/a&gt;: Cassandra is surprisingly easy to try out.  This walks you through both single-node and clustered setup.
&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html"&gt;The Dynamo paper&lt;/a&gt; and &lt;a href="http://www.allthingsdistributed.com/2008/12/eventually_consistent.html"&gt;Amazon's related article on eventual consistency&lt;/a&gt;: Cassandra's replication model is strongly influenced by Dynamo's.  Almost everything you read here also applies to Cassandra.  (The major exceptions are vector clocks, and even that &lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-580"&gt;may change&lt;/a&gt;, and Cassandra's&lt;a href="http://spyced.blogspot.com/2009/05/consistent-hashing-vs-order-preserving.html"&gt; support for order-preserving partitioning&lt;/a&gt; with active load balancing.)&lt;/li&gt;&lt;li&gt;&lt;a href="http://arin.me/code/wtf-is-a-supercolumn-cassandra-data-model"&gt;WTF is a SuperColumn&lt;/a&gt;? Arin Sarkissian from Digg explains the Cassandra data model.&lt;/li&gt;&lt;li&gt;&lt;a href="http://wiki.apache.org/cassandra/Operations"&gt;Operations&lt;/a&gt;: stuff you will want to know when you run Cassandra in production&lt;/li&gt;&lt;li&gt;&lt;a href="http://n2.nabble.com/Cassandra-users-survey-td4040068.html"&gt;Cassandra users survey from Nov 09&lt;/a&gt;: What Twitter, Mahalo, Ooyala, SimpleGeo, and others are using Cassandra for
&lt;/li&gt;&lt;li&gt;&lt;a href="http://wiki.apache.org/cassandra/ArticlesAndPresentations"&gt;More articles here&lt;/a&gt; (Cassandra on OS X seems to be a particularly popular topic)
&lt;/li&gt;&lt;/ul&gt;If you want to know more about the internals, also see these:
&lt;ul&gt;&lt;li&gt;&lt;a href="http://wiki.apache.org/cassandra/ArchitectureInternals"&gt;Internals documentation&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.facebook.com/video/video.php?v=540974400803"&gt;Facebook presentation&lt;/a&gt; and &lt;a href="http://vimeo.com/5185526"&gt;NoSQL SF presentation&lt;/a&gt;, by Avinash Lakshman (the second picks up almost where the first leaves off)&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf"&gt;LADIS 2009 paper&lt;/a&gt; by Avinash Lakshman and Prashant Malik
&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-3155695943836005497?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/3155695943836005497/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=3155695943836005497' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3155695943836005497'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3155695943836005497'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/12/cassandra-reading-list.html' title='Cassandra reading list'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-2342321015427428420</id><published>2009-07-29T10:34:00.000-07:00</published><updated>2009-11-02T05:36:16.641-08:00</updated><title type='text'>Cassandra hackfest and OSCON report</title><content type='html'>The best part of OSCON for me wasn't actually part of OSCON.  
The guys at Twitter put together a &lt;a href="http://incubator.apache.org/cassandra/"&gt;Cassandra&lt;/a&gt; &lt;a href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%3Cb6f68fc60907161612t5469a76ds6175f846ce29a05a@mail.gmail.com%3E"&gt;hackfest&lt;/a&gt; on Wednesday night, with &lt;a href="http://www.flickr.com/photos/_evan/3751880113/in/set-72157621750026309/"&gt;much awesomeness&lt;/a&gt; resulting.  Thanks to &lt;a href="http://twitter.com/evan"&gt;Evan&lt;/a&gt; for organizing!
&lt;p&gt;
&lt;a href="http://twitter.com/stuhood"&gt;Stu Hood&lt;/a&gt; flew up from Rackspace's Virginia offices just for the night, which normally probably wouldn't have been worth it, but &lt;a href="http://twitter.com/moonpolysoft"&gt;Cliff Moon&lt;/a&gt;, author of &lt;a href="http://github.com/cliffmoon/dynomite/tree/master"&gt;dynomite&lt;/a&gt;, showed up (thanks, Cliff!) and was able to give Stu a lot of pointers on &lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-193"&gt;implementing merkle trees&lt;/a&gt;.  Cliff and I also had a good discussion with Jun Rao about hinted handoff--Cliff and Jun are not fans, and I tend to agree with them--and &lt;a href="http://www.allthingsdistributed.com/2008/12/eventually_consistent.html"&gt;eventual consistency&lt;/a&gt;.
&lt;p&gt;
I also met &lt;a href="http://blog.lostlake.org/"&gt;David Pollack&lt;/a&gt; and got to talk a little about persistence for &lt;a href="http://blog.lostlake.org/index.php?/archives/94-Lift,-Goat-Rodeo-and-Such.html"&gt;Goat Rodeo&lt;/a&gt;, and talked to a ton of people from Twitter and Digg. I think those two, with Rackspace and IBM Research, constituted the companies with more than one engineer attending.  The rest was "long tail."
&lt;p&gt;
Back at OSCON, my Cassandra talk was standing room only.  Slides:
&lt;div style="width: 425px; text-align: left;" id="__ss_1786870"&gt;&lt;a style="margin: 12px 0pt 3px; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; display: block; text-decoration: underline;" href="http://www.slideshare.net/jbellis/cassandra-open-source-bigtable-dynamo" title="Cassandra: Open Source Bigtable + Dynamo"&gt;Cassandra: Open Source Bigtable + Dynamo&lt;/a&gt;&lt;object style="margin: 0px;" width="425" height="355"&gt;&lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassandraopensourcebigtabledynamopresentation-090729134121-phpapp01&amp;amp;stripped_title=cassandra-open-source-bigtable-dynamo"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassandraopensourcebigtabledynamopresentation-090729134121-phpapp01&amp;amp;stripped_title=cassandra-open-source-bigtable-dynamo" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;"&gt;View more &lt;a style="text-decoration: underline;" href="http://www.slideshare.net/"&gt;documents&lt;/a&gt; from &lt;a style="text-decoration: underline;" href="http://www.slideshare.net/jbellis"&gt;jbellis&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;

My second talk is the one I would have preferred to give first, on "What Every Developer Should Know About Database Scalability".  (I would have preferred to give it first so that I could have just said "come to my Cassandra talk for more details" instead of trying to cram that in at the end.  But, it was in my proposal outline!) Slides: &lt;div style="width: 425px; text-align: left;" id="__ss_1786869"&gt;&lt;a style="margin: 12px 0pt 3px; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; display: block; text-decoration: underline;" href="http://www.slideshare.net/jbellis/what-every-developer-should-know-about-database-scalability" title="What Every Developer Should Know About Database Scalability"&gt;What Every Developer Should Know About Database Scalability&lt;/a&gt;&lt;object style="margin: 0px;" width="425" height="355"&gt;&lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=scalingdatabases-090729134110-phpapp01&amp;amp;stripped_title=what-every-developer-should-know-about-database-scalability"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=scalingdatabases-090729134110-phpapp01&amp;amp;stripped_title=what-every-developer-should-know-about-database-scalability" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;"&gt;View more &lt;a style="text-decoration: underline;" href="http://www.slideshare.net/"&gt;documents&lt;/a&gt; from &lt;a style="text-decoration: underline;" href="http://www.slideshare.net/jbellis"&gt;jbellis&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;

Other OSCON talks I liked (that have slides available):
&lt;ul&gt;&lt;li&gt;&lt;a href="http://en.oreilly.com/oscon2009/public/schedule/detail/8198"&gt;Gearman: Bringing the Power of Map/Reduce to Everyday Applications&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://en.oreilly.com/oscon2009/public/schedule/detail/8230"&gt;High Performance SQL with PostgreSQL [8.4]
&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://en.oreilly.com/oscon2009/public/schedule/detail/8432"&gt;Linux Filesystem Performance for Databases&lt;/a&gt; (reiserfs blows everyone away for random writes, by a factor of &gt; 2!?)&lt;/li&gt;&lt;li&gt;&lt;a href="http://en.oreilly.com/oscon2009/public/schedule/detail/8364"&gt;Neo4j - The Benefits of Graph Databases&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://en.oreilly.com/oscon2009/public/schedule/detail/7823"&gt;Release Mismanagement: How to Alienate Users and Frustrate Developers&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-2342321015427428420?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/2342321015427428420/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=2342321015427428420' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2342321015427428420'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2342321015427428420'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/07/cassandra-et-al-at-oscon.html' title='Cassandra hackfest and OSCON report'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-2087158412999690976</id><published>2009-07-06T10:56:00.000-07:00</published><updated>2009-07-06T11:20:33.833-07:00</updated><title type='text'>Cassandra 0.3 update</title><content type='html'>&lt;p&gt;
Two months after &lt;a href="http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html"&gt;the first release candidate&lt;/a&gt;, &lt;a href="http://incubator.apache.org/cassandra/"&gt;Cassandra&lt;/a&gt; 0.3 is still not out.  But, we're close!
&lt;p&gt;
We had two more bug-fix release candidates, and it's virtually certain that 0.3-final will be the same exact code as &lt;a href="http://people.apache.org/%7Ejbellis/cassandra/cassandra-0.3.0-rc3.tar.gz"&gt;0.3-rc3&lt;/a&gt;.  (If you're using rc1, you do want to upgrade; see &lt;a href="https://svn.apache.org/repos/asf/incubator/cassandra/tags/cassandra-0.3.0-rc3/CHANGES.txt"&gt;CHANGES.txt&lt;/a&gt;.)  But, we got stuck in the &lt;a href="http://twitter.com/spyced/status/2497990811"&gt;ASF bureaucracy&lt;/a&gt; and it's going to take at least &lt;a href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%3Ce06563880907060804p731964d5k6cb6d7ab73d92767@mail.gmail.com%3E"&gt;one more round-trip&lt;/a&gt; before the crack Release Prevention Team grudgingly lets us call it official.
&lt;p&gt;
In the meantime, &lt;a href="https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&amp;amp;pid=12310865&amp;amp;status=5"&gt;work continues apace&lt;/a&gt; on trunk for 0.4.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-2087158412999690976?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/2087158412999690976/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=2087158412999690976' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2087158412999690976'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2087158412999690976'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/07/cassandra-03-update.html' title='Cassandra 0.3 update'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-2114870755095467671</id><published>2009-06-23T09:40:00.000-07:00</published><updated>2009-06-23T11:07:36.548-07:00</updated><title type='text'>Patch-oriented development made sane with git-svn</title><content type='html'>One of the drawbacks to working on &lt;a href="http://incubator.apache.org/cassandra/"&gt;Cassandra&lt;/a&gt; is that unlike every other OSS project I have ever worked on, we are using a patch-oriented development process rather than post-commit review.  It's really quite painfully slow.  
Somehow this became sort of the default for ASF projects, but &lt;a href="http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg00244.html"&gt;there is precedent&lt;/a&gt; for switching to post-commit review eventually.
&lt;p&gt;
In the meantime, there is git-svn.
&lt;/p&gt;&lt;p&gt;
(The ASF does have &lt;a href="http://wiki.apache.org/general/GitAtApache"&gt;a git mirror set up&lt;/a&gt;, but I'm going to ignore that because (a) its reliability has been questionable and (b) sticking with git-svn hopefully makes this more useful for non-ASF projects.)
&lt;/p&gt;
&lt;p&gt;
Disclaimer: &lt;a href="http://spyced.blogspot.com/2008/12/frustrated-with-git.html"&gt;
I am not a git expert&lt;/a&gt;, and probably some of this will make you cringe if you are.  Still, I hope it will be useful for some others fumbling their way towards enlightenment.  As background, I suggest the &lt;a href="http://git.or.cz/course/svn.html"&gt;git crash course for svn users&lt;/a&gt;.  Just the parts up to the Remote section.
&lt;/p&gt;
&lt;p&gt;
Checkout:
&lt;/p&gt;&lt;ol&gt;
 &lt;li&gt;git-svn clone https://svn.apache.org/repos/asf/cassandra/trunk cassandra
&lt;/li&gt;&lt;/ol&gt;

Once that's done, the only git-svn commands you need to know about are dcommit, to push the changes &lt;i&gt;in the current git branch&lt;/i&gt; back to svn, and rebase, to pull changes from svn and re-apply the patches you haven't yet committed to svn on top of them (basically exactly like svn up).
&lt;p&gt;
Creating new code:
&lt;/p&gt;&lt;ol&gt;
&lt;li&gt;git checkout -b [ticket number]
&lt;/li&gt;&lt;li&gt;[edit stuff, maybe git add or git rm new or obsolete files]
&lt;/li&gt;&lt;li&gt;git commit -a -m 'commit'
&lt;/li&gt;&lt;li&gt;repeat 2-3 as necessary
&lt;/li&gt;&lt;li&gt;&lt;a href="http://github.com/dreiss/git-jira-attacher/tree/master"&gt;git-jira-attacher&lt;/a&gt; [revision] (usually some variant of HEAD^^^)
&lt;/li&gt;&lt;/ol&gt;
[after review]
&lt;ol&gt;
&lt;li&gt;git log &lt;i&gt;(just to make sure I'm about to commit what I think I'm about to commit)&lt;/i&gt;
&lt;/li&gt;&lt;li&gt;git-svn dcommit
&lt;/li&gt;&lt;li&gt;git checkout master
&lt;/li&gt;&lt;li&gt;git-svn rebase -l &lt;i&gt;(this will put the changes you just committed into master)&lt;/i&gt;
&lt;/li&gt;&lt;li&gt;git branch -d [ticket number]
&lt;/li&gt;&lt;/ol&gt;
When I'm reviewing code it looks similar:
&lt;ol&gt;
&lt;li&gt;git checkout -b [ticket number]
&lt;/li&gt;&lt;li&gt;wget patches and git-apply, or &lt;a href="http://github.com/eevans/git-jira-attacher/blob/b002ab0a0cd4d7c9f3801df4e8664e9fc3711053/jira-apply"&gt;jira-apply&lt;/a&gt; CASSANDRA-[ticket-number]
&lt;/li&gt;&lt;li&gt;review in gitk/qgit and/or IDE (the intellij git plugin is quite decent)
&lt;/li&gt;&lt;li&gt;commit .. branch -d as above
&lt;/li&gt;&lt;/ol&gt;

The last operation is "see who I need to bug to get reviews moving."  This is just a list of the branches I haven't merged into master and deleted yet:
&lt;ol&gt;
&lt;li&gt;git branch
&lt;/li&gt;&lt;/ol&gt;

Git-svn takes a lot of the pain out of the ASF's patch-and-jira workflow.  In particular, you can easily break changes for a ticket up into multiple patches that are easily reviewed, and the latency of waiting for patch review doesn't kill your throughput so badly since you can just leave that branch alone and start a new one for your next piece of functionality.  And of course you get git commit --amend and git rebase -i for massaging patches during the review process.
&lt;p&gt;
One fairly common complication is if you finish a ticket A, then start on ticket B (that depends on A) while waiting for A to be reviewed.  So you checkout -b from your branch A rather than master and build some patches on that.  As sometimes happens, the reviewer finds something you need to improve in your patch set for A, so you make those changes.  Now you need to rebase your patches to B on top of the changes you made to A.  The best way to do this is to branch A to B-2, then git cherry-pick from B and resolve conflicts as necessary.
&lt;/p&gt;&lt;p&gt;
Final note: I often like to create lots of small commits as I am exploring a solution and combine them into larger units with git rebase -i for patch submission.  (It's easier to combine small patches than to pull apart large ones.)  So my early commit messages are often terse and need editing.  You can change commit messages with edit mode in rebase, followed by commit --amend and rebase --continue, but that is tedious.  I complained about this to my friend &lt;a href="http://twitter.com/zwily"&gt;Zach Wily&lt;/a&gt; and he made this git amend-message command (place in [alias] in your .gitconfig):

&lt;/p&gt;&lt;pre&gt;
   amend-message = "!bash -c ' \
       c=$0; \
       if [ $c == \"bash\" ]; then echo \"Usage: git amend-message &amp;lt;commit&amp;gt;\"; exit 1; fi; \
       saved_head=$(git rev-parse HEAD); \
       commit=$(git rev-parse $c); \
       commits=$(git log --reverse --pretty=format:%H $commit..HEAD); \
       echo \"Rewinding to $commit...\"; \
       git reset --hard $commit; \
       git commit --amend; \
       for X in $commits; do \
           echo \"Applying $X...\"; \
           git cherry-pick $X &gt;&gt; /dev/null; \
           if [ $? -ne 0 ]; then \
               echo \"  apply failed (is this a merge?), rolling back all changes\"; \
               git reset --hard $saved_head; \
               echo \" ** AMEND-MESSAGE FAILED, sorry\"; \
               exit 1; \
           fi; \
       done; \
       echo \"Done\"'"
&lt;/pre&gt;

(Zach would like the record to show that he knows this is pretty hacky. "For instance, it won't work if one of the commits after the one you're changing is a merge, since cherry-pick can't handle those."  But it's quite useful, all the same.)
&lt;p&gt;
For what it's worth, the rest of my aliases are
&lt;/p&gt;&lt;pre&gt;
 st = status
 ci = commit
 co = checkout
 br = branch
 cp = cherry-pick
&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-2114870755095467671?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/2114870755095467671/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=2114870755095467671' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2114870755095467671'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2114870755095467671'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/06/patch-oriented-development-made-sane.html' title='Patch-oriented development made sane with git-svn'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-7256699447414724206</id><published>2009-05-27T06:59:00.000-07:00</published><updated>2009-05-27T08:21:14.068-07:00</updated><title type='text'>Why you won't be building your killer app on a distributed hash table</title><content type='html'>I ran across &lt;a href="http://seattleweb.intel-research.net/people/lamarca/pubs/paper-ChaRam.pdf"&gt;A case study in building layered DHT applications&lt;/a&gt; while doing some research on &lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-192"&gt;implementing load-balancing in Cassandra&lt;/a&gt;.  
The question addressed is, "Are DHTs a general-purpose tool that you can build more sophisticated services on?"
&lt;p&gt;
Short version: no.  A few specialized applications can be, and have been, built on a plain DHT, but most applications built on DHTs have ended up having to customize the DHT's internals to achieve their functional or performance goals.
&lt;p&gt;
This paper describes the results of attempting to build a relatively complex data structure (prefix hash trees, for range queries) on top of OpenDHT.  The result was mostly failure:
&lt;blockquote style="font-style: italic;"&gt;A simple put-get interface was not quite enough. In particular, OpenDHT relies on timeouts to invalidate entries and has no support for atomicity primitives... In return for ease of implementation and deployment, we sacrificed performance. With the OpenDHT implementation, a PHT query operation took a median of 2–4 seconds. This is due to the fact that layering entirely on top of a DHT service inherently implies that applications must perform a sequence of put-get operations to implement higher level semantics with limited opportunity for optimization within the DHT.&lt;/blockquote&gt;In other words, there are two primary problems with the DHT approach:
&lt;ul&gt;&lt;li&gt;Most DHTs will require a second locking layer to achieve correctness when implementing a more complex data structure on top of the DHT semantics.  In particular, this will certainly apply to eventually-consistent systems in the Dynamo mold.&lt;/li&gt;&lt;li&gt;Advanced functionality like range queries needs to be supported natively to be at all efficient.
&lt;/li&gt;&lt;/ul&gt;While they spin this in a positive manner -- "hey, at least it didn't take much code" -- the reality is that for most of us, query latency of two to four seconds is several orders of magnitude away from acceptable.
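&lt;p&gt;To make the first problem concrete, here is a minimal sketch of what goes wrong when a richer structure is layered on a bare put-get interface.  PutGetStore is a toy in-memory stand-in that I made up for illustration, not OpenDHT's actual API; the point is only that two clients doing read-modify-write on the same node, with no atomicity primitive, can silently lose an update.&lt;/p&gt;
&lt;pre&gt;
class PutGetStore:
    """Toy stand-in for a DHT that offers only put and get (hypothetical)."""

    def __init__(self):
        self._data = {}
        self.operations = 0  # each call would be a network round trip in a real DHT

    def get(self, key):
        self.operations += 1
        return self._data.get(key)

    def put(self, key, value):
        self.operations += 1
        self._data[key] = value


store = PutGetStore()
store.put("leaf", ["apple"])

# Two clients read the same tree leaf before either one writes it back.
seen_by_a = store.get("leaf")
seen_by_b = store.get("leaf")
store.put("leaf", seen_by_a + ["banana"])
store.put("leaf", seen_by_b + ["cherry"])  # overwrites client A's insert

final_leaf = store.get("leaf")
print(final_leaf, store.operations)
&lt;/pre&gt;
&lt;p&gt;Client A's "banana" is gone, and even this tiny exchange cost six store operations.  Avoiding the lost update means adding a locking or versioning layer on top, which is more round trips still.&lt;/p&gt;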
&lt;p&gt;
This is one reason why I think &lt;a href="http://incubator.apache.org/cassandra/"&gt;Cassandra&lt;/a&gt; is the most promising of the open-source distributed databases -- you get a relatively rich data model and a distribution model that supports efficient range queries.  These are not things that can be grafted on top of a simpler DHT foundation, so Cassandra will be useful for a wider variety of applications.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-7256699447414724206?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/7256699447414724206/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=7256699447414724206' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7256699447414724206'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7256699447414724206'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/05/why-you-wont-be-building-your-killer.html' title='Why you won&apos;t be building your killer app on a distributed hash table'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5049772954319479599</id><published>2009-05-18T15:19:00.001-07:00</published><updated>2009-05-18T15:35:20.746-07:00</updated><title type='text'>Belated 2009 Introduction to SQLAlchemy slides</title><content type='html'>I was 
asked to put my slides up again -- sorry it took so long.  The slides and code samples are now up &lt;a href="http://people.apache.org/%7Ejbellis/sqla2009/"&gt;here&lt;/a&gt;.  Video of the tutorial &lt;a href="http://blip.tv/file/1998818"&gt;is also up&lt;/a&gt;.  (3 parts, first is linked).  There's definitely audio problems in parts but at least some is watchable.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5049772954319479599?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5049772954319479599/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5049772954319479599' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5049772954319479599'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5049772954319479599'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/05/belated-2009-introduction-to-sqlalchemy.html' title='Belated 2009 Introduction to SQLAlchemy slides'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6219664989087220916</id><published>2009-05-13T20:18:00.000-07:00</published><updated>2010-07-29T21:34:16.724-07:00</updated><title type='text'>Cassandra 0.3 release candidate and progress</title><content type='html'>We have a release candidate out for &lt;a 
href="http://incubator.apache.org/cassandra/"&gt;Cassandra&lt;/a&gt; 0.3.  Grab the &lt;a href="http://people.apache.org/%7Ejbellis/cassandra/cassandra-0.3-rc.tgz"&gt;download&lt;/a&gt; and check out &lt;a href="http://wiki.apache.org/cassandra/GettingStarted"&gt;how to get started&lt;/a&gt;.  The &lt;a href="http://www.facebook.com/video/video.php?v=540974400803"&gt;facebook presentation&lt;/a&gt; from almost a year ago now is also still a good intro to some of the features and data model.
&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;Cassandra in a nutshell&lt;/span&gt;:
&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Scales writes very, very well: just add more nodes!&lt;/li&gt;&lt;li&gt;Has a much richer data model than vanilla key/value stores -- closer to what you'd be used to in a relational db.&lt;/li&gt;&lt;li&gt;Is pretty bleeding edge -- to my knowledge, Facebook is the only group running Cassandra in production.  (Their largest cluster is &lt;a href="http://groups.google.com/group/cassandra-user/msg/85a83621d07ff165"&gt;120 machines and 40TB of data&lt;/a&gt;.)  At Rackspace we are working on a Cassandra-based app now that 0.3 has the extra features we need.&lt;/li&gt;&lt;li&gt;Moved to the Apache Incubator about 40 days ago, at which point development greatly accelerated.
&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;Changes in 0.3 include&lt;/span&gt;
&lt;ul&gt;&lt;li&gt;Range queries on keys, including user-defined key collation.&lt;/li&gt;&lt;li&gt;Remove support, which is nontrivial in an eventually consistent world.
&lt;/li&gt;&lt;li&gt;Workaround for a weird bug in JDK select/register that seems particularly common on VM environments.  Cassandra should deploy fine on EC2 now.  (Oddly, it never had problems on Slicehost / Cloud Servers, which is also Xen-based.)&lt;/li&gt;&lt;li&gt;Much improved infrastructure: the beginnings of a decent test suite ("ant test" for unit tests; "nosetests" for system tests), code coverage reporting, etc.&lt;/li&gt;&lt;li&gt;Expanded node status reporting via JMX
&lt;/li&gt;&lt;li&gt;Improved error reporting/logging on both server and client
&lt;/li&gt;&lt;li&gt;Reduced memory footprint in default configuration&lt;/li&gt;&lt;li&gt;and plenty of bug fixes.&lt;/li&gt;&lt;/ul&gt;For those of you just joining us, Cassandra already had
&lt;ul&gt;&lt;li&gt;An advanced on-disk storage engine that never does random writes&lt;/li&gt;&lt;li&gt;Transaction log-based data integrity&lt;/li&gt;&lt;li&gt;P2P gossip failure detection
&lt;/li&gt;&lt;li&gt;Read repair&lt;/li&gt;&lt;li&gt;Hinted handoff&lt;/li&gt;&lt;li&gt;Bootstrap (adding new nodes to a running cluster)&lt;/li&gt;&lt;/ul&gt;(Read repair and hinted handoff are discussed in more detail in the &lt;a href="http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf"&gt;Dynamo paper&lt;/a&gt;.)
&lt;p&gt;

The Cassandra development and user community is also growing at an exciting pace.  Besides the original two developers from Facebook, we now have five developers regularly contributing improvements and fixes, and many others contributing on a more ad-hoc basis.

&lt;/p&gt;&lt;p&gt;
&lt;span style="font-weight: bold;"&gt;How fast is it?&lt;/span&gt;

&lt;/p&gt;&lt;p&gt;
In a nutshell, Cassandra is much faster than relational databases, and much slower than memory-only systems or systems that don't sync each update to disk.  Actual benchmarks are &lt;a href="http://blog.oskarsson.nu/2009/05/vpork.html"&gt;in the works&lt;/a&gt;.  We plan to start performance tuning with the next release, but if you want to benchmark it, here are some suggestions to get numbers closer to what you'll see in the wild (and about 10x more throughput than if you don't do these):
&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Do enough runs of your benchmark first that each operation tested by your suite runs 20k times before timing it for real.  This will allow the JVM jit to compile down to machine code; otherwise you'll just be getting the interpreted version.&lt;/li&gt;&lt;li&gt;Change the root logger level in conf/log4j.properties from DEBUG to INFO; we do a LOT of logging for debuggability and for small column values the logging has more overhead than the actual workload. (It would be even faster if we were to &lt;a href="http://surguy.net/articles/removing-log-messages.xml"&gt;remove them entirely&lt;/a&gt; but that didn't make this release.)&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6219664989087220916?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6219664989087220916/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6219664989087220916' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6219664989087220916'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6219664989087220916'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html' title='Cassandra 0.3 release candidate and progress'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6732010163099572250</id><published>2009-05-01T13:54:00.000-07:00</published><updated>2009-05-04T17:00:46.230-07:00</updated><title type='text'>A better analysis of Cassandra than most</title><content type='html'>Vladimir Sedach wrote a three-part dive into &lt;a href="http://incubator.apache.org/cassandra/"&gt;Cassandra&lt;/a&gt;.  (Almost two months ago now.  Guess I need to set up Google Alerts.  Trouble is there's a surprising amount of noise around the word `cassandra.`)
&lt;ul&gt;&lt;li&gt;&lt;a href="http://carcaddar.blogspot.com/2009/03/cassandra-of-facebook-or-tale-of.html"&gt;Part 0
&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://carcaddar.blogspot.com/2009/03/cassandra-of-facebook-or-tale-of_10.html"&gt;Part 1&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="http://carcaddar.blogspot.com/2009/03/cassandra-of-facebook-or-tale-of_1895.html"&gt;Part 2&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;A few notes:
&lt;ul&gt;&lt;li&gt;We now have an &lt;a href="http://spyced.blogspot.com/2009/05/consistent-hashing-vs-order-preserving.html"&gt;order-preserving partitioner&lt;/a&gt; as well as the hash-based one&lt;/li&gt;&lt;li&gt;Yes, if you tell Cassandra to wait for all replicas to be ack'd before calling a write a success, then you would have traditional consistency (as opposed to "eventual") but you'd also have no tolerance for hardware failures which is a main point of this kind of system.&lt;/li&gt;&lt;li&gt;Zookeeper is not currently used by Cassandra, although we have plans to use it in the future.
&lt;/li&gt;&lt;li&gt;Load balancing is not implemented yet.
&lt;/li&gt;&lt;li&gt;The move to Apache is &lt;a href="http://incubator.apache.org/cassandra/"&gt;finished&lt;/a&gt; and development is active there now.
&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6732010163099572250?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6732010163099572250/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6732010163099572250' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6732010163099572250'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6732010163099572250'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/05/better-analysis-of-cassandra-than-most.html' title='A better analysis of Cassandra than most'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-7335993842681149975</id><published>2009-05-01T07:44:00.000-07:00</published><updated>2010-10-11T20:54:44.189-07:00</updated><title type='text'>Consistent hashing vs order-preserving partitioning in distributed databases</title><content type='html'>&lt;p&gt;
The &lt;a href="http://incubator.apache.org/cassandra/"&gt;Cassandra&lt;/a&gt; distributed database supports two partitioning schemes now: the traditional &lt;a href="http://en.wikipedia.org/wiki/Consistent_hashing"&gt;consistent hashing&lt;/a&gt; scheme, and an order-preserving partitioner.
&lt;/p&gt;&lt;p&gt;
Almost all similar systems use consistent hashing (the &lt;a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html"&gt;Dynamo paper&lt;/a&gt; has the best description; see sections 4.1-4.3) because it provides a kind of brain-dead load balancing: the hash algorithm spreads keys across the ring.  The Dynamo authors go into some detail about how this by itself doesn't actually give good results in practice; their solution was to assign multiple tokens to each node in the cluster, and they describe several approaches to that.  Cassandra's original designer, however, considers this &lt;a href="http://groups.google.com/group/cassandra-user/msg/b0e9eed9116f0337"&gt;a hack&lt;/a&gt; and prefers &lt;a href="http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41"&gt;real load balancing&lt;/a&gt;.
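&lt;/p&gt;&lt;p&gt;
To make the token idea concrete, here is a minimal sketch in Python (not Cassandra's or Dynamo's actual code) of a hash ring in which each node can own one or several tokens.  The HashRing class, the node names, and the choice of MD5 are all assumptions made purely for illustration.
&lt;/p&gt;&lt;pre&gt;
```python
import bisect
import hashlib
from collections import Counter


def token_for(value):
    # Hash a string to a position on the ring (MD5 is an arbitrary choice here).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    # Hypothetical illustration class, not part of any real library.
    def __init__(self, nodes, tokens_per_node=1):
        # Each node owns tokens_per_node positions on the ring.
        self.ring = sorted(
            (token_for("%s#%d" % (node, i)), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self.tokens = [token for token, _ in self.ring]

    def node_for(self, key):
        # A key belongs to the first token at or after its hash, wrapping around.
        index = bisect.bisect_left(self.tokens, token_for(key)) % len(self.ring)
        return self.ring[index][1]


keys = ["user%d" % i for i in range(10000)]
for tokens_per_node in (1, 64):
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"], tokens_per_node)
    load = Counter(ring.node_for(key) for key in keys)
    print(tokens_per_node, sorted(load.items()))
```
&lt;/pre&gt;&lt;p&gt;
Running this with 1 token and then 64 tokens per node will typically show the per-node key counts evening out as tokens are added, which is the effect the Dynamo authors were after.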
&lt;/p&gt;&lt;p&gt;
An order-preserving partitioner, where keys are distributed to nodes in their natural order, has huge advantages over consistent hashing, particularly the ability to do range queries across the keys in the system (which has also been committed to Cassandra now).  This is important, because the corollary of "the partitioner uses the key to determine what node the data is on" is, "each key should only have an amount of data associated with it (see the &lt;a href="http://cwiki.apache.org/confluence/display/CSDR/Data+Model"&gt;data model explanation&lt;/a&gt;) that is relatively small compared to a node's capacity."  Cassandra column families will often have more columns in them than you'd see in a traditional database, but "millions" is pushing it (depending on column size) and "billions" is a bad idea.  So you'll want to model things such that you spread data across multiple keys and if you then pick an appropriate key naming convention, range queries will let you slice and dice that data as needed.
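&lt;/p&gt;&lt;p&gt;
As a rough illustration of why ordering matters for range queries, the sketch below places keys on nodes by their natural string order, so a range of keys maps to a contiguous run of nodes.  The OrderPreservingRing class, the node names, the tokens, and the "user:month" key naming convention are all made up for this example; this is not Cassandra's implementation.
&lt;/p&gt;&lt;pre&gt;
```python
import bisect


class OrderPreservingRing:
    # Hypothetical illustration class, not part of any real library.
    def __init__(self, node_tokens):
        # node_tokens maps a node name to its token; a token is just a key string.
        self.ring = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]

    def node_for(self, key):
        # A key belongs to the first token at or after it, wrapping around.
        index = bisect.bisect_left(self.tokens, key) % len(self.ring)
        return self.ring[index][1]

    def nodes_for_range(self, start, end):
        # Assumes start sorts before end; the keys between them live on
        # a contiguous run of nodes, which is all a range query must visit.
        first = bisect.bisect_left(self.tokens, start)
        last = bisect.bisect_left(self.tokens, end)
        return [self.ring[i % len(self.ring)][1] for i in range(first, last + 1)]


# Tokens picked by hand, as if the key distribution were known in advance.
ring = OrderPreservingRing(
    {"node-a": "g", "node-b": "n", "node-c": "t", "node-d": "z"}
)

# One logical entity spread across twelve keys via a naming convention.
monthly_keys = ["jbellis:2009-%02d" % month for month in range(1, 13)]
print(sorted(set(ring.node_for(key) for key in monthly_keys)))  # prints ['node-b']
print(ring.nodes_for_range("jbellis:2009-01", "jbellis:2009-12"))  # prints ['node-b']
print(ring.nodes_for_range("a", "p"))  # prints ['node-a', 'node-b', 'node-c']
```
&lt;/pre&gt;&lt;p&gt;
With a hash-based partitioner the same twelve monthly keys would be scattered around the ring, so a range query would have no contiguous run of nodes to visit.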
&lt;/p&gt;&lt;p&gt;
Cassandra is still in the process of implementing load balancing, but in the meantime order-preserving partitioning can still be useful without it &lt;span style="font-style: italic;"&gt;if&lt;/span&gt; you know what your key distribution will look like in advance and can pick your node tokens accordingly.  Otherwise, there's always the old-school hash-based partitioner until we get that done (for the release after the one we'll have in the next week or so).
&lt;/p&gt;&lt;p&gt;
See the &lt;a href="http://cwiki.apache.org/confluence/display/CSDR/Index"&gt;introduction&lt;/a&gt; and &lt;a href="http://cwiki.apache.org/confluence/display/CSDR/GettingStarted"&gt;getting started&lt;/a&gt; pages of the Cassandra wiki for more on Cassandra, and drop us a line on the mailing list or in IRC if you have questions; we're actively trying to improve our docs.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-7335993842681149975?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/7335993842681149975/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=7335993842681149975' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7335993842681149975'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7335993842681149975'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/05/consistent-hashing-vs-order-preserving.html' title='Consistent hashing vs order-preserving partitioning in distributed databases'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5448487737287597931</id><published>2009-04-30T14:58:00.000-07:00</published><updated>2009-11-20T05:29:06.176-08:00</updated><title type='text'>Automatic project structure inference</title><content type='html'>&lt;p&gt;
David MacIver has an interesting blog entry up about &lt;a href="http://www.drmaciver.com/2009/04/determining-logical-project-structure-from-commit-logs"&gt;determining logical project structure via commit logs&lt;/a&gt;.  I was very interested because &lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-27"&gt;one of Cassandra's oldest issues&lt;/a&gt; is creating categories for our JIRA instance.  (I've never been a big fan of JIRA, but you work with the tools you have.  Or the ones the ASF inflicts on you, in this case.)
&lt;p&gt;
The desire to add extra work to issue reporting for a young project like Cassandra strikes me as slightly misguided in the first place.  I have what may be an excessive aversion to overengineering, and I like to see a very clear benefit before adding complexity to anything, even an issue tracker.  Still, I was curious to see what David's clustering algorithm made of things.  And after pestering him to show me how to run his code I figure I owe it to him to &lt;a href="http://people.apache.org/%7Ejbellis/maciver-clusters.txt"&gt;show my results&lt;/a&gt;.
&lt;p&gt;
In general it did a pretty good job, particularly with the mid-sized groups of files.  The large groups are just noise; the small groups, well, it's not exactly a revelation that Filter and FilterTest go together.  I'd be tempted to play with it more but with only about two months and 250 commits in the apache repo there's not really all that much data there.  (Cassandra's first two years were in an internal Facebook repository.)  Working with data that exists as a side effect of natural activity is fascinating.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5448487737287597931?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5448487737287597931/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5448487737287597931' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5448487737287597931'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5448487737287597931'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/04/automatic-project-structure-inference.html' title='Automatic project structure inference'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-2720699300917516840</id><published>2009-04-11T06:52:00.000-07:00</published><updated>2011-11-04T20:26:33.846-07:00</updated><title type='text'>The best PyCon 
talk you didn't see</title><content type='html'>There were a lot of good talks at PyCon but I humbly submit that the best one you haven't seen yet is Robert Brewer's &lt;a href="http://us.pycon.org/2009/conference/schedule/event/70/"&gt;talk on DejaVu&lt;/a&gt;.  Robert describes how his &lt;a href="http://www.aminus.net/geniusql/chrome/common/doc/trunk/"&gt;Geniusql&lt;/a&gt; layer &lt;span style="font-style: italic;"&gt;disassembles and parses python bytecode&lt;/span&gt; to let his ORM turn python lambdas into SQL.  Microsoft got a lot of press for doing something similar for .NET with &lt;a href="http://msdn.microsoft.com/en-us/netframework/aa904594.aspx"&gt;LINQ&lt;/a&gt;, but Bob was there first.
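&lt;pre&gt;# Assumes store and Comic were set up earlier (not shown here).
box = store.new_sandbox()
# The lambda is what gets disassembled and translated into SQL.
print [c.Title for c in box.recall(
    Comic, lambda c: 'Hob' in c.Title or c.Views &amp;gt; 0)]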
&lt;/pre&gt;This is cool as hell.  The Geniusql part starts about 15 minutes in.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-2720699300917516840?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/2720699300917516840/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=2720699300917516840' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2720699300917516840'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2720699300917516840'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/04/best-pycon-talk-you-didnt-see.html' title='The best PyCon talk you didn&apos;t see'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6358465133207864267</id><published>2009-04-07T07:57:00.000-07:00</published><updated>2009-04-07T08:03:02.030-07:00</updated><title type='text'>Credit where credit is due</title><content type='html'>I'm starting to conclude that &lt;a href="http://spyced.blogspot.com/2008/12/frustrated-with-git.html"&gt;git just doesn't fit my brain&lt;/a&gt;.  Several months in, I'm still confused when things don't work the way they "should."  My co-worker says I should start a wiki for weird-ass things to do with git: "You keep coming up with use cases that would never occur to me."

But, I have to give the git community credit: I've never gone in to #git on freenode and gotten less than fantastic help.  Even with git-svn.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6358465133207864267?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6358465133207864267/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6358465133207864267' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6358465133207864267'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6358465133207864267'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/04/credit-where-credit-is-due.html' title='Credit where credit is due'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-2607311535839328933</id><published>2009-03-28T08:29:00.001-07:00</published><updated>2009-03-28T08:33:33.891-07:00</updated><title type='text'>Distributed Databases and Cassandra at PyCon</title><content type='html'>I'll be leading an &lt;a href="http://us.pycon.org/2009/openspace/DistributedDatabases/"&gt;open-spaces discussion&lt;/a&gt; about distributed database architecture, implementation, and use today at 5:00 PM in the Lambert room.  
Specifically, we will cover Bigtable, Dynamo, and Cassandra, and how to port a typical relational schema to Cassandra's ColumnFamily model.
&lt;p&gt;
I wrote a little background information yesterday about why I think &lt;a href="http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html"&gt;Cassandra in particular is compelling&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-2607311535839328933?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/2607311535839328933/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=2607311535839328933' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2607311535839328933'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2607311535839328933'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/03/distributed-databases-and-cassandra-at.html' title='Distributed Databases and Cassandra at PyCon'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-9177532026427255940</id><published>2009-03-27T06:17:00.000-07:00</published><updated>2010-10-12T12:22:41.803-07:00</updated><title type='text'>Why I like the Cassandra distributed database</title><content type='html'>I need a distributed database.  
A &lt;span style="font-style: italic;"&gt;real&lt;/span&gt; distributed database; &lt;a href="http://spyced.blogspot.com/2008/12/couchdb-not-drinking-kool-aid.html"&gt;replication doesn't count&lt;/a&gt; because under a replication-oriented db, each node still needs to be able to handle the full write volume, and you can only throw hardware at that for so long.
&lt;p&gt;
So, I'm working on the Cassandra distributed database.  I gave a lightning talk on it at PyCon this morning. Cassandra is written in Java and implements a sort of hybrid between &lt;a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html"&gt;Dynamo&lt;/a&gt; and &lt;a href="http://labs.google.com/papers/bigtable.html"&gt;Bigtable&lt;/a&gt;.  (Both papers are worth reading.)  It takes its distribution algorithm from Dynamo and its data model from Bigtable -- sort of the best of both worlds.  Avinash Lakshman, Cassandra's architect, is one of the authors of the Dynamo paper.
&lt;/p&gt;&lt;p&gt;

There is a &lt;a href="http://www.new.facebook.com/video/video.php?v=540974400803"&gt;video about Cassandra here&lt;/a&gt;.  The first 1/4 is about using Cassandra and then the rest is mostly about the internals.
&lt;/p&gt;&lt;p&gt;

Cassandra is very bleeding edge.  Facebook runs several Cassandra clusters in production (the largest is &lt;a href="http://groups.google.com/group/cassandra-user/msg/85a83621d07ff165"&gt;120 machines and 40TB of data&lt;/a&gt;), but there &lt;span style="font-style: italic;"&gt;are&lt;/span&gt; sharp edges that will cut you.  If you want something that Just Works out of the box, Cassandra is a poor fit right now and will be for several months.
&lt;/p&gt;&lt;p&gt;

Cassandra was open-sourced by Facebook last summer.  There &lt;a href="http://glinden.blogspot.com/2008/08/cassandra-data-store-at-facebook.html"&gt;was some initial buzz&lt;/a&gt;, but the Facebook developers had trouble dealing with the community and the project looked moribund -- FB was doing development against an internal repository and throwing code over the wall every few months, which is no way to run an OSS project.  Now the code is being developed in the open at Apache, and I was voted in as a committer, so things are starting to move again.
&lt;/p&gt;&lt;p&gt;

There are other distributed databases that are worth considering.    Here is why those don't fit my needs (and I am not saying that these are &lt;span style="font-style: italic;"&gt;bad &lt;/span&gt;choices if your requirements are different, just that they don't work for me):

&lt;/p&gt;&lt;p&gt;
HBase:
&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Follows the Bigtable model, so it's more complicated than it needs to be.  (300+ kloc vs 50 kloc for Cassandra; &lt;span style="font-style: italic;"&gt;many&lt;/span&gt; more components.)  This means it's that much harder for me to troubleshoot.  HBase is more bug-free than Cassandra but not so bug-free that troubleshooting would not be required.&lt;/li&gt;&lt;li&gt;Does not have any non-Java clients.  I need CPython support.
&lt;/li&gt;&lt;li&gt;Sits on top of HDFS, which is optimized for streaming reads, not random accesses.  So HBase is fine for batch processing but not so good for online apps.
&lt;/li&gt;&lt;/ul&gt;Hypertable:
&lt;ul&gt;&lt;li&gt;See HBase, except it's written in C++.
&lt;/li&gt;&lt;/ul&gt;Project Voldemort:
&lt;ul&gt;&lt;li&gt;Voldemort is probably the key / value store in the pattern of Dynamo that is farthest along.  If you only need key / value I would definitely recommend Voldemort.  Next closest is probably Dynomite.  Then there are a whole bunch of "me too" key value stores that made fatal &lt;a href="http://twitter.com/Werner/statuses/1008722501"&gt;architecture decisions&lt;/a&gt; or are writing a "memcached with persistence" without really thinking it through.
&lt;/li&gt;&lt;/ul&gt;
Running Cassandra [updated 04-22-09 for hacker news visitors]:
&lt;ol&gt;&lt;li&gt;prereqs: jdk6 and ant
&lt;/li&gt;&lt;li&gt;Check out the code from &lt;a href="http://svn.apache.org/repos/asf/incubator/cassandra/trunk"&gt;http://svn.apache.org/repos/asf/incubator/cassandra/trunk&lt;/a&gt; (did I mention this was early-adopter only?)&lt;/li&gt;&lt;li&gt;run ant [optional: ant test].    If you get an error like "class file has wrong version 50.0, should be 49.0" then ant is using an old jdk version instead of 6.
&lt;/li&gt;&lt;li&gt;For non-Java clients, install &lt;a href="http://incubator.apache.org/thrift/download/"&gt;Thrift&lt;/a&gt;. (For Java, trunk includes libthrift.jar.)  This is a major undertaking in its own right. See &lt;a href="http://radlab.cs.berkeley.edu/wiki/Projects/Running_Cassandra"&gt;this page&lt;/a&gt; for a list of dependencies, although most of the rest of that page is now outdated -- for instance, Cassandra no longer depends on the fb303 interface.  Python users will have to hand-edit the generated Cassandra.py in three obvious places until &lt;a href="https://issues.apache.org/jira/browse/THRIFT-339"&gt;this bug&lt;/a&gt; is fixed -- just replace the broken argument with None.
&lt;/li&gt;&lt;li&gt;run bin/cassandra [optionally -f for foreground]
&lt;/li&gt;&lt;li&gt;Connect to the server.  By default it listens on localhost:9160.  Look at config/server-conf.xml for the columnfamily definitions.&lt;/li&gt;&lt;li&gt;Insert and query some data.  &lt;a href="http://code.google.com/p/the-cassandra-project/wiki/ThriftInterface"&gt;Here is an introduction&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Ask on the mailing list or #cassandra on freenode if you have questions.&lt;/li&gt;&lt;/ol&gt;(&lt;a href="http://incubator.apache.org/cassandra/"&gt;Cassandra has a new website&lt;/a&gt; up to replace the google code one.  We're actively working on the docs, so let us know what needs work.)

Good luck!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-9177532026427255940?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/9177532026427255940/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=9177532026427255940' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/9177532026427255940'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/9177532026427255940'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html' title='Why I like the Cassandra distributed database'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-2632235203726416733</id><published>2009-02-16T07:58:00.000-08:00</published><updated>2009-03-06T07:18:47.151-08:00</updated><title type='text'>Impressed by KDE 4.2</title><content type='html'>I'm running Linux on my desktop at work after a year of OS X, and Gnome as shipped by Ubuntu 8.10 has just been a world of hurt.  The panel looks and works like ass when moved to the left side of the screen (the only sane place to put it in today's world of widescreen monitors), network-manager just decided to quit working one day (I got by with wicd after that), alt-tab behavior sucks both ways you can configure it, etc.
&lt;p&gt;
I installed KDE 4.2 over the weekend to see if I was missing anything there.
&lt;/p&gt;&lt;p&gt;
Wow.
&lt;/p&gt;&lt;p&gt;
It's like daylight after being in a cave for two months.  I didn't realize how hard it has been to use a butt-ugly environment until I wasn't anymore.  (Yes, I tried all the gnome themes I could find.  Even Nimbus which took a bit of work.  What's that recently-famous phrase?  "Lipstick on a pig?")
&lt;/p&gt;&lt;p&gt;
What is better in KDE?  In a word, everything.  And put me in the camp that really likes having the desktop turned into a usable area for the first time.  Like Apple's Dashboard, except it doesn't suck.  I always hated Dashboard.
&lt;/p&gt;&lt;p&gt;
Things that could be improved:
&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Never in a thousand years would I have thought to look under "Regional &amp;amp; Language" for the preference to turn caps lock into control.  I had to google this.&lt;/li&gt;&lt;li&gt;I'm still not sure how to set F9 to Present Windows.  Or how to bind a keystroke to the K menu as a poor man's quicksilver.
&lt;/li&gt;&lt;li&gt;More generally, a "Welcome to kde.  Let me teach you how to be a power user" tutorial would be nice.  I have the feeling there is lots of awesome under the hood if I knew where it was.  I never got that feeling from gnome.  ("Beauty is only skin deep, but ugly goes right to the bone.")
&lt;/li&gt;&lt;li&gt;Firefox UI widgets are imperfectly themed from XUL to GTK to KDE.  But it is usable.  (And having my second monitor redraw correctly instead of leaving artifacts when windows are moved makes up for that.)  Is this KDE's fault?  Firefox's?  I don't know.
&lt;/li&gt;&lt;li&gt;Konqueror is still using KHTML instead of webkit which means it is mostly unusable in the world of "web 2.0."  Yes, you can install webkitkde but that is Very Alpha.  ("Open in new window" doesn't work, for instance.  "Open in new tab" is gone entirely.)
&lt;/li&gt;&lt;li&gt;I couldn't find an option to just use icons in the task manager widget.&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-2632235203726416733?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/2632235203726416733/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=2632235203726416733' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2632235203726416733'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2632235203726416733'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/02/impressed-by-kde-42.html' title='Impressed by KDE 4.2'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5534946454171897984</id><published>2009-02-07T08:47:00.000-08:00</published><updated>2009-02-07T09:00:16.011-08:00</updated><title type='text'>SQLAlchemy at PyCon 2009</title><content type='html'>I will be giving an &lt;a href="http://pycon.org/2009/tutorials/schedule/2AM4/"&gt;Introduction to SQLAlchemy&lt;/a&gt; tutorial and Mike Bayer and Jason Kirtland will be teaching &lt;a href="http://pycon.org/2009/tutorials/schedule/2PM4/"&gt;Advanced SQLAlchemy&lt;/a&gt;, both on Thursday.  
I'll be covering much the same material as &lt;a href="http://spyced.blogspot.com/2008/03/slides-from-introduction-to-sqlalchemy.html"&gt;last year&lt;/a&gt;, updated for 0.5.  I'm also trying to get the email addresses of the people registered so far, so I can ask what else they would like covered. 

My tutorial style is exercise-heavy, so if you've read the docs or my slides but still find it hard to write SQLA code, coming to the tutorial is a great way to fix that.

(Note: the blog link to the 2008 slides is broken since we moved utahpython.org.  If you want them, drop me a note.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5534946454171897984?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5534946454171897984/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5534946454171897984' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5534946454171897984'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5534946454171897984'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/02/sqlalchemy-at-pycon-2009.html' title='SQLAlchemy at PyCon 2009'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-3317045409031231069</id><published>2009-01-28T15:17:00.000-08:00</published><updated>2009-03-06T07:19:13.403-08:00</updated><title type='text'>All you ever wanted to know about writing bloom filters</title><content type='html'>It seems like &lt;a href="http://en.wikipedia.org/wiki/Bloom_filter"&gt;Bloom filters&lt;/a&gt; are all the rage these days.  Three years ago I had barely heard of them and now it seems like I see articles and code using them all the time.  
That's mostly a good thing, since bloom filters are a very useful tool to avoid performing expensive computations without the full memory overhead of a standard map/dictionary.&lt;div&gt;
&lt;/div&gt;&lt;div&gt;Bloom filters are surprisingly simple: divide a memory area into buckets (one bit per bucket for a standard bloom filter; more -- typically four -- for a counting bloom filter).  To insert a key, generate several hashes per key, and mark the buckets for each hash.  To check if a key is present, check each bucket; if any bucket is empty, the key was never inserted in the filter.  If all buckets are non-empty, though, the key is only &lt;span class="Apple-style-span" style="font-style: italic;"&gt;probably &lt;/span&gt;inserted -- other keys' hashes could have covered the same buckets.  Determining exactly how big to make the filter and how many hashes to use to achieve a given false positive rate is a solved problem; &lt;a href="http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html"&gt;the math is out there&lt;/a&gt;.
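&lt;div&gt;As a rough sketch of that math (the function names below are illustrative, not taken from any library): for n keys and a target false positive rate p, the usual result is m = -n ln p / (ln 2)^2 buckets and k = (m / n) ln 2 hashes, and a filter with m one-bit buckets, k hashes and n inserted keys should return a false positive with probability of about (1 - e^(-kn/m))^k.  All of this assumes the hashes are uniformly distributed.&lt;/div&gt;

```python
import math

def bloom_size(num_keys, target_fp_rate):
    """Return (num_bits, num_hashes) for a standard one-bit-per-bucket filter."""
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    num_bits = math.ceil(-num_keys * math.log(target_fp_rate) / math.log(2) ** 2)
    # k = (m / n) * ln 2, rounded to the nearest whole hash, never fewer than one
    num_hashes = max(1, round(num_bits / num_keys * math.log(2)))
    return num_bits, num_hashes

def expected_fp_rate(num_bits, num_hashes, num_keys):
    """Predicted false positive rate, assuming uniformly distributed hashes."""
    return (1 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes

num_bits, num_hashes = bloom_size(1000, 0.01)
print(num_bits, num_hashes, expected_fp_rate(num_bits, num_hashes, 1000))
```

&lt;div&gt;For 1,000 keys at a 1% target this comes to a little under 10 bits per key and 7 hashes.&lt;/div&gt;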
&lt;/div&gt;&lt;div&gt;&lt;div&gt;
&lt;/div&gt;&lt;div&gt;But it turns out that it's surprisingly hard to find good information on one part of the implementation: how do you generate an indefinite number of hashes?  Even small filters will use three or four; a dozen or more is not unheard of.&lt;/div&gt;&lt;div&gt;
&lt;/div&gt;&lt;div&gt;&lt;a href="http://code.google.com/p/the-cassandra-project/"&gt;Cassandra&lt;/a&gt; uses bloom filters to save IO when performing a key lookup: each SSTable has a bloom filter associated with it that Cassandra checks before doing any disk seeks, making queries for keys that don't exist almost free.  Unfortunately, Cassandra &lt;a href="http://code.google.com/p/the-cassandra-project/source/browse/branches/development/src/com/facebook/infrastructure/utils/BloomFilter.java?r=78"&gt;was using&lt;/a&gt; perhaps the worst possible way of generating hashes: a hardcoded list of hash functions, apparently from &lt;a href="http://www.partow.net/programming/hashfunctions/"&gt;this page&lt;/a&gt;.  A lot of those hashes are reimplementations of the same algorithm!  With 6 or fewer hash functions, Cassandra was actually generating only 2 distinct hash values, so the false positive rate was far higher than expected.&lt;/div&gt;&lt;div&gt;
&lt;/div&gt;&lt;div&gt;Judging from the bloom filter implementations out there, generating appropriate hashes is surprisingly hard to get right.  &lt;a href="http://blog.locut.us/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/"&gt;One implementation&lt;/a&gt; in Java uses object.hashCode() to seed the stdlib's PRNG and calls Random.nextInt() to generate the bloom hashes.  This works okay for small filters, but the false positive rate is up to 140% of the expected rate for large filters.  &lt;a href="http://www.coolsnap.net/kevin/?p=13"&gt;This one&lt;/a&gt; in Python combines the stdlib hash and a PJW hash &lt;a href="http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf"&gt;with simple arithmetic&lt;/a&gt; to achieve "only" an extra 10-15% false positives.&lt;/div&gt;&lt;div&gt;
&lt;/div&gt;&lt;div&gt;It turns out that most off-the-shelf algorithms suck for bloom filter purposes because their output isn't distributed well enough across the (32-bit) hash space; most hash function authors are more concerned with speed than with achieving a really good output distribution.  For a lot of applications, this is a good tradeoff, but you are presumably using a bloom filter because a filter hit requires a relatively expensive operation, so a double-digit percentage increase in the false positive rate is Very Bad.  (So I would be skeptical of any home-made hash function generator too, like &lt;a href="http://www.imperialviolet.org/pybloom.html"&gt;PyBloom&lt;/a&gt;'s.)&lt;/div&gt;&lt;div&gt;
&lt;/div&gt;&lt;div&gt;I found two approaches that generate results in line with the theoretical predictions.  One is to use a cryptographic hash function like SHA-1.  Cryptography's requirements may be overkill for a bloom filter, but crypto hashes do spread their output uniformly across the hash space.  So SHA-1 works fine, and you can feed the hash a constant or part of the earlier-generated hash to generate extra data as needed, so you're not limited to the 5 32-bit hashes that a single 160-bit digest provides.  (Although Java makes this a pain by having all the digest methods also reset the hash to uninitialized.  &lt;a href="http://docs.python.org/library/hashlib.html"&gt;Explicit is better than implicit&lt;/a&gt;!)&lt;/div&gt;&lt;div&gt;
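&lt;div&gt;A minimal Python sketch of the SHA-1 approach, assuming keys are byte strings.  The names sha1_hashes and BloomFilter are made up for this example, and feeding the previous digest back into the hasher is only one of several ways to get the extra bits.  Python's hashlib lets you call digest() without resetting the hasher, so extending the stream is a single update().&lt;/div&gt;

```python
import hashlib
import math

def sha1_hashes(key, count):
    """Return `count` 32-bit hash values for a bytes key."""
    hasher = hashlib.sha1(key)
    values = []
    # each 160-bit digest yields five 32-bit values
    for _ in range(math.ceil(count / 5)):
        digest = hasher.digest()      # 20 bytes; does not reset the hasher
        for offset in range(0, 20, 4):
            values.append(int.from_bytes(digest[offset:offset + 4], "big"))
        hasher.update(digest)         # feed the earlier output back in for more bits
    return values[:count]

class BloomFilter:
    """Standard (non-counting) filter; one byte per bucket for clarity, not compactness."""

    def __init__(self, num_buckets, num_hashes):
        self.num_buckets = num_buckets
        self.num_hashes = num_hashes
        self.buckets = bytearray(num_buckets)

    def _indexes(self, key):
        return [h % self.num_buckets for h in sha1_hashes(key, self.num_hashes)]

    def add(self, key):
        for index in self._indexes(key):
            self.buckets[index] = 1

    def __contains__(self, key):
        # any empty bucket proves the key was never inserted
        return all(self.buckets[index] for index in self._indexes(key))

bloom = BloomFilter(9586, 7)
bloom.add(b"spam")
print(b"spam" in bloom)
```

&lt;div&gt;A real implementation would pack the buckets into bits; the bytearray here just keeps the insert and lookup logic easy to read.&lt;/div&gt;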
&lt;/div&gt;&lt;div&gt;The other is to use the Jenkins or Murmur hash, both of which are used by Hadoop's bloom filter implementation.  You get as many hashes as you want by using the output of hash#i as the initial value for hash#i+1.  Both of these generate very well-distributed hashes, and both are about &lt;span class="Apple-style-span" style="font-style: italic;"&gt;twice&lt;/span&gt; as fast as the SHA approach for 5 hashes, with murmur being about 10% faster than jenkins.  (Remember, SHA will give you 160 bits of pseudo-randomness at a time, so these hashes will be faster still depending on how far you are from a multiple of that.)&lt;/div&gt;&lt;div&gt;
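&lt;div&gt;The chaining can be sketched like this.  The murmur2 function below is a pure-Python transcription of the 32-bit MurmurHash2 mixing steps and has not been checked against the Java code in Hadoop or Cassandra, so take it as an illustration of the seeding trick, not a drop-in replacement.  The chained_hashes name and the starting seed of 0 are likewise assumptions.&lt;/div&gt;

```python
M = 0x5BD1E995      # MurmurHash2 multiplication constant
WORD = 2 ** 32      # results are kept to 32 bits with "% WORD"

def murmur2(data, seed):
    """32-bit MurmurHash2-style hash of a bytes object."""
    length = len(data)
    h = (seed ^ length) % WORD
    full = length - length % 4
    # mix the input four bytes at a time
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * M) % WORD
        k ^= k >> 24
        k = (k * M) % WORD
        h = ((h * M) % WORD) ^ k
    # fold in the last one to three bytes, if any
    tail = data[full:]
    if tail:
        h ^= int.from_bytes(tail, "little")
        h = (h * M) % WORD
    # final avalanche
    h ^= h >> 13
    h = (h * M) % WORD
    h ^= h >> 15
    return h

def chained_hashes(key, count, seed=0):
    """Hash i's output becomes the initial value (seed) for hash i+1."""
    values = []
    for _ in range(count):
        seed = murmur2(key, seed)
        values.append(seed)
    return values

num_buckets = 9586
print([h % num_buckets for h in chained_hashes(b"some key", 5)])
```

&lt;div&gt;Each extra hash costs one more pass over the key, with no digest object to manage.&lt;/div&gt;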
&lt;/div&gt;&lt;div&gt;In short: use the murmur hash in your bloom filter.  If you're using Java, you can grab Cassandra's implementation from &lt;a href="http://github.com/jbellis/cassandra-dev/tree/e284df7536ef32869b87d903a5f92f6a96c84801/src/com/facebook/infrastructure/utils"&gt;my github tree&lt;/a&gt; until our new repository is up at the Apache incubation site.  Unfortunately, it's spread across half a dozen files, without even counting the tests.  Start with [Counting]BloomFilter and go up from there.  Your other option is Hadoop's (if you grab it from svn, because that has &lt;a href="https://issues.apache.org/jira/browse/HADOOP-5079"&gt;my fix&lt;/a&gt; included), which is even less self-contained and requires wrapping your keys in a Key object.  Sorry.&lt;/div&gt;&lt;div&gt;
&lt;/div&gt;&lt;div&gt;Bonus tip: if you do roll your own, you should probably port &lt;a href="http://github.com/jbellis/cassandra-dev/tree/e284df7536ef32869b87d903a5f92f6a96c84801/test/com/facebook/infrastructure/utils"&gt;Cassandra's test suite&lt;/a&gt; to your language of choice.  In particular, nobody else bothers to check that false positive rates are within what is predicted by the math, but if you don't, it's easy for bugs that pass simple tests to slip by, like the one in Cassandra's older implementation, or the one I found in Hadoop's.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-3317045409031231069?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/3317045409031231069/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=3317045409031231069' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3317045409031231069'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3317045409031231069'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html' title='All you ever wanted to know about writing bloom filters'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5060389222595268934</id><published>2008-12-31T08:16:00.000-08:00</published><updated>2009-06-26T05:31:01.821-07:00</updated><title type='text'>CouchDB: not drinking the kool-aid</title><content type='html'>This is my attempt to clear up some misconceptions about CouchDB and point out some technical details that a lot of people seem to have overlooked.  For the record, I like Damien Katz's blog, he seems like a great programmer, and Erlang looks cool.  Please don't hurt me.&lt;div&gt;
&lt;/div&gt;&lt;div&gt;First, and most important: CouchDB is not a distributed database.  BigTable is a distributed database.  &lt;a href="http://code.google.com/p/the-cassandra-project/"&gt;Cassandra&lt;/a&gt; and &lt;a href="http://github.com/cliffmoon/dynomite/tree/master"&gt;dynomite&lt;/a&gt; are distributed databases.  (And open source, and based on a better design than BigTable.  More on this in another post.)  It's true that with CouchDB &lt;a href="http://wiki.apache.org/couchdb/Configuring_distributed_systems"&gt;you can "shard" data out to different instances&lt;/a&gt; just like you can with MySQL or PostgreSQL.  That's not what people think when they see "distributed database." It's also true that CouchDB has good replication, but even multi-master replication isn't the same as a distributed database: you're still limited to the write throughput of the slowest machine.&lt;/div&gt;&lt;div&gt;
&lt;div&gt;Here are some reasons you should think twice and do careful testing before using CouchDB in a non-toy project:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Writes are serialized.  Not serialized as in the &lt;a href="http://en.wikipedia.org/wiki/Isolation_(database_systems)"&gt;isolation level&lt;/a&gt;, serialized as in &lt;a href="http://horicky.blogspot.com/2008/10/couchdb-implementation.html"&gt;there can only be one write active at a time&lt;/a&gt;.  Want to spread writes across multiple disks?  Sorry.&lt;/li&gt;&lt;li&gt;CouchDB uses an MVCC model, which means that updates and deletes need to be compacted for the space to be made available to new writes.  Just like PostgreSQL, only without the man-years of effort to make vacuum hurt less.&lt;/li&gt;&lt;li&gt;CouchDB is simple.  Gloriously simple.  Why is that a negative?  It's competing with systems (in the popular imagination, if not in its author's mind) that have been maturing for years.  The reason PostgreSQL et al. have those features is that &lt;span class="Apple-style-span" style="font-style: italic;"&gt;people want them&lt;/span&gt;.  And if you don't, you should at least ask a DBA with a few years of non-MySQL experience what you'll be missing.  The majority of CouchDB fans don't appear to really understand what a good relational database gives them, just as a lot of PHP programmers don't get what the big deal is with namespaces.&lt;/li&gt;&lt;li&gt;A special case of simplicity deserves mention: nontrivial queries must be &lt;a href="http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views"&gt;created as a view with mapreduce&lt;/a&gt;.  MapReduce is a great approach to &lt;a href="http://code.google.com/edu/parallel/mapreduce-tutorial.html"&gt;trivially parallelizing&lt;/a&gt; certain classes of problem.  The problem is, it's tedious and error-prone to write raw MapReduce code.  
This is why Google and Yahoo have both created high-level languages on top of it (&lt;a href="http://research.google.com/archive/sawzall.html"&gt;Sawzall&lt;/a&gt; and &lt;a href="http://research.yahoo.com/node/90"&gt;Pig&lt;/a&gt;, respectively).  Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages.  It's a little verbose, and you might be bored with it, but it's much better than writing low-level mapreduce code.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5060389222595268934?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5060389222595268934/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5060389222595268934' title='18 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5060389222595268934'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5060389222595268934'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/12/couchdb-not-drinking-kool-aid.html' title='CouchDB: not drinking the kool-aid'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>18</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-7184307941304405954</id><published>2008-12-24T07:42:00.000-08:00</published><updated>2008-12-24T09:32:34.346-08:00</updated><title 
type='text'>RackLabs</title><content type='html'>&lt;p&gt;Today marks one month that I've been working for Rackspace's RackLabs with the Mosso group in San Antonio, Texas.  (Anyone want to start a Python group?  The closest one is in Austin.)&lt;/p&gt;&lt;p&gt;It's kind of a gentle introduction to big company culture for me; at around 2,000 employees, Rackspace is easily ten times as large as any other company I've worked for, and 100 times as large as most.  Mosso is a lot smaller and RackLabs itself is smaller still, but I still had to go to five days (!) of corporate orientation.  Other than that, though, we're pretty much left alone by our corporate parent.&lt;/p&gt;&lt;p&gt;To start with, I'm working on Mosso's &lt;a href="http://www.mosso.com/cloudfiles.jsp"&gt;Cloud Files&lt;/a&gt;, which is basically an S3 competitor.  Cloud Files is similar to the work I did at Mozy, but there are a lot of technical differences.  Some are driven by Cloud Files being more of a general purpose storage engine than the one I wrote for Mozy; others stem from the Cloud Files authors being &lt;a href="http://twistedmatrix.com/"&gt;Twisted&lt;/a&gt; fans.&lt;/p&gt;&lt;p&gt;Strange coincidence: as with Mozy, I share an office here with a Debian developer, probably the only one in San Antonio.  My experience is that debian developers are pretty sharp guys, probably in no small part due to the rigorous screening process you have to go through.  They set a high bar.&lt;/p&gt;&lt;p&gt;Of course this continues to be my personal blog, and all opinions are mine alone.  
&lt;a href="http://blog.racklabs.com/"&gt;RackLabs has its own blog&lt;/a&gt; for when they want to say something official.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-7184307941304405954?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/7184307941304405954/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=7184307941304405954' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7184307941304405954'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7184307941304405954'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/12/racklabs.html' title='RackLabs'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-2352045473205352156</id><published>2008-12-19T21:47:00.000-08:00</published><updated>2008-12-19T22:15:24.755-08:00</updated><title type='text'>Frustrated with git</title><content type='html'>&lt;p&gt;I'm a little over a week into a git immersion program.  Let me just say that git's reputation of being a little arcane (okay, more than a little) and having a steep learning curve is 100% deserved.&lt;/p&gt;&lt;p&gt;One thing that would mitigate things is if git would give you feedback when you tell it to do nonsense.  But it doesn't.  
Here's me trying to get machine B to always merge the debug branch from machine A when I pull:&lt;/p&gt;&lt;blockquote&gt;&lt;pre&gt;&lt;tt&gt;
232 git config branch.debug.remote origin
234 git config branch.master.remote origin
236 git config branch.master.remote origin/debug&lt;/tt&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;p&gt;All of these commands completed silently.  None accomplished what I wanted.  In the end I renamed master to old and debug to master to avoid having to fight it.  Then I blew away my working copy and re-cloned because those config statements had created a new problem that I didn't know how to undo.&lt;/p&gt;&lt;p&gt;I'm sure the git virtuosos out there will know what was wrong.  That's not the point.  The point is that the tool gave me no feedback.  It was like git was telling me, "Figure it out yourself.  Or don't.  I don't care."  Which is par for the course with my git experience so far.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-2352045473205352156?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/2352045473205352156/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=2352045473205352156' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2352045473205352156'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2352045473205352156'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/12/frustrated-with-git.html' title='Frustrated with git'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5057398121169247345</id><published>2008-12-18T06:14:00.000-08:00</published><updated>2010-06-21T08:05:20.295-07:00</updated><title type='text'>FormAlchemy 1.1: admin app, composite key support</title><content type='html'>FormAlchemy 1.1 is out, so you no longer need to run trunk to get the admin app goodness -- now with i18n support.  We also added support for all composite primary keys, and most composite foreign keys.  (The distinction is, rendering an object depends on the PK, but loading relations depends on FKs.)  Gael also added &lt;a href="http://docs.formalchemy.org/ext/fsblob.html"&gt;the fsblob extension&lt;/a&gt;, which allows storing blobs on the filesystem and the path in the database.  (FormAlchemy can handle blob-in-the-db out of the box.)&lt;div&gt;
&lt;/div&gt;&lt;div&gt;(I previously blogged about &lt;a href="http://spyced.blogspot.com/2008/10/formalchemy-10.html"&gt;basic FormAlchemy&lt;/a&gt; and &lt;a href="http://spyced.blogspot.com/2008/10/small-admin-app-for-pylons.html"&gt;the admin app&lt;/a&gt;, which are still good introductions.)&lt;/div&gt;&lt;div&gt;
&lt;/div&gt;&lt;div&gt;FormAlchemy has pretty good &lt;a href="http://docs.formalchemy.org/current/index.html"&gt;documentation&lt;/a&gt;.  The most important page is &lt;a href="http://docs.formalchemy.org/current/forms.html"&gt;form generation&lt;/a&gt;; instructions to configure the admin app are &lt;a href="http://docs.formalchemy.org/current/ext/pylons.html"&gt;here&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5057398121169247345?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5057398121169247345/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5057398121169247345' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5057398121169247345'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5057398121169247345'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/12/formalchemy-11-admin-app-composite-key.html' title='FormAlchemy 1.1: admin app, composite key support'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-3049260489394073272</id><published>2008-11-03T07:21:00.000-08:00</published><updated>2008-11-03T08:54:06.930-08:00</updated><title type='text'>An unusual approach to log parsing</title><content type='html'>&lt;p&gt;I saw an interesting &lt;a 
href="http://blog.amber.org/2008/11/02/the-trouble-with-logging/"&gt;article about logging&lt;/a&gt; today on reddit, and it struck a nerve with me, specifically how most text logs are not designed for easy parsing.  (I don't agree with the second point, though -- sometimes logging &lt;em&gt;is&lt;/em&gt; tracing, or perhaps more accurately, sometimes tracing is logging.)&lt;/p&gt;&lt;p&gt;We had a &lt;em&gt;lot&lt;/em&gt; of log and trace data at Mozy, several GB per day.  A traditional parsing approach would have been tedious and prone to regressions when the messages generated by the server changed.  So Paul Cannon, a &lt;a href="http://steve-yegge.blogspot.com/2006/04/lisp-is-not-acceptable-lisp.html"&gt;frustrated lisp programmer&lt;/a&gt;, designed a system where the log API looked something like this:&lt;/p&gt;&lt;pre class="code"&gt;&lt;tt&gt;self.log('command_reply(%r, %r)' % (command, arg))&lt;/tt&gt;&lt;/pre&gt;&lt;p&gt;Then the log processor would define the vocabulary (&lt;tt&gt;command_reply&lt;/tt&gt;, etc.) and, instead of parsing the log messages, &lt;tt&gt;eval&lt;/tt&gt; them!   This is an approach that wouldn't have occurred to me, nor would I have thought of using &lt;a href="http://www.python.org/dev/peps/pep-0309/"&gt;partial function application&lt;/a&gt; to simplify passing state (from the log processor and/or previous log entries) to these functions.  (e.g., the entry for &lt;tt&gt;command_reply&lt;/tt&gt; in the eval namespace might be &lt;tt&gt;'command_reply': partial(self.command_reply, db_cursor, thread_id)&lt;/tt&gt;)&lt;p&gt;There are drawbacks to this approach; perhaps the largest is that this works best in homogeneous systems.  Python's repr function (invoked by the %r formatting character) is great at taking care of any quoting issues necessary when dealing with Python primitives, as well as custom objects with some help from the programmer.  
But when we started having a C++ system also log messages to this system, it took them several tries to fix all the corner cases involved in generating messages that were valid python code.&lt;p&gt;On balance, I think this un-parsing approach was a huge win, and as the first application of "code is data" that made more than theoretical sense to me it was a real "eureka!" moment.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-3049260489394073272?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/3049260489394073272/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=3049260489394073272' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3049260489394073272'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3049260489394073272'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/11/unusual-approach-to-log-parsing.html' title='An unusual approach to log parsing'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-4444182307640560858</id><published>2008-10-24T19:07:00.000-07:00</published><updated>2009-07-16T20:07:25.982-07:00</updated><title type='text'>A small admin app for Pylons</title><content type='html'>&lt;p&gt;I &lt;a 
href="http://spyced.blogspot.com/2008/10/formalchemy-10.html"&gt;said&lt;/a&gt; that it would be possible to build a django-style admin interface for Pylons using &lt;a href="http://code.google.com/p/formalchemy/"&gt;FormAlchemy&lt;/a&gt;. (That is, generate a UI for basic CRUD operations for all your models, with no further configuration necessary.)  I have a proof of concept in FA svn; it's missing some obvious features like internationalization so there is no official release yet.  But the basics are there, so in the meantime, if you'd like to &lt;a href="http://docs.formalchemy.org/current/ext/pylons.html"&gt;kick the tires&lt;/a&gt;, just &lt;a href="http://code.google.com/p/formalchemy/wiki/InstallingFormAlchemy"&gt;install FA from svn&lt;/a&gt; and give it a try.&lt;/p&gt;&lt;p&gt;Here are some screenshots from a pylons app incorporating models from the FA test suite.  (The admin controller is fully customizable using standard FA (and Pylons) techniques, but these are what you'd see out-of-the-box.)&lt;/p&gt;&lt;p&gt;&lt;/p&gt;

Index:

&lt;img style="margin: 0pt 10px 10px 0pt; width: 351px; height: 400px;" src="http://1.bp.blogspot.com/_bwSkwFkEnF0/SQMz0XpygDI/AAAAAAAAAEQ/AHVRxRm3BTo/s400/Picture+4.png" alt="" id="BLOGGER_PHOTO_ID_5261105764494377010" border="0" /&gt;



Order page:

&lt;img style="margin: 0pt 10px 10px 0pt; width: 388px; height: 400px;" src="http://4.bp.blogspot.com/_bwSkwFkEnF0/SQMz0j1ocpI/AAAAAAAAAEY/Hi6MO_qN6rs/s400/Picture+5.png" alt="" id="BLOGGER_PHOTO_ID_5261105767765275282" border="0" /&gt;



Creating a new Order:

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_bwSkwFkEnF0/SQMz1BmG7nI/AAAAAAAAAEg/FW7DnLg8Ipk/s1600-h/Picture+6.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; width: 400px; height: 194px;" src="http://4.bp.blogspot.com/_bwSkwFkEnF0/SQMz1BmG7nI/AAAAAAAAAEg/FW7DnLg8Ipk/s400/Picture+6.png" alt="" id="BLOGGER_PHOTO_ID_5261105775753227890" border="0" /&gt;&lt;/a&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_bwSkwFkEnF0/SQMz1QOIKfI/AAAAAAAAAEo/WXoeJjGk1QI/s1600-h/Picture+7.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; width: 332px; height: 400px;" src="http://3.bp.blogspot.com/_bwSkwFkEnF0/SQMz1QOIKfI/AAAAAAAAAEo/WXoeJjGk1QI/s400/Picture+7.png" alt="" id="BLOGGER_PHOTO_ID_5261105779679177202" border="0" /&gt;&lt;/a&gt;



Deleting an Order:

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_bwSkwFkEnF0/SQMz1hAswjI/AAAAAAAAAEw/Z8P0tw4xau4/s1600-h/Picture+8.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; width: 348px; height: 400px;" src="http://4.bp.blogspot.com/_bwSkwFkEnF0/SQMz1hAswjI/AAAAAAAAAEw/Z8P0tw4xau4/s400/Picture+8.png" alt="" id="BLOGGER_PHOTO_ID_5261105784186257970" border="0" /&gt;&lt;/a&gt;



The User page:

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_bwSkwFkEnF0/SQM0AXx79MI/AAAAAAAAAE4/GSM2qgd-TRE/s1600-h/Picture+11.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; width: 400px; height: 277px;" src="http://2.bp.blogspot.com/_bwSkwFkEnF0/SQM0AXx79MI/AAAAAAAAAE4/GSM2qgd-TRE/s400/Picture+11.png" alt="" id="BLOGGER_PHOTO_ID_5261105970686981314" border="0" /&gt;&lt;/a&gt;



Editing a User instance:

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_bwSkwFkEnF0/SQM0AvOPr-I/AAAAAAAAAFA/WbpThyQ7mig/s1600-h/Picture+12.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; width: 400px; height: 296px;" src="http://3.bp.blogspot.com/_bwSkwFkEnF0/SQM0AvOPr-I/AAAAAAAAAFA/WbpThyQ7mig/s400/Picture+12.png" alt="" id="BLOGGER_PHOTO_ID_5261105976979730402" border="0" /&gt;&lt;/a&gt;



Documentation on using and customizing the pylons admin app is &lt;a href="http://docs.formalchemy.org/current/ext/pylons.html"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-4444182307640560858?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/4444182307640560858/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=4444182307640560858' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4444182307640560858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4444182307640560858'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/10/small-admin-app-for-pylons.html' title='A small admin app for Pylons'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_bwSkwFkEnF0/SQMz0XpygDI/AAAAAAAAAEQ/AHVRxRm3BTo/s72-c/Picture+4.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-3740167070717855569</id><published>2008-10-16T20:35:00.000-07:00</published><updated>2010-11-11T18:19:41.066-08:00</updated><title type='text'>FormAlchemy 1.0</title><content type='html'>&lt;p&gt;
A little background: &lt;a href="http://spyced.blogspot.com/2008/04/m-half-baked-thoughts-on-python-web.html"&gt;a few months ago,&lt;/a&gt; I went looking for a web framework that was good at automating CRUD (create/retrieve/update/delete) against an existing database schema.  I tried django but its database introspection abilities are beyond feeble, and django-sqlalchemy was not mature enough.  I tried dbmechanic but its dozen-plus dependencies, most of which were alpha-quality, gave me pause; so did its basic architecture on top of toscawidgets, which I think is The Wrong Way to build web apps.  (I understand that the former problem has since been reduced; the latter has not.)
&lt;/p&gt;&lt;p&gt;
So, I went back to option #3, &lt;a href="http://code.google.com/p/formalchemy/"&gt;FormAlchemy&lt;/a&gt;.  I knew SQLAlchemy could reflect very hairy schemas indeed, and what it could not reflect, it could certainly represent with a little manual help.  And FormAlchemy was a decent start to automating CRUD with SA models.  I added the ability to represent relations, automatic syncing of form input back to SA objects, Grid support, and a test suite.  Then Gael came along and added internationalization, support for even more SA features, and Sphinx docs.  Along the way we've killed enough bugs and added enough test cases (yes, &lt;a href="http://ivory.idyll.org/blog/feb-07/stupidity-driven-testing.html"&gt;the two are related&lt;/a&gt;) that we think we have a pretty solid release.  Especially since I just released 1.0.1 fixing the most obvious problems. :)
&lt;/p&gt;&lt;p&gt;
I think all three FA committers use it mostly with Pylons; that said, FormAlchemy has no dependencies besides SQLAlchemy itself.  You could easily use it with werkzeug or web.py or whatever.

&lt;/p&gt;&lt;p&gt;Here, finally, is a quick FormAlchemy tutorial:

&lt;/p&gt;&lt;p&gt;To get started, you only need to know about two classes, &lt;tt&gt;FieldSet&lt;/tt&gt; and &lt;tt&gt;Grid&lt;/tt&gt;, and a handful of methods: &lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;tt&gt;render&lt;/tt&gt;: returns a string containing the html &lt;/li&gt;&lt;li&gt;&lt;tt&gt;validate&lt;/tt&gt;: true if the form passes its validations; otherwise, false &lt;/li&gt;&lt;li&gt;&lt;tt&gt;sync&lt;/tt&gt;: syncs the model instance that was bound to the input data &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;This introduction illustrates these three methods. For full details on customizing &lt;tt&gt;FieldSet&lt;/tt&gt; behavior, see &lt;a href="http://docs.formalchemy.org/" rel="nofollow"&gt;the documentation&lt;/a&gt;. &lt;/p&gt;&lt;p&gt;We'll start with two simple SQLAlchemy models with a one-to-many relationship (each &lt;tt&gt;User&lt;/tt&gt; can have many &lt;tt&gt;Order&lt;/tt&gt;s), and fetch an &lt;tt&gt;Order&lt;/tt&gt; object to edit: &lt;/p&gt;&lt;pre class="prettyprint"&gt; from formalchemy.tests import Session, User, Order
session = Session()
order1 = session.query(Order).first()&lt;/pre&gt;&lt;p&gt;Now, let's render a form to edit the order we've loaded. &lt;/p&gt;&lt;pre class="prettyprint"&gt; from formalchemy import FieldSet, Grid
fs = FieldSet(order1)
print fs.render()&lt;/pre&gt;&lt;p&gt;This results in the following form elements: &lt;/p&gt;&lt;blockquote&gt;&lt;img src="http://formalchemy.googlecode.com/svn/trunk/docs/example.png" /&gt;
&lt;/blockquote&gt;&lt;p&gt;Note how the options for the User input were automatically loaded from the database.  &lt;tt&gt;str()&lt;/tt&gt; is used on the User objects to get the option descriptions. &lt;/p&gt;&lt;p&gt;To edit a new object, bind your &lt;tt&gt;FieldSet&lt;/tt&gt; to the class rather than a specific instance: &lt;/p&gt;&lt;pre class="prettyprint"&gt;  fs = FieldSet(Order)&lt;/pre&gt;&lt;p&gt;To edit multiple objects, bind them to a &lt;tt&gt;Grid&lt;/tt&gt; instead: &lt;/p&gt;&lt;pre class="prettyprint"&gt; orders = session.query(Order).all()
g = Grid(Order, orders)
print g.render()&lt;/pre&gt;&lt;p&gt;Which results in: &lt;/p&gt;&lt;blockquote&gt;&lt;img src="http://formalchemy.googlecode.com/svn/trunk/docs/example-grid.png" /&gt;
&lt;/blockquote&gt;&lt;p&gt;Saving changes is similarly easy.  (Here we're using Pylons-style &lt;tt&gt;request.params()&lt;/tt&gt;; adjust for your framework of choice as necessary): &lt;/p&gt;&lt;pre class="prettyprint"&gt; fs = FieldSet(order1, request.params())
if fs.validate():
    fs.sync()
    session.commit()&lt;/pre&gt;&lt;p&gt;&lt;tt&gt;Grid&lt;/tt&gt; works the same way.  More details in &lt;a href="http://docs.formalchemy.org/"&gt;the documentation&lt;/a&gt;; start with &lt;a href="http://docs.formalchemy.org/forms.html"&gt;Form generation&lt;/a&gt;.&lt;/p&gt;
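&lt;p&gt;For readers who want to see the bind / validate / sync cycle in isolation, here is a minimal, library-free sketch.  Everything in it is an assumption made for illustration: &lt;tt&gt;MiniFieldSet&lt;/tt&gt; and its &lt;tt&gt;quantity&lt;/tt&gt; field are hypothetical stand-ins, not FormAlchemy classes, and the plain dict plays the role of the request parameters.  It only shows why &lt;tt&gt;sync&lt;/tt&gt; is called after &lt;tt&gt;validate&lt;/tt&gt; succeeds.&lt;/p&gt;&lt;pre class="prettyprint"&gt;
```python
# Hypothetical stand-in for the bind / validate / sync cycle described above.
# MiniFieldSet is NOT part of FormAlchemy; it only mimics the flow.

class Order(object):
    def __init__(self, quantity):
        self.quantity = quantity


class MiniFieldSet(object):
    def __init__(self, model, data=None):
        self.model = model          # the object being edited
        self.data = data or {}      # raw strings, as a web framework would supply
        self.errors = {}

    def validate(self):
        """Return True only if the submitted value can be converted."""
        self.errors = {}
        raw = self.data.get('quantity', '')
        if not raw.isdigit():
            self.errors['quantity'] = 'must be a non-negative integer'
        return not self.errors

    def sync(self):
        """Copy the validated input back onto the model instance."""
        self.model.quantity = int(self.data['quantity'])


order = Order(quantity=1)

# Invalid input: validate() fails, so sync() is never called
# and the model keeps its old value.
bad = MiniFieldSet(order, {'quantity': 'lots'})
if bad.validate():
    bad.sync()
print(order.quantity)

# Valid input: the model is updated from the submitted data.
good = MiniFieldSet(order, {'quantity': '5'})
if good.validate():
    good.sync()
print(order.quantity)
```
&lt;/pre&gt;&lt;p&gt;The real &lt;tt&gt;FieldSet&lt;/tt&gt; derives its fields and validators from the SQLAlchemy model rather than hard-coding them as this sketch does.&lt;/p&gt;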
&lt;p&gt;To give FormAlchemy a try, just easy_install it.  If you have any questions, Alex and I are often in both #sqlalchemy and #pylons on freenode.  And of course there's always the mailing list.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-3740167070717855569?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/3740167070717855569/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=3740167070717855569' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3740167070717855569'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3740167070717855569'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/10/formalchemy-10.html' title='FormAlchemy 1.0'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5039753540168274569</id><published>2008-09-25T14:29:00.000-07:00</published><updated>2008-09-25T19:08:21.807-07:00</updated><title type='text'>Available</title><content type='html'>&lt;p&gt;&lt;a href="http://www.feature50.com/"&gt;Feature50&lt;/a&gt; is winding down now that CEO Ben Galbraith has accepted a job offer elsewhere.  So, I'm interested in exploring my options, specifically, opportunities to build out the technology for a start-up working in concert with a strong business CEO. 
I've done this twice now.&lt;/p&gt;&lt;h4&gt;Technical ability&lt;/h4&gt;&lt;p&gt;I am a senior developer specializing in back-end technologies.   At &lt;a href="http://mozy.com/"&gt;Mozy&lt;/a&gt;, where I was employee #2, I wrote a distributed file repository that stores petabytes of data, an amount comparable to Amazon's S3.  I have 8 years of experience with PostgreSQL.  I know how to design for scale, and how to find and remove bottlenecks.  I am not afraid of diving into a new code base; I took over as maintainer of the Spyce web framework and the &lt;a href="http://code.google.com/p/formalchemy/"&gt;FormAlchemy&lt;/a&gt; toolkit, and I have contributed features or patches to SQLAlchemy, Pylons, and Jython, among others.  &lt;/p&gt;&lt;h4&gt;Soft skills
&lt;/h4&gt;&lt;p&gt;I enjoy building and working with a team.  At Feature50 I am responsible for technical interviews, and personally recruited five of our first eight developers.  At Mozy, I recruited three of the first five.  I designed a customized version of Review Board -- a code review tool -- for Feature50 and MediaBank, and contributed several patches back to the project.  I am active in the Python community and spoke at the last three PyCon conferences.  I have spoken at OSCON and I am speaking at PostgreSQL Conference West in October.
&lt;/p&gt;&lt;h4&gt;The bottom line
&lt;/h4&gt;&lt;p&gt;I'm looking to work on a challenging project -- that is, not Yet Another CRUD App -- with a small team. I am currently based in Utah; I am willing to work remotely or relocate.  Contact me at jonathan at utahpython dot org.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5039753540168274569?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5039753540168274569/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5039753540168274569' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5039753540168274569'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5039753540168274569'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/09/available.html' title='Available'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-3509977296647974884</id><published>2008-09-01T02:14:00.000-07:00</published><updated>2011-11-02T08:28:29.807-07:00</updated><title type='text'>Blog Day recommendations</title><content type='html'>&lt;p&gt;As with many things in the blogging echo chamber, &lt;a href="http://www.blogday.org/"&gt;blog day&lt;/a&gt; takes itself a little too seriously.  
But it's impossible not to love an excuse to talk about some of my favorite blogs that don't seem to have as much exposure as I think they deserve:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;a href="http://lethargy.org/%7Ejesus/"&gt;Theo Schlossnagle&lt;/a&gt;, CEO of OmniTI, a scalability and performance consulting company.  His best posts deal with &lt;a href="http://lethargy.org/%7Ejesus/archives/118-Dissecting-todays-Internet-traffic-spikes.html"&gt;scalability at the ops level&lt;/a&gt;.  His book is good, too.&lt;/li&gt;&lt;li&gt;&lt;a href="http://glinden.blogspot.com/"&gt;Greg Linden&lt;/a&gt;, ex-Amazon engineer, ex-Findory founder, current MS Live Labs employee.  He likes to post analyses of interesting CS talks and papers, particularly in the area of collective intelligence.  Greg stays very on-topic so the most recent posts are about as representative as any.&lt;/li&gt;&lt;li&gt;&lt;a href="http://utcc.utoronto.ca/%7Ecks/space/blog/"&gt;Chris Siebenmann&lt;/a&gt; writes about &lt;a href="http://utcc.utoronto.ca/%7Ecks/space/blog/solaris/ZFSOverPrefetchingUpdate"&gt;life as a professional sysadmin&lt;/a&gt;.  He also sometimes &lt;a href="http://utcc.utoronto.ca/%7Ecks/space/blog/python/MinimizingObjectChurn"&gt;blogs about python&lt;/a&gt;. &lt;/li&gt;&lt;li&gt;&lt;a href="http://it.toolbox.com/blogs/database-soup"&gt;Josh Berkus&lt;/a&gt;, PostgreSQL core team member, mostly blogs about current events in the database world, but every once in a while he writes a &lt;a href="http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327"&gt;must-read post about database design&lt;/a&gt;.  Google thinks that &lt;a href="http://it.toolbox.com/blogs/database-soup/joshs-rules-of-database-contracting-17253"&gt;"Rules for Database Contracting"&lt;/a&gt; is his most popular post, and that's a good pick too.&lt;/li&gt;&lt;li&gt;A non-technical pick: &lt;a href="http://www.websnark.com/"&gt;Eric Burns&lt;/a&gt; is the Gene Siskel of web comic critique.  
Except he's not dead.&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-3509977296647974884?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/3509977296647974884/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=3509977296647974884' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3509977296647974884'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3509977296647974884'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/09/blog-day-recommendations.html' title='Blog Day recommendations'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-892356824558365097</id><published>2008-08-29T14:16:00.000-07:00</published><updated>2009-07-08T05:53:10.298-07:00</updated><title type='text'>App Engine conclusions</title><content type='html'>&lt;p&gt;Having been eyeball deep in App Engine for a while, evaluating it for a project at work and putting together a presentation for the utah open source conference, I've reluctantly concluded that I don't like it.  I &lt;em&gt;want&lt;/em&gt; to like it, since it's a great poster child for Python.  And there are some bright spots, like the dirt-simple integration with google accounts.  But it's so very very primitive in so many ways.  
Not just the &lt;a href="http://code.google.com/p/googleappengine/issues/detail?id=6&amp;amp;colspec=ID%20Type%20Status%20Priority%20Stars%20Owner%20Summary"&gt;missing&lt;/a&gt; &lt;a href="http://code.google.com/p/googleappengine/issues/detail?id=109&amp;amp;colspec=ID%20Type%20Status%20Priority%20Stars%20Owner%20Summary"&gt;features&lt;/a&gt;, or the "you can use any web framework you like, as long as it's django" attitude, but primarily a lot of the existing API is just so very primitive.  &lt;/p&gt;&lt;p&gt;The DataStore in particular feels like a giant step backwards from using a traditional database with a &lt;a href="http://www.sqlalchemy.org/"&gt;sophisticated ORM&lt;/a&gt;.  Sure, it can scale if you use it right, but do you really know what that entails?&lt;/p&gt;&lt;p&gt;Take &lt;a href="http://sites.google.com/site/io/building-scalable-web-applications-with-google-app-engine"&gt;the example of simple counting of objects&lt;/a&gt;.  There's a count() method, but in practice, it's so slow you can't use it.  Denormalize with a .count property?  Yeah, that doesn't scale either: what you &lt;em&gt;really&lt;/em&gt; need is a separate, sharded Counter class.  And yes, sharding is very, very manual.  (See slides 18-23 in the link there, and the associated video starting about 19:00.)  &lt;/p&gt;&lt;p&gt;You can't perform joins in GQL.  Or subselects.  Or call functions, aggregate or otherwise.  EVERYthing you are interested in needs to be pre-computed.  (Or computed by hand client-side, which is so slow it's barely an option at all.)  I can extrapolate from this to my experience in production schemas and it's not pretty.&lt;/p&gt;&lt;p&gt;Of course, you also lose any ability to write declarative, set-based code, which is demonstrably less error-prone than the imperative alternative.  Take a simple example from my demo app.  Marking a group of todo items finished is four statements:&lt;/p&gt;&lt;pre class="code"&gt;
items = TodoItem.get_by_id(
  [int(id) for id in request.POST.getlist('item_id')])
for item in items:
  item.finished = datetime.now()
  item.put()&lt;/pre&gt;&lt;p&gt;Compare this with SQL:&lt;/p&gt;&lt;pre class="code"&gt;
# assumes a psycopg2-style driver, which adapts a Python tuple to an SQL IN list
cursor.execute("update todo_items set finished = CURRENT_TIMESTAMP where id in %s",
             (tuple(int(id) for id in request.POST.getlist('item_id')),))
&lt;/pre&gt;Scalability is great but taking a big hit to back-end productivity is too high a price for all but a few applications.  GAE is still young, so maybe Google will improve things, but their attitude so far seems to be "we know how to scale so shut up and do it the hard way."  I hope I am wrong.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-892356824558365097?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/892356824558365097/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=892356824558365097' title='14 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/892356824558365097'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/892356824558365097'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/08/app-engine-conclusions.html' title='App Engine conclusions'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>14</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-3296940908111424167</id><published>2008-08-29T11:11:00.000-07:00</published><updated>2009-12-21T07:38:27.740-08:00</updated><title type='text'>App Engine slides, code</title><content type='html'>&lt;p&gt;My App Engine 101 &lt;a href="http://utahpython.org/jellis/gae101.pdf"&gt;slides&lt;/a&gt; and &lt;a href="http://utahpython.org/jellis/gae101.zip"&gt;code&lt;/a&gt; are up now.&lt;/p&gt;&lt;p&gt;Bad 
news: my macbook pro did not work with the projector, period.&lt;/p&gt;&lt;p&gt;Good news: I have seen it do this before (in a room with several mac experts -- it was not user error) and brought a backup laptop.&lt;/p&gt;&lt;p&gt;Bad news: I forgot to include the django beta1 framework in my code upload, so I told people to just download it.  But beta2 was out, and didn't work with the version of App Engine Helper I had.  (It looks like r58 fixes this.)  Manual poking about the django download site ensued until I got a new zip uploaded.&lt;/p&gt;&lt;p&gt;Good news: the conference organizers liked it anyway and asked me to present a second time later in the day.  Everything just worked the second time around.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-3296940908111424167?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/3296940908111424167/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=3296940908111424167' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3296940908111424167'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3296940908111424167'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/08/app-engine-slides-code.html' title='App Engine slides, code'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-4728626007628542490</id><published>2008-08-25T16:18:00.001-07:00</published><updated>2010-10-05T03:24:10.746-07:00</updated><title type='text'>Google App Engine at the Utah Open Source Conference</title><content type='html'>&lt;p&gt;App Engine is probably the biggest thing to happen to Python this year, so of course I volunteered to give a presentation on it at the &lt;a href="http://2008.utosc.com/"&gt;Utah Open Source Conference&lt;/a&gt;.  (I'm scheduled for Friday, Aug 29, at 10:00 AM.)  Last year's conference was a big success, so I'm looking forward to an even better experience this year.&lt;/p&gt;&lt;p&gt;Here's the abstract I submitted, before they blew away my paragraph breaks:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;Google launched the App Engine service earlier this year to immense interest from the web development community. App Engine allows running applications on Google infrastructure, including BigTable, Google's non-relational, massively scalable database.&lt;/p&gt;&lt;p&gt;App Engine is appealing both at the low end, where small shops don't want to have to deal with hardware procurement and systems administration, and at the high end, where the kind of "instant scaling" App Engine promises to deal with bursty traffic is the holy grail of infrastructure planning. This tutorial will cover the basics of App Engine development, including development and deployment of a simple application.&lt;/p&gt;&lt;p&gt;Please sign up for an App Engine account and download the SDK ahead of time so we can jump right in to the code. Basic Python knowledge will be assumed.&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;After I submitted the proposal, I found out that all presentations are going to be 60 minutes long.  
That is not much time if we're going to do hands-on work, but you retain so much more by &lt;em&gt;doing&lt;/em&gt; than you do merely from &lt;em&gt;watching&lt;/em&gt; that I don't consider it optional.  So seriously, come with &lt;a href="http://code.google.com/appengine/downloads.html"&gt;the SDK&lt;/a&gt; installed.  Those who do not, can look over the shoulders of those who do.&lt;/p&gt;&lt;p&gt;If you don't know Python and you're a last minute kind of person, you might want to attend Matt Harrison's talk the day before, &lt;a href="http://2008.utosc.com/presentation/112/"&gt;90% of the Python you need to know&lt;/a&gt;.  Matt has presented several times at the &lt;a href="http://utahpython.org/"&gt;Utah Python User Group&lt;/a&gt; as well as PyCon.&lt;/p&gt;&lt;p&gt;Bonus tip: if you can't make it to the UTOSC, the two best talks on App Engine are &lt;a href="http://sites.google.com/site/io/rapid-development-with-python-django-and-google-app-engine"&gt;Rapid Development with Python, Django, and Google App Engine&lt;/a&gt;  and &lt;a href="http://sites.google.com/site/io/building-scalable-web-applications-with-google-app-engine"&gt;Building Scalable Web Applications with Google App Engine&lt;/a&gt;.  
My presentation will cover similar material to the first of these.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-4728626007628542490?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/4728626007628542490/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=4728626007628542490' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4728626007628542490'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4728626007628542490'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/08/google-app-engine-at-utah-open-source.html' title='Google App Engine at the Utah Open Source Conference'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-7415347354210667082</id><published>2008-08-15T14:47:00.000-07:00</published><updated>2011-11-03T11:24:28.485-07:00</updated><title type='text'>A reminder</title><content type='html'>&lt;p&gt;Now that I've been doing Python full time again for a while it's easy to forget how magical it can be.&lt;/p&gt;&lt;p&gt;Last night I got an IM from a friend of a friend asking for (a) a recommendation for a Python book and (b) advice on writing a screen scraper.  
I pointed him to &lt;a href="http://diveintopython.org/"&gt;Dive Into Python&lt;/a&gt; and &lt;a href="http://www.crummy.com/software/BeautifulSoup/"&gt;BeautifulSoup&lt;/a&gt;.  Just now he IMed me again, "Hey, thanks for the tip.  I ended up writing a screen scraper that I hadn't completed in 2 days in Groovy in about 20 minutes last night in Python with BeautifulSoup.  So thanks, you got another python convert."&lt;/p&gt;&lt;p&gt;I love my job.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-7415347354210667082?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/7415347354210667082/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=7415347354210667082' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7415347354210667082'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7415347354210667082'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/08/reminder.html' title='A reminder'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-4399864661112119005</id><published>2008-07-22T16:06:00.000-07:00</published><updated>2011-04-02T20:48:27.798-07:00</updated><title type='text'>SQLAlchemy-Migrate for dummies</title><content type='html'>&lt;p&gt;I gave &lt;a 
href="http://code.google.com/p/sqlalchemy-migrate/"&gt;sqlalchemy-migrate&lt;/a&gt; a try today.  I like it, and I'm going to keep using it.  The one downside is that it's a bit hard to find "the least you need to know" in the documentation, especially if you lean old-school like me and prefer to write your upgrade scripts in raw sql.  So here's my stab at it.&lt;/p&gt;

&lt;p&gt;
Create a "repository" for upgrade scripts:
&lt;/p&gt;&lt;pre class="code"&gt;migrate create path/to/upgradescripts "comment"
&lt;/pre&gt;

&lt;p&gt;
Create your manage script.  If you have development/production dbs with different connection urls, create two scripts with the same repository but different urls:
&lt;/p&gt;&lt;pre class="code"&gt;migrate manage dbmanage.py --repository=path/to/upgradescripts --url=db-connection-url
&lt;/pre&gt;

&lt;p&gt;
For each database, create the Migrate metadata (a migrate_version table):
&lt;/p&gt;&lt;pre class="code"&gt;./dbmanage.py version_control
&lt;/pre&gt;

&lt;p&gt;
Create an upgrade script.  This will create a script [next version number]-[database type]-upgrade.sql in the "versions" subdirectory of your "repository."  That's all, so you could certainly do this by hand if you prefer, but letting the script do it is less error-prone:
&lt;/p&gt;&lt;pre class="code"&gt;./dbmanage.py script_sql sqlite
&lt;/pre&gt;

&lt;p&gt;
Edit the script.

&lt;/p&gt;&lt;p&gt;
For each database, apply the upgrade:
&lt;/p&gt;&lt;pre class="code"&gt;./dbmanage.py upgrade
&lt;/pre&gt;
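&lt;p&gt;If you are curious what the version bookkeeping buys you, the idea can be sketched in a few lines of plain Python and sqlite3.  This is not how sqlalchemy-migrate is implemented; the &lt;tt&gt;schema_version&lt;/tt&gt; table, the &lt;tt&gt;UPGRADES&lt;/tt&gt; dict, and the SQL inside it are all hypothetical stand-ins for the tool's own version table and the numbered scripts in your repository.&lt;/p&gt;&lt;pre class="code"&gt;
```python
import sqlite3

# Hypothetical in-memory stand-ins for the numbered upgrade scripts that
# would live in the "versions" directory; the SQL is invented for illustration.
UPGRADES = {
    1: "create table todo_items (id integer primary key, title text)",
    2: "alter table todo_items add column finished timestamp",
}


def current_version(conn):
    """Read the schema version, creating the bookkeeping table if needed."""
    conn.execute("create table if not exists schema_version (version integer)")
    row = conn.execute("select version from schema_version").fetchone()
    if row is None:
        conn.execute("insert into schema_version values (0)")
        return 0
    return row[0]


def upgrade(conn):
    """Apply, in order, every script newer than the recorded version."""
    version = current_version(conn)
    applied = []
    for number in sorted(UPGRADES):
        if number > version:
            conn.execute(UPGRADES[number])
            conn.execute("update schema_version set version = ?", (number,))
            applied.append(number)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first = upgrade(conn)     # fresh database: both scripts run
second = upgrade(conn)    # already current: nothing left to do
print(first, second)
```
&lt;/pre&gt;&lt;p&gt;Because the database itself records which scripts have run, calling the upgrade a second time is a no-op, which is what makes it safe for every developer to run it after each checkout.&lt;/p&gt;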

&lt;p&gt;
Repeat the script/upgrade process as needed.  That's it!  Everything else is optional!

&lt;/p&gt;&lt;p&gt;
(What this gives you is a process where all your developers can have their own local database for development, and all they have to do is "svn up;  ./dbmanage.py upgrade" without having to worry about which upgrade scripts have been applied or not.)&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-4399864661112119005?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/4399864661112119005/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=4399864661112119005' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4399864661112119005'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4399864661112119005'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/07/sqlalchemy-migrate-for-dummies.html' title='SQLAlchemy-Migrate for dummies'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-736398615950414880</id><published>2008-07-13T21:53:00.000-07:00</published><updated>2008-07-13T21:58:36.227-07:00</updated><title type='text'>How to tell when you're successful</title><content type='html'>&lt;p&gt;You're successful when someone tries to get a cheap clone of &lt;a href="http://www.carnageblender.com"&gt;your site&lt;/a&gt; done on a cheap-labor code monkey site.&lt;/p&gt;&lt;p&gt;I'm &lt;a 
href="http://www.getacoder.com/projects/carnage_blender_clone_71413.html"&gt;flattered&lt;/a&gt;, I think.  (Although I'd be &lt;em&gt;more&lt;/em&gt; flattered if it were a &lt;em&gt;good&lt;/em&gt; code monkey site.)&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-736398615950414880?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/736398615950414880/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=736398615950414880' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/736398615950414880'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/736398615950414880'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/07/how-to-tell-when-youre-successful.html' title='How to tell when you&apos;re successful'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-3190191932308653995</id><published>2008-06-25T00:12:00.000-07:00</published><updated>2009-11-16T07:44:32.012-08:00</updated><title type='text'>Brief review of the Matias Half Keyboard et al</title><content type='html'>&lt;p&gt;I ended up buying four pieces of equipment to help deal with being &lt;a href="http://spyced.blogspot.com/2008/05/one-handed-typing.html"&gt;temporarily one-handed&lt;/a&gt;: the &lt;a 
href="http://www.amazon.com/gp/product/B00006IZIL"&gt;Matias half keyboard&lt;/a&gt;, the &lt;a href="http://www.amazon.com/gp/product/B0002HBGD6"&gt;X-keys foot pedal&lt;/a&gt; (cheaper than the Kinesis pedals, which got lukewarm reviews on Amazon), the &lt;a href="http://www.blogger.com/Keyspan%20PR-US2%20Presentation%20Remote"&gt;Keyspan PR-US2 Presentation Remote&lt;/a&gt;, and the &lt;a href="http://www.amazon.com/gp/product/B000ASDGX0"&gt;Pacific Outdoors 17-LC100 Folding Recliner&lt;/a&gt;. &lt;/p&gt;&lt;p&gt;The good: I'm very pleased with the recliner and modestly happy with the remote.  I got the recliner to take naps in; the brace on my arm didn't really accommodate lying down.  This $80 recliner compares well with zero gravity recliners costing over 10x as much.  (I've used two of the expensive variety; a BackSaver and one whose brand I don't recall.)  The only downside is you can either sit up, or recline fully; there is supposedly a way to adjust the recline angle, but it doesn't really work.  Expensive zero gravity recliners can all reliably lock at any angle you like.&lt;/p&gt;&lt;p&gt;The remote mostly worked as a mouse substitute that I could use with my immobilized right hand, reducing the need to slow down my left hand even more by switching from keyboard to mouse and back.  Unfortunately, the mouse control pad is not nearly as good as one of the IBM "pointing sticks;" it appears to have four control points, like an old Nintendo D-pad, which gives only 8 possible directions to move in.  This and a poorly quantized pressure sensitivity sometimes made things frustrating.  If I were to do this again I would try a handheld trackball instead, even though I could not find any wireless models.&lt;/p&gt;&lt;p&gt;The bad: the half keyboard did not help programming speed with one hand, and the foot pedal didn't improve things.  
I've returned both.&lt;/p&gt;&lt;p&gt;The half keyboard gives you the left hand side of the keyboard, which toggles to the right side when the space bar is held down.  So "a" becomes ";", "f" becomes "j", and so on.  For alphabetical keys, I found that it was true that I did not have to re-learn to touch type; I did not have to look at the keyboard, although I did have to pause and think, "does this one require the space toggle or not."  I got up to about 20 wpm before giving up, compared to 25 with one hand on a full keyboard.  I think I could have easily doubled that to 40+ wpm with enough practice to eliminate that pause and recognize "runs" of letters that can be typed without releasing the space, like "you," without thinking.  But that kind of investment wasn't worth it because of a serious flaw.&lt;/p&gt;&lt;p&gt;The half keyboard is really more like a "1/4 keyboard."  It &lt;em&gt;only&lt;/em&gt; gives you the alphabetical keys and a couple punctuation marks.  No number keys with their !@#$ counterparts.  No F keys.  No arrow keys.  On a mac, you can have cmd or control but not both.&lt;/p&gt;&lt;p&gt;To allow these keys to be typed, there is a "numeric toggle" key that switches to keypad mode, and two other modes that you access by hitting "shift shift" and "shift shift shift."  Almost any line of code you might want to type is going to run into this.  Typing [0] for instance is shift shift s numerictoggle b numerictoggle shift shift a.  Even the symbol-averse Java will need parentheses for method calls, and yes, parens require mode switching too.  (As do braces.  Shudder!)&lt;/p&gt;&lt;p&gt;So I lost in the non-alphabetical and modifier access much more than I could see myself gaining on the pure alphabetical side.  &lt;/p&gt;&lt;p&gt;Finally, the modifier keys were on the right hand side of the keyboard where they were very difficult to combine with shift.  
I tried to ameliorate the modifier key problems with the X-Keys pedal, mapping the pedals to cmd/ctrl/option, but &lt;a href="http://www.orderedbytes.com/forum/viewtopic.php?t=678"&gt;that didn't really work&lt;/a&gt; either.  (The included ikeys software wouldn't work at all.  At least ControllerMate worked in non-X applications, but since Wing is the only IDE that does locals completion well, using a non-X IDE temporarily was a non-starter.  Locals completion is nice with two hands, but absolutely essential with one.)  Note that this is more of an OS X issue than a problem with these pedals; apparently mapping pedals (x-keys or kinesis) to modifier keys works fine on windows.&lt;/p&gt;&lt;p&gt;So, the half-keyboard is not useful for programmers.  If it (a) were wireless and (b) had a non-skid backing -- it slid all over the place because the back side was just smooth plastic -- I could see it being useful for heavy smartphone users.  But it fails there too.  Good luck with this one, Matias.&lt;/p&gt;&lt;p&gt;Postscript: I considered trying the Frogpad as well as the half keyboard, but with users reporting that they got "&lt;a href="http://frogpad.zeroforum.com/zerothread?id=61"&gt;up to 20 wpm after 2 weeks&lt;/a&gt;," it didn't sound worth the trouble.  So if I ever had to spend another three weeks one handed I am not sure what is left to try.  
Probably I would try to use ControllerMate (OS X) or xmodmap (Linux) to make a "half keyboard" in software that didn't suck so much, &lt;a href="http://blag.xkcd.com/2007/08/14/mirrorboard-a-one-handed-keyboard-layout-for-the-lazy/"&gt;as suggested&lt;/a&gt; by one of the commenters in my first post.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-3190191932308653995?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/3190191932308653995/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=3190191932308653995' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3190191932308653995'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3190191932308653995'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/06/brief-review-of-matias-half-keyboard-et.html' title='Brief review of the Matias Half Keyboard et al'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-1676190302773225744</id><published>2008-05-31T19:19:00.000-07:00</published><updated>2008-05-31T20:10:04.166-07:00</updated><title type='text'>One-handed typing?</title><content type='html'>&lt;p&gt;
I separated my right shoulder so that arm is going to be out of commission for a while.  (I am right-handed.)  I'm managing about 25 wpm with one hand, or about 1/4 my normal speed.  This is frustrating.  The &lt;a href="http://www.handykey.com/"&gt;Handykey Twiddler&lt;/a&gt; has been out of production for a while.  The &lt;a href="http://www.infogrip.com/product_view.asp?RecordNumber=12"&gt;BAT&lt;/a&gt; is not OS X compatible.  Has anyone tried the &lt;a href="http://www.halfkeyboard.com/products/hkbinfo.html"&gt;Half Qwerty&lt;/a&gt; keyboard?  Are there other good options for under, say, $300?  (I found several very niche products for significantly more.)
&lt;p&gt;
I do plan to try voice recognition for email and IM but I can't see that working very well for code.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-1676190302773225744?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/1676190302773225744/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=1676190302773225744' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1676190302773225744'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1676190302773225744'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/05/one-handed-typing.html' title='One-handed typing?'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-1275357674115082471</id><published>2008-05-19T22:22:00.000-07:00</published><updated>2008-05-20T06:34:40.799-07:00</updated><title type='text'>Jython Notes</title><content type='html'>I've been getting back into the Jython codebase this last week.  The last time I submitted a Jython patch was in the beginning of 2004, so it's been a while.  Things have changed...  Jython is finally requiring Java 5 for the next release, which means the usual improvements, but especially good use of annotations.

Here are some notes from my puttering around (mostly dragging Jython's set module up to compatibility with CPython 2.5's):

&lt;ul&gt;&lt;li&gt;Expect Eclipse to be slightly confused.  (Lots of "errors.")  This is normal.  Use ant to build.
&lt;/li&gt;&lt;li&gt;ant regrtest is handy.  Run it before you start making changes so you know what's already broken in trunk.  (At least between releases, Jython does not appear to be religious about "no tests shall fail."  But as a new developer you should make "no additional tests should fail" your motto.)
&lt;/li&gt;&lt;li&gt;Subjective impression: Jython re performance is a bit slow.  Jython uses its own re implementation predating the Java regular expressions in jdk 1.4.  But, the JRuby guys reported that the jdk implementation doesn't perform very well, so Jython hasn't been in a hurry to switch.  The JRuby solution was to port the oniguruma re engine from C to Java.  But, Ruby's strings are byte-based and mutable where Jython's are not, so using the JRuby engine isn't just a matter of dropping it in.  Also, these string differences may be a source of the poor performance the Ruby people saw, so independent testing is in order here.&lt;/li&gt;&lt;li&gt;All of the Derived classes (PySetDerived, PyLongDerived, etc.) just exist to let Python code subclass builtin types.  Those derived classes are generated by a .py script in src/templates.&lt;/li&gt;&lt;li&gt;If you add a Java class that needs to be exposed to Python using the @Expose annotations, you need to add the class name to CoreExposed.includes, or Jython will default to picking attributes via reflection and it usually guesses wrong.&lt;/li&gt;&lt;li&gt;Given a PyObject, you can (usually) easily instantiate another PyObject of the same class with pyobject.getType().__call__().  The only time this won't work is when your type's __new__ does something tricky, like how PyFrozenSet or PyTuple return a singleton for an empty frozenset or tuple.
&lt;/li&gt;&lt;/ul&gt;Thanks to all the people in #jython who helped me out, especially &lt;a href="http://dunderboss.blogspot.com/"&gt;Philip Jenvey&lt;/a&gt;!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-1275357674115082471?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/1275357674115082471/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=1275357674115082471' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1275357674115082471'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1275357674115082471'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/05/jython-notes.html' title='Jython Notes'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-468719660260808680</id><published>2008-05-16T05:33:00.000-07:00</published><updated>2008-05-16T05:40:03.266-07:00</updated><title type='text'>Quick tip for debugging with Jython</title><content type='html'>&lt;p&gt;
Currently, Jython ships with the &lt;a href="http://docs.python.org/lib/module-pdb.html"&gt;pdb&lt;/a&gt; debugger module from Python 2.3.  Unfortunately the 2.3 pdb is primitive even by command-line debugger standards.  (For instance, if the program you are debugging throws an exception, it will take pdb down with it.  Seriously.  Did anyone actually &lt;span style="font-style: italic;"&gt;use&lt;/span&gt; this thing?)
&lt;p&gt;
Fortunately all you have to do to get a much better experience is grab pdb.py, bdb.py, and cmd.py (for good measure) from a 2.5 CPython installation and run against that instead.
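&lt;p&gt;
The copy step can be scripted. What follows is only a minimal sketch: the function name copy_debugger_modules and both directory arguments are my own placeholders, and where your CPython 2.5 Lib directory actually lives depends on your install.

```python
import os
import shutil

# The three modules named above; cmd.py is included "for good measure".
DEBUGGER_MODULES = ("pdb.py", "bdb.py", "cmd.py")


def copy_debugger_modules(cpython_lib, target_dir):
    """Copy the debugger modules from a CPython Lib directory into target_dir.

    Both paths are supplied by the caller; nothing here knows where
    CPython or Jython are installed.
    """
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    copied = []
    for name in DEBUGGER_MODULES:
        source = os.path.join(cpython_lib, name)
        shutil.copy(source, os.path.join(target_dir, name))
        copied.append(name)
    return copied
```

Putting target_dir at the front of sys.path before importing pdb should make the copied modules shadow the bundled ones, since that is how Python's import lookup normally works; I have not checked that against every Jython release.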
&lt;p&gt;
I've only tested this with Jython trunk but I think it should Just Work with the 2.2 release, too.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-468719660260808680?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/468719660260808680/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=468719660260808680' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/468719660260808680'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/468719660260808680'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/05/quick-tip-for-debugging-with-jython.html' title='Quick tip for debugging with Jython'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-2614622330185555019</id><published>2008-05-09T03:42:00.000-07:00</published><updated>2008-05-11T18:01:47.275-07:00</updated><title type='text'>IDE update</title><content type='html'>Last night the &lt;a href="http://utahpython.org/"&gt;Utah Python User Group&lt;/a&gt; held an editor/IDE smackdown.  I'm not going to write an exhaustive summary, but here are some highlights:
&lt;ul&gt;&lt;li&gt;ViM's OmniComplete is actually pretty decent.  Calltip support in the GUI is also good.  (GUI?  ViM?  Yeah, weird.)&lt;/li&gt;&lt;li&gt;Emacs completion, from Rope, is also good.  Emacs's refusal to make any concession to GUIs though keeps things clunky.  Not that it isn't great that Everything Works over plain ssh; that's fine, but going through classic Emacs buffers for docstrings or completion means everything takes more keystrokes than it should while being less useful than having that information Always On.
&lt;/li&gt;&lt;li&gt;Rope also gives Emacs refactoring support that works surprisingly well.&lt;/li&gt;&lt;li&gt;PyDev still sees a big win from the Eclipse platform.  Specifically, even though Subclipse and Subversive are a bit weak compared to the gold standard (that would be TortoiseSVN), they are much better than what you get with Komodo or Wing.  Now that I am on OS X (no Tortoise) this is a bigger issue for me than it used to be.&lt;/li&gt;&lt;li&gt;PyDev Extensions has refactoring support now, too.
&lt;/li&gt;&lt;li&gt;Komodo has limited support for completion inside django templates.  Which is impressive, since the commands allowed in django templates aren't really Python, which is to say that you can't just use the same completion support that you use for normal Python code.&lt;/li&gt;&lt;li&gt;Mako template support with completion, anyone?&lt;/li&gt;&lt;li&gt;The latest versions of Komodo and Wing both integrate unittest support.  Wing also supports doctest out of the box.  Meaning, you click a button, your tests run, you get a pretty summary with click-to-go-to-the-source-of-the-error support.  This might get me to finally upgrade to Wing 3.  It's not that "python test.py" is so hard, so much as I do it so often that even a little more convenience adds up.
&lt;/li&gt;&lt;/ul&gt;I was surprised how well ViM and Emacs do with Python now.  ViM's modern inline interface for code completion and Emacs's refactoring support are particularly nice.  The IDEs still win on the I part (Integration), in particular debugging and (for Eclipse at least) svn support.
&lt;p&gt;
Update: Ryan McGuire &lt;a href="http://www.enigmacurry.com/2008/05/09/emacs-as-a-powerful-python-ide/"&gt;blogged about his Emacs presentation&lt;/a&gt; in more detail.&lt;/p&gt;
&lt;p&gt;
Update 2: John Anderson &lt;a href=http://blog.sontek.net/2008/05/11/python-with-a-modular-ide-vim/&gt;blogged about setting up ViM&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-2614622330185555019?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/2614622330185555019/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=2614622330185555019' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2614622330185555019'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2614622330185555019'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/05/ide-update.html' title='IDE update'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5118545770260148601</id><published>2008-04-11T00:43:00.000-07:00</published><updated>2008-04-11T00:58:04.083-07:00</updated><title type='text'>How to piss off your customers in two easy steps</title><content type='html'>&lt;ol&gt;
&lt;li&gt;Don't communicate with them
&lt;li&gt;Treat them like they owe you something
&lt;/ol&gt;

&lt;p&gt;
Google is off to a good (bad?) start with both of these in its management of the App Engine release.
&lt;p&gt;
Of the &lt;a href=http://code.google.com/p/googleappengine/issues/list&gt;120+ issues&lt;/a&gt; logged by beta testers, a few have been closed as wontfix or duplicate; most have no response at all from the App Engine team.  I can't think of any other company that I've filed an issue with that took that long to get back to me.  The good ones get back within hours.
&lt;p&gt;
The one exception I have seen is for &lt;a href=http://code.google.com/p/googleappengine/issues/detail?id=61&gt;the urllib issue&lt;/a&gt;, where gu...@python.org, presumably Guido, wrote

&lt;blockquote&gt;
Providing a urllib replacement implemented on top of urlfetch shouldn't be particularly hard.  If someone is willing to produce one, I'd be happy to review it and, if it passes muster, try to get it added.
&lt;/blockquote&gt;

&lt;p&gt;
Paraphrased: "maybe if you do our work for us we'll consider it."
&lt;p&gt;
WTF!
&lt;p&gt;
This isn't OSS, where "if you want something, do it yourself" is at least a semi-valid response.  App Engine developers are all currently beta testing a product that Google hopes to eventually charge for.  We're &lt;i&gt;doing google a favor&lt;/i&gt;.  (Context: the replacement Guido wants is a piece of code that will only ever be useful on app engine, and is something Google should have done in the first place instead of making urlfetch a public API.  This is &lt;i&gt;not&lt;/i&gt; code with a use case outside of App Engine.)
&lt;p&gt;
Maybe I'm over-sensitive, but this really rubs me the wrong way.
&lt;p&gt;
I hope Google can (a) put enough engineers on this that they can actually respond to issues, and maybe start closing some, and (b) remember that when you're selling a product, "why don't you fix it if it bothers you" is a poor response.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5118545770260148601?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5118545770260148601/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5118545770260148601' title='21 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5118545770260148601'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5118545770260148601'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/04/how-to-piss-off-your-customers-in-two.html' title='How to piss off your customers in two easy steps'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>21</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-497698385074282600</id><published>2008-04-10T23:39:00.000-07:00</published><updated>2009-08-11T06:19:00.621-07:00</updated><title type='text'>The business case for Google App Engine</title><content type='html'>&lt;p&gt;
App Engine sure has caused a &lt;a href=http://blogsearch.google.com/blogsearch?hl=en&amp;q=app+engine&amp;btnG=Search+Blogs&gt;stir&lt;/a&gt;.  Some of the competition is already &lt;a href=http://www.joyeur.com/2008/04/08/let-my-people-have-root&gt;scared&lt;/a&gt;, with reason.
&lt;p&gt;
But who is App Engine's real competition?
&lt;p&gt;
In a lot of ways, App Engine is in a class by itself.  It competes on the high end with &lt;a href=http://www.amazon.com/gp/browse.html?node=3435361&gt;Amazon Web Services&lt;/a&gt;.  But it also competes on the low end with every shared host out there.  And thanks to the integration of Google authentication and the application directory you could also make a case that in an orthogonal way it competes with Facebook's application API.
&lt;p&gt;
At the low end, App Engine is a big deal for Python developers and anyone else who is allergic to PHP.  Historically, you've really had to look hard for low end hosting that offered anything else.  And as everyone who has given products away to colleges knows, Free is a fantastic hook to get developers to try out your platform.  Once it's open for all, App Engine is going to become the preferred option for developers with the itch to write a &lt;a href=http://spyced.blogspot.com/2008/04/google-app-engine-return-of-unofficial.html&gt;toy &lt;/a&gt; or proof of concept and show it off to the world.
&lt;p&gt;
Less obviously (to developers, anyway), App Engine is also a big deal for businesses that &lt;i&gt;aren't quite big enough&lt;/i&gt; to hire a sysadmin, or who are big enough but still prefer not to deal with that complexity.  (You thought hiring skilled developers is hard?  If anything, hiring skilled sysadmins is harder.)
&lt;p&gt;
I suspect there are a substantial number of companies in the uncomfortable situation of really needing more performance than shared hosting offers, but not wanting the complexity of taking the next step, to dedicated servers with dedicated sysadmins.
&lt;p&gt;
Of course, given App Engine's constraints, porting such applications to it is only going to be an option in a few cases.  The question is, are managers of new projects farsighted enough to see this problem coming and realize that app engine insures against it?
&lt;p&gt;
At the high end, AWS is the only real competition to App Engine, but as most observers have pointed out, they are different beasts.  AWS offers far more flexibility, at the cost of far more hours from your ops department.  (Although App Engine's datastore is &lt;a href=http://oakleafblog.blogspot.com/2008/04/comparing-google-app-engine-amazon.html&gt;a lot more sophisticated than the AWS SimpleDb&lt;/a&gt;, so the capabilities of AWS aren't a strict superset of App Engine's.)  Contrary to the Joyent assertion linked earlier, it isn't necessarily stupid to trade flexibility for convenience.  App Engine &lt;i&gt;just works&lt;/i&gt; to an unprecedented degree in the field of high-end scalability.
&lt;p&gt;
As with anything this disruptive, there's been a certain amount of hysteria.  Even &lt;a href=http://arstechnica.com/news.ars/post/20080408-analysis-google-app-engine-alluring-will-be-hard-to-escape.html&gt;people who should know better&lt;/a&gt; have repeated the idea that "nobody will want to acquire a product built on App Engine because you're locked in."  This is stupid.  Depending on a proprietary platform hasn't stopped products built on Oracle from being acquired, or  products using AWS, or even products built on a proprietary UNIX.  (Yes, those still exist.)  Nobody will care if you build on App Engine, except maybe Microsoft and Yahoo.  And even they can be pragmatic; Hotmail ran on BSD when Microsoft acquired them.
&lt;p&gt;
Lock-in &lt;i&gt;is&lt;/i&gt; a real issue, but not because App Engine will keep you from being acquired, and not because Google will screw you once they have you in their clutches -- that would scare off new customers and thus be bad business.  Lock-in is an issue &lt;i&gt;because evolving requirements might make App Engine's confines a worse fit than they were at the outset.&lt;/i&gt;  If you have to start adding servers at AWS or RackSpace to handle &lt;a href=http://code.google.com/p/googleappengine/issues/detail?id=6&gt;things you can't do within App Engine&lt;/a&gt;, App Engine loses most of its value.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-497698385074282600?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/497698385074282600/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=497698385074282600' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/497698385074282600'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/497698385074282600'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/04/business-case-for-google-app-engine.html' title='The business case for Google App Engine'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6315461508996741197</id><published>2008-04-09T23:07:00.000-07:00</published><updated>2008-04-09T23:35:44.319-07:00</updated><title type='text'>Language popularity, App Engine - style</title><content type='html'>&lt;p&gt;
Just for fun, here's the number of stars (interested people) for the different language-support feature requests for Google App Engine:
&lt;p&gt;
&lt;ul&gt;
&lt;li&gt;Perl: 85
&lt;li&gt;Java: 69
&lt;li&gt;Ruby: 67
&lt;li&gt;PHP: 23
&lt;li&gt;C#: 11
&lt;li&gt;jvm, not just java: 7
&lt;li&gt;Common Lisp: 5
&lt;/ul&gt;
&lt;p&gt;
Update: &lt;a href=http://use.perl.org/article.pl?sid=08/04/10/0130201&amp;from=rss&gt;Perl is stuffing the ballot box&lt;/a&gt; :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6315461508996741197?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6315461508996741197/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6315461508996741197' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6315461508996741197'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6315461508996741197'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/04/language-popularity-app-engine-style.html' title='Language popularity, App Engine - style'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5921938646731780929</id><published>2008-04-09T21:53:00.000-07:00</published><updated>2008-04-09T23:04:44.294-07:00</updated><title type='text'>Google App Engine: Return of the Unofficial Python Job Board Feed</title><content type='html'>&lt;p&gt;
&lt;a href=http://groups.google.com/group/comp.lang.python.announce/msg/54adec3d6315772d&gt;Over three years ago&lt;/a&gt; (!), I wrote a screen scraper to turn the &lt;a href=http://www.python.org/community/jobs/&gt;Python Job Board&lt;/a&gt; into an RSS feed.  It didn't make it across one of several server moves since then, but now I've ported it to Google's App Engine: &lt;a href=http://pyjobs.appspot.com/&gt;the new unofficial python job board feed&lt;/a&gt;.
&lt;p&gt;
I'll be making a separate post on the Google App Engine business model and when it makes sense to consider the App Engine for a product.  Here I'm going to talk about my technical impressions.
&lt;p&gt;
First, here's &lt;a href=http://pyjobs.appspot.com/static/jobs.py.txt&gt;the source&lt;/a&gt;.  Nothing fancy.  The only App Engine-specific API used is urlfetch.
&lt;p&gt;
Unfortunately, even something this simple bumps up against some pretty rough edges in App Engine.  It's going to be a while before this is ready for production use.
&lt;p&gt;
The big one is &lt;a href=http://code.google.com/p/googleappengine/issues/detail?id=6&gt;scheduled background tasks&lt;/a&gt;.  (If you think this is important, &lt;i&gt;star the issue&lt;/i&gt; rather than posting a "me too" comment.)  Related is &lt;a href=http://code.google.com/p/googleappengine/issues/detail?id=109&gt;a task queue&lt;/a&gt; that would allow those scheduled tasks to easily be split into bite-size pieces, which is important for Google to allow scheduled tasks (a) without worrying about runaway processes while (b) still accomplishing an arbitrary amount of work.
&lt;p&gt;
If there were a scheduled task api, my feed generator could poll the python jobs site hourly or so, and store the results in the Datastore, instead of having a 1:1 ratio of feed requests to remote fetches.
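&lt;p&gt;
Until such an api exists, a rough stand-in is to refetch lazily: keep the last result and only hit the remote site again once the copy is older than an hour.  Here is a minimal sketch of that idea, not what the feed actually does.  The CachedFeed class, its fetch callable, and the in-memory storage (standing in for the Datastore) are all assumptions made up for illustration.

```python
import time


class CachedFeed(object):
    """Serve a cached page body, refetching at most once per max_age seconds."""

    def __init__(self, fetch, max_age=3600, clock=time.time):
        self.fetch = fetch        # hypothetical callable returning the remote page body
        self.max_age = max_age    # how many seconds a cached copy stays fresh
        self.clock = clock        # injectable so the logic can be tested without waiting
        self.body = None          # in-memory stand-in for a Datastore record
        self.fetched_at = None

    def get(self):
        now = self.clock()
        stale = self.fetched_at is None or now - self.fetched_at >= self.max_age
        if stale:
            # Only a stale (or missing) copy triggers a remote fetch.
            self.body = self.fetch()
            self.fetched_at = now
        return self.body
```

This still makes some unlucky feed request pay for the remote fetch, which is exactly what a real scheduled task would avoid.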
&lt;p&gt;
While you can certainly create a cron job to fetch a certain url of your app periodically, and have that url run your "scheduled task," things get tricky quickly if your task needs to perform more work than it can accomplish in the small per-page time allocation it gets.  Fortunately, I expect a scheduled task api from App Engine sooner rather than later -- Google wants to be your one stop shop, and for a large set of applications (every web app I have ever seen has had &lt;i&gt;some&lt;/i&gt; scheduled task component) to have to rely on an external server to ping the app with this sort of workaround defeats that purpose completely.
&lt;p&gt;
Another rough edge is in &lt;a href=http://code.google.com/appengine/docs/urlfetch/overview.html&gt;the fetch api&lt;/a&gt;.  Backend fetches like mine need &lt;a href=http://code.google.com/p/googleappengine/issues/detail?id=110&gt;a callback api&lt;/a&gt; so that a slow remote server doesn't cause the fetch to fail forever from being auto-cancelled prematurely.  Of course, this won't be useful until scheduled tasks are available.  I'm thinking ahead. :)
&lt;p&gt;
Finally, be aware that &lt;a href=http://code.google.com/p/googleappengine/issues/detail?id=111&gt;fatal errors are not logged by default&lt;/a&gt;.  If you want to log fatal errors, you need to do it yourself.  &lt;a href=http://code.google.com/appengine/docs/python/requestsandappcaching.html&gt;the main() function&lt;/a&gt; is a good place for this if you are rolling your own simple script like I am here.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5921938646731780929?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5921938646731780929/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5921938646731780929' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5921938646731780929'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5921938646731780929'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/04/google-app-engine-return-of-unofficial.html' title='Google App Engine: Return of the Unofficial Python Job Board Feed'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-1883935275665407902</id><published>2008-04-06T22:14:00.000-07:00</published><updated>2009-07-20T08:35:48.266-07:00</updated><title type='text'>My half-baked thoughts on Python web frameworks</title><content type='html'>&lt;p&gt;
I have been lucky enough to fill our recent open positions with people who know Python as well as Java, so half of the (6 person) company now knows both and prefers Python, and 2 of the others have played with Python and liked it at least well enough not to object.  So the boss has conceded that it makes sense to go the Python route for our next project.

&lt;/p&gt;&lt;p&gt;
We're going to be doing a web, "next gen" version of our existing client-server project, which is mostly simple CRUD but does have 1000+ tables in its current incarnation.  So we really need something that can autogenerate 90+% of the CRUD or we will go insane.

&lt;/p&gt;&lt;p&gt;
The trouble is, I still don't really like any of the Python web options 100%.  (I like the web options in other languages less, but I'm a perfectionist.)

&lt;/p&gt;&lt;p&gt;
Django is well documented, its admin app is something everyone else envies, and newforms looks decent, but the ORM blows and I'm not fond of the template engine either.   (Pre-emptive pedantry: yes, I know I can "import sqlalchemy."  Please stop saying that like it means something; I'm not interested in defining models twice -- once for real work with SA, and once for interop with the rest of django.)  Apparently &lt;a href="http://code.google.com/p/django-sqlalchemy/"&gt;django-sqlalchemy&lt;/a&gt; got far enough in PyCon sprints that it's kinda usable so working on that would be an option.  Of course even then there is no guarantee the django core would accept it into mainline, and maintaining it as a "vendor branch" would probably suck.  If django used a &lt;a href="http://en.wikipedia.org/wiki/Distributed_revision_control"&gt;dscm&lt;/a&gt; like &lt;a href="http://www.selenic.com/mercurial/wiki/"&gt;Mercurial&lt;/a&gt; I might be willing to do that, but svn is just too painful so that is a real risk.

&lt;/p&gt;&lt;p&gt;
I don't see a way to generate a page containing just a CRUD interface for table X with the django admin app.  The admin app really is a monolithic application, not something you can easily re-use pieces of.

&lt;/p&gt;&lt;p&gt;
Regexps suck for url mapping.
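
To make the complaint concrete, here is a small sketch of the contrast, assuming the objection is about readability.  The "/archive/:year/:slug/" template syntax and both helper functions are made up for illustration; this is not Django's or Routes' actual API.

```python
import re

# Regex style: the pattern is precise, but the reader has to decode it.
REGEX_ROUTE = re.compile(r"^/archive/(\d{4})/([\w-]+)/$")


def match_regex(path):
    """Return the captured parts of the path as a dict, or None if it does not match."""
    m = REGEX_ROUTE.match(path)
    if m is None:
        return None
    return {"year": m.group(1), "slug": m.group(2)}


def match_template(template, path):
    """Match a made-up '/archive/:year/:slug/' style template against a path."""
    wanted = template.strip("/").split("/")
    actual = path.strip("/").split("/")
    if len(wanted) != len(actual):
        return None
    params = {}
    for want, got in zip(wanted, actual):
        if want.startswith(":"):
            params[want[1:]] = got   # named placeholder captures the whole segment
        elif want != got:
            return None              # literal segments must match exactly
    return params
```

The template version reads like the url it matches, at the cost of validation: it will happily accept a two-digit year that the regex rejects.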

&lt;/p&gt;&lt;p&gt;
Pylons is &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; well documented and after keeping an eye on this for something like 18 months I don't think this is a problem that will be solved, for whatever reasons. On the other hand, SA + mako is a very sane default, and both of those &lt;span style="font-style: italic;"&gt;are&lt;/span&gt; well documented so it's really only core Pylons that suffers from doc crapitude, and core Pylons is fairly small.  IRC responsiveness mitigates this further.

&lt;/p&gt;&lt;p&gt;
Pylons &lt;span style="font-style: italic;"&gt;still&lt;/span&gt; doesn't have a good CRUD (or even high-level manual form generation) solution, which has bugged me for even longer than the docs.  I can't fathom how people can tolerate writing this kind of boilerplate in 2008.  &lt;a href="http://code.google.com/p/formalchemy/"&gt;Formalchemy&lt;/a&gt; gets about 30% of the way there.  &lt;a href="http://code.google.com/p/dbsprockets/wiki/DBMechanic"&gt;DBMechanic&lt;/a&gt; requires TG2 atm, although apparently hacking it to run on Pylons may not be too much effort; I would guess around 20% of the effort to get the django-sa project really usable.

&lt;/p&gt;&lt;p&gt;
&lt;a href="http://docs.turbogears.org/2.0"&gt;TG2&lt;/a&gt; is of course very bleeding edge and although I like genshi's syntax in theory, in practice XML templates irritate the hell out of me.  (Very verbose, xinclude sucks compared to "inheritance," and incorporating rich dynamic content -- i.e., user-generated, like forum posts, that needs to include html tags -- is a PITA.  Not to mention that having to write "a &amp;amp;gt; b" when you mean "a &amp;gt; b" bugs me all out of proportion to the actual inconvenience it inflicts on me.)  Still, better than the django templates.

&lt;/p&gt;&lt;p&gt;
I'm skeptical that TG2 is a big enough value add to want to add it (in its unfinished state) as a dependency vs rolling our own on Pylons.  But DBMechanic does look like it could be exactly what I want in a CRUD generator.

&lt;/p&gt;&lt;p&gt;
web.py seems like more of a tech demo than a real product.  I don't see any signs of a CRUD or form generator.  reddit, probably the largest web.py site at least in terms of page views, moved to Pylons.

&lt;/p&gt;&lt;p&gt;
Zope 3 is alone in being really production ready &lt;span style="font-style: italic;"&gt;without running from svn&lt;/span&gt;.  &lt;a href="http://grok.zope.org/"&gt;Grok&lt;/a&gt; does do a good job of smashing zcml and &lt;a href="http://svn.zope.org/z3c.form/trunk/src/z3c/form/form.txt?view=auto"&gt;z3c.form&lt;/a&gt; looks okay but lives up to the Zope reputation of complexity.  (Field managers, widget managers -- are these the same things? -- widget modes, ...)  AFAIK relational dbs are still second-class citizens in zope, and with all due respect to zodb it is no postgresql.  OTOH there is z3c.sqlalchemy which gives me hope.  Finally: you have to &lt;span style="font-style: italic;"&gt;manually restart zope&lt;/span&gt; (per the Grok tutorial) after changing your .py files?  Seriously?

&lt;/p&gt;&lt;p&gt;
Bottom line, Zope might actually be a decent option if we had a Zope expert on staff but we do not and I am not willing to tackle the learning curve alone.

&lt;/p&gt;&lt;p&gt;
Nevow: form handling is in flux.  The new hotness is "pollenation forms," but that is svn-only and the api "will probably change."

&lt;/p&gt;&lt;p&gt;
Zope and Nevow both have their own xml-based templates, which predate genshi but are similar to it.  Something like Nevow's &lt;a href="http://www.kieranholland.com/code/documentation/nevow-stan/"&gt;Stan&lt;/a&gt; is obviously useful for programmatic template generation but it's not yet clear if that's going to be something we need.  Probably only if we have to write our own form generator.  If so, I suspect ripping a standalone Stan out of Nevow would be straightforward.

&lt;/p&gt;&lt;p&gt;
(Spyce of course never really got any traction to speak of.  It's time for me to let it go quietly into the night and leverage someone else's framework.)

&lt;/p&gt;&lt;p&gt;
Conclusion: I think porting DBMechanic to Pylons is our best option.  DBMechanic seems designed to be more flexible than the django admin app.  Django would be my second choice.

&lt;/p&gt;&lt;p&gt;
Corrections?  Thoughts?&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-1883935275665407902?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/1883935275665407902/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=1883935275665407902' title='42 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1883935275665407902'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1883935275665407902'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/04/m-half-baked-thoughts-on-python-web.html' title='My half-baked thoughts on Python web frameworks'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>42</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-4307931241319903593</id><published>2008-04-02T09:05:00.000-07:00</published><updated>2009-01-19T05:19:25.676-08:00</updated><title type='text'>Real Python IDEs</title><content type='html'>&lt;p&gt;
After reading a blog post titled "The Abysmal State of Python IDEs" (which I won't link to because it's misinformative, but it's easy to google by title), I wondered how the author managed to pick such a lousy group of IDEs to try.  He tried "ActiveState" (does he mean PythonWin?), DrPython, SPE, and ScrIDE, only one of which is in the top 10 google hits for Python IDE. 
&lt;/p&gt;&lt;p&gt;
The google top 10 include Eric, Wing IDE, Radio Userland, SPE, PyDev, and Komodo.  The Yahoo and MSN top 10s are similar.  Except for Radio Userland, this is a much better group to start with, and one that in fact does include what I think are the only 3 Python IDEs worth trying.
&lt;/p&gt;&lt;p&gt;
So how does a newbie end up picking such a lousy group of IDEs to try?  The only likely possibility seems to be that he went to the top google hit, the python.org wiki page.  Or possibly he went off of the top MSN hit, the c2 wiki Python IDE page.  Both are (rather, were) heaping wads of products that mostly weren't IDEs at all, or were IDEs for other languages that happened to include Python syntax coloring.
&lt;/p&gt;&lt;p&gt;
Syntax coloring and maybe a Run button don't qualify you as a Python IDE in 2008, guys.  (Sorry, IDLE.)  &lt;i&gt;Integrated&lt;/i&gt; means you need to &lt;i&gt;integrate&lt;/i&gt; something nontrivial, preferably a debugger, although gui builders can also count.
&lt;/p&gt;&lt;p&gt;
So I organized &lt;a href="http://wiki.python.org/moin/IntegratedDevelopmentEnvironments"&gt;the python.org IDE page&lt;/a&gt; by feature set and moved the non-IDEs to the Editors page, even if a pedant would note that they &lt;i&gt;were&lt;/i&gt; IDEs, just not really for Python.  That's not what 99.9% of people are looking for when they go to a Python IDE page, so let's be useful rather than pedantic.  I also elided the non-IDEs from &lt;a href="http://c2.com/cgi/wiki?PythonIde"&gt;the c2 page&lt;/a&gt;.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-4307931241319903593?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/4307931241319903593/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=4307931241319903593' title='26 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4307931241319903593'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4307931241319903593'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/04/python-ides.html' title='Real Python IDEs'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>26</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-1055393419415860764</id><published>2008-03-15T03:57:00.000-07:00</published><updated>2008-03-15T04:06:43.231-07:00</updated><title type='text'>Best new blog I discovered at PyCon [so 
far]</title><content type='html'>&lt;p&gt;
I was talking to Adam Gomaa on Thursday when &lt;a href="http://groovie.org/"&gt;Ben Bangert&lt;/a&gt; stopped by us and told him he had &lt;a href=http://adam.gomaa.us/blog/&gt;an interesting blog.&lt;/a&gt;  "If Ben says you have a good blog, I'll have to check it out," I told Adam.  "That's not what I said," Ben corrected me.  "I said &lt;span style="font-style:italic;"&gt;interesting&lt;/span&gt;."  But it is good, and I'm glad I found it.

&lt;p&gt;
And regarding Adam's post on &lt;a href=http://adam.gomaa.us/blog/the-django-orm-problem/&gt;declarative layers for SQLAlchemy&lt;/a&gt;, check out the &lt;a href=http://www.sqlalchemy.org/docs/04/plugins.html&gt;new-in-SA 0.4.4 declarative plugin.&lt;/a&gt;  It's almost exactly what Adam was looking for -- a little more verbose, in keeping with the "explicit is better than implicit" Python philosophy that SA shares, but creating your own superclass that creates a PK named "id" by default is just a few lines of code if that's what you prefer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-1055393419415860764?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/1055393419415860764/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=1055393419415860764' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1055393419415860764'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1055393419415860764'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/03/best-new-blog-i-discovered-at-pycon-so.html' title='Best new blog I discovered at PyCon [so far]'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-4683003968677987479</id><published>2008-03-15T03:29:00.000-07:00</published><updated>2008-03-15T03:46:52.630-07:00</updated><title type='text'>PyCon, Saturday and Sunday</title><content type='html'>&lt;p&gt;
Saturday I'll be at the SQLAlchemy and State of PyPy talks.  Then the board game BOF in the evening.  In between, probably mostly the "hallway track."
&lt;p&gt;

Sunday I plan to attend "What Zope did wrong" and "Core Python Containers."  (I'd also like to see the Wingware presentation in the 11:35 slot, but since I can only pick one, I guess I can just cross my fingers that this year's video recordings actually get published somewhere.)
&lt;p&gt;

Feel free to stop me and introduce yourself.
&lt;p&gt;

(If we met at a previous PyCon, I've changed my hair around a lot from year to year.  The photo on this blog represents what I look like now.  I should probably stick with this for a couple years so people can recognize me.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-4683003968677987479?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/4683003968677987479/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=4683003968677987479' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4683003968677987479'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4683003968677987479'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/03/pycon-saturday-and-sunday.html' title='PyCon, Saturday and Sunday'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-7366289624492417427</id><published>2008-03-13T17:51:00.000-07:00</published><updated>2008-03-13T18:00:16.476-07:00</updated><title type='text'>Introverting</title><content type='html'>&lt;p&gt;I'm taking a break during the evening tutorial in my hotel room on the second floor where I can enjoy the pycon wireless signal (which seems to be working quite well now).  
&lt;a href=http://www.theatlantic.com/doc/200303/rauch&gt;My name is Jonathan, and I am an introvert [too].&lt;/a&gt;  I suspect a lot of PyCon attendees can empathize.
&lt;p&gt;
After the last tutorial session ends at 9:30, I'm planning to head down with my copy of &lt;a href=http://www.sjgames.com/munchkin/game/&gt;Munchkin&lt;/a&gt; and see if anyone wants to start the &lt;a href=http://wiki.python.org/moin//PyCon2008/BoardGameEvent&gt;board game social&lt;/a&gt; a day early.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-7366289624492417427?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/7366289624492417427/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=7366289624492417427' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7366289624492417427'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7366289624492417427'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/03/introverting.html' title='Introverting'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6987166300497333973</id><published>2008-03-13T13:11:00.000-07:00</published><updated>2008-03-13T17:46:05.132-07:00</updated><title type='text'>Slides from Introduction to SQLAlchemy tutorial</title><content type='html'>&lt;p&gt;
My slides from this morning are up: &lt;a href=http://utahpython.org/jellis/sa-intro.pdf&gt;http://utahpython.org/jellis/sa-intro.pdf&lt;/a&gt;.  About 1/3 of the class did not have SA installed yet, and the network was down.  Fortunately, Mike and Jason brought 5 flash drives and by the time we got to the first exercise everyone was up and running.

&lt;p&gt;
This was my third time doing a three-hour SQLAlchemy tutorial.  Differences from &lt;a href=http://spyced.blogspot.com/2007/07/final-version-of-oscon-sqlalchemy.html&gt;(last time)&lt;/a&gt; include
&lt;ul&gt;
&lt;li&gt;updated for the 0.4 series
&lt;li&gt;removed almost all the SQL-layer material
&lt;li&gt;added a section on the new relation filtering api
&lt;li&gt;improved the parts of the Fundamentals sections that were poorly explained
&lt;li&gt;added a short section on the new-in-0.4 transaction management.
&lt;/ul&gt;

&lt;p&gt;
There wasn't a wall clock in the tutorial room, so despite making an effort to be aware of time I went 10 minutes over.  Sorry, guys. :)

&lt;p&gt;
&lt;a href=http://blog.discorporate.us/&gt;Jason Kirtland&lt;/a&gt; will be posting the slides from the Advanced SQLAlchemy tutorial soon.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6987166300497333973?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6987166300497333973/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6987166300497333973' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6987166300497333973'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6987166300497333973'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/03/slides-from-introduction-to-sqlalchemy.html' title='Slides from Introduction to SQLAlchemy tutorial'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-7949380612925981123</id><published>2008-03-07T15:23:00.000-08:00</published><updated>2008-03-07T16:02:52.288-08:00</updated><title type='text'>Pylons: first impressions</title><content type='html'>A couple co-workers and I spent some time with Pylons yesterday, enough to get to where we started to feel productive, but not much more than that.  I think there's value in a newbie's first impressions, so here are mine.  I'm sure at least some of these are wrong.
&lt;ol&gt;&lt;li&gt;Poor documentation of core Pylons (Mako and SQLAlchemy are fine -- thanks, &lt;a href="http://techspot.zzzeek.org/"&gt;Mike&lt;/a&gt;).  I had to use the source several times.  I'm still not really sure how Routes works, although I was mostly able to make it do what I wanted.  The first tutorial overcomplicated things, showing how to configure things to handle semi-obscure requirements, without explaining those requirements or simpler alternatives.&lt;/li&gt;&lt;li&gt;Helpful community.  I got most of the answers I needed pretty quickly in the #pylons freenode IRC channel.
&lt;/li&gt;&lt;li&gt;Not much black magic: if you know Python you won't be struggling with weird Pylons-only concepts.  It's all modules, classes, and dicts put together in an intuitive way (at least to my way of thinking).&lt;/li&gt;&lt;li&gt;SA (SQLAlchemy) is an amazing pleasure to use.  (Okay, not just in Pylons, but I had to say it.)  I have a slightly unusual schema -- the details are outside my scope here -- and SA's autoload handled it perfectly.&lt;/li&gt;&lt;li&gt;I wrote more CRUD boilerplate than I would have liked.  There is no real alternative to Django's "admin app."  &lt;a href="http://code.google.com/p/dbsprockets/wiki/DBMechanic"&gt;DBMechanic&lt;/a&gt; looks like it's getting close, but it's TG-and-Genshi only for now.  &lt;a href="http://code.google.com/p/formalchemy/"&gt;FormAlchemy&lt;/a&gt; is a partial solution (I did use it) but only does html generation; you'll still write boilerplate in your controllers.&lt;/li&gt;&lt;li&gt;Genshi appeals to me in theory but in practice its XML nature makes it feel clunky.  &lt;a href="http://www.w3.org/TR/xinclude/"&gt;XInclude&lt;/a&gt; as an alternative to template inheritance?  3 lines of xmlns per template?  Mako has its own verbosity problems, e.g., having to do a def to pass a title to the parent template, but these aren't inherent to Mako's approach the way they are to Genshi's (we're XML, dammit), and Mike seems mildly interested in improving this specific example for the next release.&lt;/li&gt;&lt;li&gt;The &lt;a href="http://wiki.developers.facebook.com/index.php/PythonPyFacebookTutorial"&gt;pyfacebook&lt;/a&gt; tutorial is long on throwing wads of code at you and short on explaining what's actually going on.  What does facebook.check_session() do?  What does the facebook_middleware do?  Why?  Most facebook api tutorials have this same problem.  
Obviously I haven't written a better one, so call me a hypocrite, but tutorial authors, please &lt;a href="http://en.wikipedia.org/wiki/Cargo_cult_programming"&gt;explain the why and not just the what&lt;/a&gt;.
&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-7949380612925981123?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/7949380612925981123/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=7949380612925981123' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7949380612925981123'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7949380612925981123'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/03/pylons-first-impressions.html' title='Pylons: first impressions'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-420693653599039496</id><published>2008-02-09T08:21:00.000-08:00</published><updated>2008-02-09T08:32:29.718-08:00</updated><title type='text'>SQLAlchemy at Pycon 08</title><content type='html'>SQLAlchemy will be well-represented this year with two tutorials and a talk.
&lt;p&gt;
I'll be the primary instructor for the &lt;a href=http://us.pycon.org/2008/tutorials/SQLAlechemyIEllis/&gt;Introduction to SQLAlchemy&lt;/a&gt; tutorial.  I just updated the pycon page with the outline of what we'll cover.  The slides will be pretty similar to &lt;a href=http://spyced.blogspot.com/2007/02/pycon-sqlalchemy-tutorial-slides.html&gt;last time&lt;/a&gt;, only with more time spent on a high-level intro to ORM (object-relational mapping) for people who have little exposure to that.  And of course last year 0.4 was not out.
&lt;p&gt;
The &lt;a href=http://www.sqlalchemy.org/docs/04/index.html&gt;SQLAlchemy documentation&lt;/a&gt; is thorough but a little intimidating.  IMNSHO, the introduction tutorial is a great way to pick up the basics and get some practice, after which everything starts to make a lot more sense.
&lt;p&gt;
Mike Bayer, the author of SA, will be the primary instructor for the &lt;a href=http://us.pycon.org/2008/tutorials/SQLAlechemyIIEllis/&gt;Advanced SQLAlchemy&lt;/a&gt; tutorial.  Jason Kirtland, one of the most prolific SA hackers besides Mike himself, will also be teaching.
&lt;p&gt;
At the conference itself, Mike will be presenting &lt;span style="font-style:italic;"&gt;Sqlalchemy 0.4 and beyond&lt;/span&gt;.  To save you digging it out of the talks page, here's the summary:
&lt;blockquote&gt;
At last year's Pycon, we introduced SQLAlchemy, the Database Toolkit for Python. This year, SQLAlchemy has gained new developers, a lot more users, and has now produced SQLAlchemy 0.4. The latest series of SQLAlchemy is significantly improved from the previous, in that APIs have been greatly pared down and refined, performance has been stepped up 30-40%, and ongoing architectural and developmental improvements have made room for lots of great new features with more to come. This talk intends to describe what's new in the 0.4 series, both for current users as well as for folks who may have only had experience with our earlier versions.
&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-420693653599039496?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/420693653599039496/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=420693653599039496' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/420693653599039496'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/420693653599039496'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/02/sqlalchemy-at-pycon-08.html' title='SQLAlchemy at Pycon 08'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-8314701810503863186</id><published>2008-01-14T21:08:00.000-08:00</published><updated>2008-01-14T21:23:34.389-08:00</updated><title type='text'>Why IE rejects your cookies for no apparent reason</title><content type='html'>&lt;p&gt;
Seriously, &lt;a href=http://support.microsoft.com/default.aspx/kb/323752&gt;WTF&lt;/a&gt;.
&lt;p&gt;
I'll summarize for those of you who are allergic to Microsoft Knowledge Base articles, although this one is fairly to-the-point:
&lt;blockquote&gt;
If you implement a FRAMESET whose FRAMEs point to other Web sites on the networks of your partners or inside your network, but you use different top-level domain names... IE silently rejects cookies sent from third party sites.
&lt;/blockquote&gt;
&lt;p&gt;
This bit me today while adding Facebook support to &lt;a href=http://carnageblender.com&gt;my text-based game&lt;/a&gt; -- I'm going the IFRAME route for Facebook support rather than rewriting the whole app in FBML thankyouverymuch, and yes, apparently an IFRAME counts too as far as IE is concerned.
&lt;p&gt;
What makes me cry a little inside is not the two hours spent deep in old and crufty login and cookie-setting &lt;a href=http://openacs.org&gt;legacy code&lt;/a&gt; wondering what the flaming hell was going on.  No, what makes me cry is that I got screwed by a setting that &lt;span style="font-style:italic;"&gt;will never block the bad guys,&lt;/span&gt; because labeling yourself a good guy is &lt;span style="font-style:italic;"&gt;entirely voluntary&lt;/span&gt;.  It's like someone at MS read &lt;a href=http://www.faqs.org/rfcs/rfc3514.html&gt;the evil bit RFC&lt;/a&gt; and took it seriously.
&lt;p&gt;
The mind boggles.
&lt;p&gt;
In the meantime, if you know where your web framework's cookie code lives, do everyone a favor and patch it now to add that P3P header given in the knowledge base by default.  And an option to disable it if you're obsessive-compulsive that way.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-8314701810503863186?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/8314701810503863186/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=8314701810503863186' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8314701810503863186'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8314701810503863186'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2008/01/why-ie-rejects-your-cookies-for-no.html' title='Why IE rejects your cookies for no apparent reason'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-3466050888939463766</id><published>2007-12-30T16:04:00.000-08:00</published><updated>2011-09-01T06:22:32.565-07:00</updated><title type='text'>Scala: first impressions</title><content type='html'>&lt;p&gt;
I'm reading the &lt;a href="http://www.artima.com/shop/forsale"&gt;prerelease of the Scala book&lt;/a&gt;, since I'm working for a heavily Java-invested organization now and programming in Java feels like running a race with cement shoes.  Politically, Jython doesn't seem like an option; Scala might be an easier sell.

&lt;/p&gt;&lt;p&gt;
Here are some of my impressions going through the book.

&lt;/p&gt;
&lt;h3&gt;Scala is a decent scripting language&lt;/h3&gt;

[updated this section thanks to comments from anonymous, Jörn, and Eric.]
&lt;p&gt;
Here's how you can loop through each line in a file:

Python:
&lt;pre class="code"&gt;
import sys
for line in open(sys.argv[1]):
   print line
&lt;/pre&gt;

Scala:
&lt;pre class="code"&gt;
import scala.io.Source
Source.fromFile(args(0)).getLines.foreach(println)
&lt;/pre&gt;

&lt;p&gt;
The scala interpreter also has a &lt;a href="http://en.wikipedia.org/wiki/REPL"&gt;REPL&lt;/a&gt;, which is nice for experimenting.

&lt;p&gt;
&lt;a href="http://www.artima.com/scalazine/articles/steps.html"&gt;This article&lt;/a&gt; has more examples of basic scripting in scala.

&lt;h3&gt;Getters and setters&lt;/h3&gt;

One thing I despise in Java is the huge amount of wasted lines of code dedicated to getFoo and setFoo methods.  Sure, your IDE can autogenerate these for you, but it still takes up lines in your editor and effort to mentally block them out when examining an unfamiliar class to determine what it does.

&lt;p&gt;
C# has Python-style properties, so in theory could be virtually free of this kind of boilerplate, since there is &lt;span style="font-style: italic;"&gt;no syntactic difference&lt;/span&gt; between "foo.x = y" whether x is a raw field or a property.  So, the right thing to do, which you'll see in Python code, is using a raw public field &lt;span style="font-style: italic;"&gt;until you actually need extra logic&lt;/span&gt;, at which point you replace it with a property and nobody's code breaks.
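&lt;/p&gt;&lt;p&gt;A minimal sketch of that progression, in Python 3 syntax.  The class names and the whitespace-stripping rule are made up for illustration; the point is only that the call site stays the same when a raw field becomes a property.&lt;/p&gt;

```python
# Version 1: a plain public attribute, no boilerplate.
class Player(object):
    def __init__(self, name):
        self.name = name


# Version 2: the same class once extra logic is needed. Code that reads
# or assigns player.name keeps working without any changes.
class CheckedPlayer(object):
    def __init__(self, name):
        self.name = name  # goes through the setter below

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Hypothetical extra logic: strip stray whitespace on every write.
        self._name = value.strip()


for cls in (Player, CheckedPlayer):
    p = cls("alice")
    p.name = " bob "  # identical call site for both versions
    print(cls.__name__, repr(p.name))
```

&lt;p&gt;Nothing outside the class has to know which version it is talking to.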
&lt;/p&gt;&lt;p&gt;
But C# wasn't far enough removed from Java culturally so everyone writes boilerplate properties instead of boilerplate getters and setters.  (I'm aware that in a compiled language like C# it makes sense to start with properties for classes in certain kinds of libraries but these make up a vanishingly small number of actual classes in the wild.)
&lt;/p&gt;&lt;p&gt;
This is a long way of saying that Scala's properties make a lot of sense given the audience they are targeting (Java programmers).  Scala vars are &lt;span style="font-style: italic;"&gt;automatically&lt;/span&gt; turned into properties (or getters and setters, if you prefer to think in terms of those).  So even the most obstinate fan of boilerplate code has &lt;span style="font-style: italic;"&gt;absolutely no reason&lt;/span&gt; to keep writing unnecessary getters and setters.

&lt;/p&gt;&lt;p&gt;
If you need to manually tweak your properties later, you can magically define a setter for "field" by naming a method "field_=".  This seems a bit like a one-off hack rather than part of a well-designed system, but I can live with it.  (Since parentheses are optional for a scala method taking no arguments, any no-argument method call is already syntactically indistinguishable from a raw field read.)

&lt;/p&gt;&lt;h3&gt;Scala doesn't know what it wants to be when it grows up, yet&lt;/h3&gt;

Scala contains a lot of features from a lot of different influences, which means there are often competing styles to use; the young scala community hasn't yet decided which to emphasize.  For instance, iteration vs traversal -- "for (item &amp;lt;- items)" vs "items.foreach" -- or even whether to leave out optional (inferred) type declarations.

&lt;p&gt;
Perhaps this is merely a failure of one book trying too hard to make scala be all things to all readers.

&lt;/p&gt;&lt;h3&gt;Random thoughts&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scala needs something like Python's enumerate function; if you want to loop over each object in a collection but you need its index too, you have to write a manual for loop.
&lt;/li&gt;&lt;li&gt;Using parentheses for collection access (array(0); map("foo")) makes it unnecessarily difficult to tell if you're looking at a method call or a collection access.
&lt;/li&gt;&lt;li&gt;The &lt;a href="http://www.scala-lang.org/docu/files/api/index.html"&gt;scala stdlib&lt;/a&gt; is not very google-able yet; if you don't know exactly the class you are looking for ahead of time, you probably won't find it.  For example, when looking for a scala object that could iterate through lines in a file, I correctly guessed the scala.io package, but would never have bothered looking at Source until more googling turned it up in a &lt;a href="http://blog.huikau.com/2007/11/25/simple-file-io-in-different-dynamic-languages/"&gt;blog entry&lt;/a&gt;.
&lt;/li&gt;&lt;/ul&gt;
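&lt;p&gt;For reference, this is the Python idiom the first bullet has in mind, in Python 3 syntax.  The list contents are made up.&lt;/p&gt;

```python
items = ["rock", "paper", "scissors"]

# enumerate yields (index, item) pairs, so no manual counter is needed.
indexed = list(enumerate(items))

for i, item in enumerate(items):
    print(i, item)
```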

&lt;h3&gt;Conclusion and disclaimer&lt;/h3&gt;

This is long enough; I'll probably write more as I go through more of the book.  Patterns and implicit conversions are particularly interesting.
&lt;p&gt;
I was attracted to scala while looking for a sane alternative to Java on the JVM and it looks like scala might "drag Java programmers halfway to Python," to paraphrase Guy Steele.  And I'm happy to be corrected on anything I've written here.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-3466050888939463766?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/3466050888939463766/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=3466050888939463766' title='21 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3466050888939463766'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/3466050888939463766'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/12/scala-first-impressions.html' title='Scala: first impressions'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>21</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-8208589984480456508</id><published>2007-12-28T19:41:00.000-08:00</published><updated>2009-02-21T07:29:40.249-08:00</updated><title type='text'>Troubleshooting the ps3 wireless network connection, including error 80130128</title><content type='html'>My father got a ps3 for Christmas, but ran into some problems getting it on his wireless network.

The first one was "connection error 80130128" after configuring it to use DHCP.
I couldn't google anything useful about this; just a few other hapless victims asking if anyone had any ideas.  Fortunately Dad had his laptop there too and noticed Windows complaining that two machines on the network were both using the same IP.  So, over the phone, I walked him through setting up the ps3 with a static address:
&lt;ol&gt;&lt;li&gt;on his laptop, run -&gt; cmd&lt;/li&gt;&lt;li&gt;ipconfig&lt;/li&gt;&lt;li&gt;Read the "gateway" ip.  Put that into his browser to go to his router's admin page&lt;/li&gt;&lt;li&gt;Find the DHCP settings for his router to see what range of IPs it hands out; pick one outside that range&lt;/li&gt;&lt;li&gt;Set up the ps3 with that IP, the router IP as primary dns, and an opendns server as secondary
&lt;/li&gt;&lt;/ol&gt;This made the connection test happy.  But when he tried to go to the playstation store, it gave a DNS error.  If he repeated the connection test, it failed too.
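&lt;p&gt;The "pick one outside that range" step from the list above can be sketched in Python 3 with the standard ipaddress module.  Every address here is a made-up example, not Dad's actual network, and the sketch does not check whether some other device already holds the chosen address statically.&lt;/p&gt;

```python
import ipaddress

# Hypothetical values read off the router's admin page.
network = ipaddress.ip_network("192.168.1.0/24")
gateway = ipaddress.ip_address("192.168.1.1")
pool_start = ipaddress.ip_address("192.168.1.100")
pool_end = ipaddress.ip_address("192.168.1.199")

# Every address the router may hand out via DHCP.
dhcp_pool = set(range(int(pool_start), int(pool_end) + 1))

# Usable host addresses that are neither the router nor in the DHCP pool.
candidates = [
    host for host in network.hosts()
    if host != gateway and int(host) not in dhcp_pool
]

static_ip = candidates[0]
print("static address for the ps3:", static_ip)
```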

"Well," I told him, "It's supposed to try both DNS servers.  But we can try setting the primary DNS server to opendns as well."  Once he did that, everything worked.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-8208589984480456508?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/8208589984480456508/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=8208589984480456508' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8208589984480456508'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8208589984480456508'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/12/troubleshooting-ps3-wireless-network.html' title='Troubleshooting the ps3 wireless network connection, including error 80130128'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-7100540298112066875</id><published>2007-11-16T16:32:00.000-08:00</published><updated>2007-11-16T19:54:26.209-08:00</updated><title type='text'>Reed-Solomon libraries</title><content type='html'>If you want to run a &lt;a href=http://mozy.com/&gt;multi-petabyte storage system&lt;/a&gt; then you don't want to do it with Raid 5 or &lt;a href=http://carbonite.com&gt;Raid 6&lt;/a&gt;; with modern disks' &lt;a 
href=http://www.usenix.org/event/fast07/tech/schroeder.html&gt;~3% per year failure rate&lt;/a&gt;, that's 300 failed disks a year when you have 10,000 disks, and the odds start to get pretty good (relatively speaking) that you'll face permanent data loss at some point when you lose a third disk from an array while two are rebuilding.  And of course monitoring and replacing disks in lots of small arrays is manpower-intensive, which to investors translates as "expensive."
&lt;p&gt;
You probably don't want to go with &lt;a href=http://google.com&gt;triplication&lt;/a&gt;, either; disks are cheap, but not so cheap that you want to triple your hardware costs unnecessarily.  While storing multiple copies of frequently used data is good, &lt;i&gt;all&lt;/i&gt; your data probably isn't "frequently used."
&lt;p&gt;
What is the solution?  As it turns out, Raid is actually a special case of &lt;a href=http://en.wikipedia.org/wiki/Reed-Solomon_error_correction&gt;Reed-Solomon encoding&lt;/a&gt;, which lets you specify any degree of redundancy you want.  You can be safer than triplication with a fraction of the space needed.
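&lt;p&gt;
To put rough numbers on that, here is a back-of-the-envelope comparison in Python 3 syntax.  The 10+4 layout is a hypothetical choice for illustration, not what Mozy or anyone else actually runs, and counting tolerated losses only says "safer" if disk failures are roughly independent.

```python
def overhead(data_shards, parity_shards):
    """Raw bytes stored per byte of user data."""
    return float(data_shards + parity_shards) / data_shards


# Triplication behaves like 1 data shard plus 2 extra copies:
# any 2 of the 3 copies can be lost.
triple = overhead(1, 2)

# Hypothetical Reed-Solomon layout: 10 data shards plus 4 parity shards,
# which survives the loss of any 4 of the 14 shards.
rs = overhead(10, 4)

print("triplication: %.1fx storage, tolerates 2 losses" % triple)
print("RS 10+4:      %.1fx storage, tolerates 4 losses" % rs)
```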
&lt;p&gt;
I was prompted to write this because Mozy open-sourced the Reed-Solomon library I used while I was there, &lt;a href="http://sourceforge.net/projects/librs"&gt;librs&lt;/a&gt;, complete with Python bindings.  The original librs we used at Mozy was written by &lt;a href=http://theclarkfamily.name/blog/&gt;Byron Clark&lt;/a&gt;, a formidable task.  Later we switched to the version you see on sourceforge, based on Plank's original encoder.  I wasn't involved with librs at all except to fix a couple of reference leaks in the Python wrapper.
&lt;p&gt;
But if you're actually looking for an rs library to use, &lt;a href=http://www.linkedin.com/pub/1/451/627&gt;Alen Peacock&lt;/a&gt;, who is much more knowledgeable than I about the gory details involved here, tells me that if you are starting from scratch the two libraries you should evaluate are &lt;a href=http://pypi.python.org/pypi/zfec&gt;zfec&lt;/a&gt;, which also comes with Python bindings, and &lt;a href=http://www.cs.utk.edu/~plank/plank/papers/CS-07-603.html&gt;Jerasure&lt;/a&gt; which is an updated -- i.e., probably faster than his first -- encoder by Plank.  (Jerasure has nothing to do with Java.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-7100540298112066875?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/7100540298112066875/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=7100540298112066875' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7100540298112066875'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7100540298112066875'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/11/reed-solomon-libraries.html' title='Reed-Solomon libraries'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-8206608772799172844</id><published>2007-10-18T08:15:00.000-07:00</published><updated>2009-12-11T05:40:51.014-08:00</updated><title type='text'>Utah Data Recovery</title><content type='html'>&lt;p&gt;
About three years ago (so pre-Mozy and definitely pre-Mac Mozy) my brother had his PowerBook hard disk die.  As in, not just mostly dead -- it would not power up.  It had a lot of stuff on it that he didn't want to lose, but he felt like the usual suspects who charge $1k to $2k for data recovery were a rip-off.  So he hung onto the disk in case a cheaper option came along.
&lt;/p&gt;&lt;p&gt;
Then just recently, when I saw some people on a local Linux group mailing list recommend &lt;a href="http://utdatarecovery.com/"&gt;Utah Data Recovery&lt;/a&gt;, I suggested to my brother that he give it a try.  UTDR starts at "only" $300.
&lt;/p&gt;&lt;p&gt;
UTDR did indeed recover the data, although they charged $100 extra for this one.  Mac fee?  Tricky hw problem?  I don't know.  But it was still a lot cheaper than the other companies I googled for fixing a physically dead drive.  (As opposed to a corrupt partition table or something where the hardware itself was okay.)  At least, the ones that actually give you a price up front rather than hiding behind "request a quote!"&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-8206608772799172844?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/8206608772799172844/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=8206608772799172844' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8206608772799172844'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8206608772799172844'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/10/utah-data-recovery.html' title='Utah Data Recovery'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-2699285562503022550</id><published>2007-10-09T19:52:00.000-07:00</published><updated>2007-10-09T19:59:10.894-07:00</updated><title type='text'>Semi-automatic software installation on HP-UX, with dependencies</title><content type='html'>&lt;p&gt;
I had to install subversion on a couple of HP-UX boxes.  Fortunately, there's an &lt;a href="http://hpux.cs.utah.edu/"&gt;HP-UX software archive&lt;/a&gt; out there with precompiled versions of lots of software.  Unfortunately, dependency resolution is like the bad old days of 1997: entirely manual.  And there are fifteen or so dependencies for subversion.
&lt;p&gt;
So, I wrote a script to parse the dependencies and download the packages automatically.  It requires Python -- which you can install from the archive with just the Python package and the Db package -- and BeautifulSoup, which you can google for.  Usage is
&lt;pre class=code&gt;
hpuxinstaller &amp;lt;archive package url&amp;gt; &amp;lt;package name&amp;gt;
[e.g., hpuxinstaller http://hpux.cs.utah.edu/hppd/hpux/Development/Tools/subversion-1.4.4/ subversion]
[wait for packages to download]
gunzip *.gz
[paste in conveniently given swinstall commands]
&lt;/pre&gt;
&lt;p&gt;
Here is the script:
&lt;pre class=code&gt;
#!/usr/local/bin/python

import urlparse, urllib2, sys, os
from subprocess import Popen, PIPE
from BeautifulSoup import BeautifulSoup

required = {}
if not os.path.exists('cache'):
    os.mkdir('cache')

def getcachedpage(url):
    fname = 'cache/' + url.replace('/', '-')
    try:
        page = file(fname).read()
    except IOError:
        print 'fetching ' + url
        page = urllib2.urlopen(url).read()
        file(fname, 'wb').write(page)
    return page

def dependencies(url):
    scheme, netloc, _, _, _, _ = urlparse.urlparse(url)
    soup = BeautifulSoup(getcachedpage(url))
    text = soup.find('td', text='Run-time dependencies:')
    if not text:
        return
    tr = text.parent.parent
    td = tr.findAll('td')[1]
    for a in td.findAll('a'):
        yield (a.contents[0], '%s://%s%s' % (scheme, netloc, a['href']))

def add(name, url):
    required[name] = url
    for depname, depurl in dependencies(url):
        if depname in required:
            continue
        print "%s requires %s" % (name, depname)
        required[depname] = depurl
        add(depname, depurl)
        
def download(full_url):
    print 'downloading ' + full_url
    _, _, path, _, _, _ = urlparse.urlparse(full_url)
    fname = os.path.basename(path)
    def chunkify_to_eof(stream, chunksize=64*1024):
        # yield fixed-size chunks until the stream is exhausted
        while True:
            data = stream.read(chunksize)
            if not data:
                break
            yield data
    f = file(fname, 'wb')
    try:
        for chunk in chunkify_to_eof(urllib2.urlopen(full_url)):
            f.write(chunk)
    finally:
        # make sure the depot is flushed to disk before gunzip runs
        f.close()


# Compute dependencies before checking for installed files, since swinstall
# can let a package be installed w/o its dependencies. If there are such
# packages installed we don't want to skip their [missing] dependencies.
add(sys.argv[2], sys.argv[1])

try:
    p = Popen(['swlist'], stdout=PIPE)
except OSError:
    # swlist is missing or not executable; assume nothing is installed
    print 'Warning: unable to list installed packages'
    installed = set()
else:
    installed = set(line.strip().split()[0] for line in p.stdout if line.strip())

to_install = []
for name, url in required.iteritems():
    if name in installed:
        print name + ' is already installed'
        continue
    full_url = '%s%s-ia64-11.23.depot.gz' % (url.replace('/hppd/', '/ftp/'), url.split('/')[-2])
    to_install.append(os.path.basename(full_url))
    download(full_url)

if to_install:
    print "\nAfter gunzip, run:"
    for fname in to_install:
        print "swinstall -s %s/%s %s" % (os.getcwd(), fname[:-3], fname.split('-')[0])
else:
    print 'Everything is already installed'
&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-2699285562503022550?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/2699285562503022550/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=2699285562503022550' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2699285562503022550'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2699285562503022550'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/10/semi-automatic-software-installation-on.html' title='Semi-automatic software installation on HP-UX, with dependencies'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-7070867612222994068</id><published>2007-10-05T16:41:00.000-07:00</published><updated>2007-10-05T17:02:34.287-07:00</updated><title type='text'>Congratulations, Mozy</title><content type='html'>&lt;p&gt;
I left backup service provider &lt;a href=http://mozy.com&gt;Mozy&lt;/a&gt; about three months ago, and yesterday &lt;a href=http://www.emc.com/news/emc_releases/showRelease.jsp?id=5368&amp;l=en&amp;c=US&gt;they were acquired by EMC&lt;/a&gt; as &lt;a href=http://www.techcrunch.com/2007/09/23/breaking-online-backup-startup-mozy-acquired-by-emc-for-76-million/&gt;rumored by techcrunch&lt;/a&gt; earlier.

&lt;p&gt;
The cool thing about startups is they pretty much have to hire people who are totally not qualified to do awesome things and let them try.  There's no way Amazon would have hired me to write S3, but that's what I did for Mozy.

&lt;p&gt;
Mozy was the third startup I've been a part of, and the first to amount to anything.  I was employee #3 and saw it grow from sharing a single rented office to 50 employees in two years.  With people who didn't think it was strange to wear a tie to work.  Trippy.

&lt;p&gt;Unfortunately I'm not there to witness the final stage of being assimilated by the Borg firsthand, but I hear that's not really any more fun than it sounds so perhaps it's just as well.

&lt;p&gt;
Nice work, guys.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-7070867612222994068?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/7070867612222994068/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=7070867612222994068' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7070867612222994068'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7070867612222994068'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/10/congratulations-mozy.html' title='Congratulations, Mozy'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-2533623753774416064</id><published>2007-10-02T16:10:00.001-07:00</published><updated>2007-10-02T17:10:04.905-07:00</updated><title type='text'>Wing IDE 3, Wing IDE 101 released</title><content type='html'>&lt;p&gt;
&lt;a href="http://wingware.com/"&gt;Wing IDE&lt;/a&gt; version 3 &lt;a href="http://groups.google.com/group/comp.lang.python.announce/browse_thread/thread/d22c03594c6fe1aa/0219e0fe92703047"&gt;has been released&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;
The list of new features is a little underwhelming.  Multi-threaded debugging and the unit testing tool (only supporting unittest -- does anyone still use that old module?) are nice but I don't see myself paying to upgrade from 2.1 yet.  Now if they could get the GUI to keep up with my typing in Windows, I'd pay for that...  I guess this is a sign that Python IDEs are nearing maturity; Komodo 4 didn't have any earth-shaking new features either, at least as far as Python was concerned.
&lt;p&gt;
(Personally I think someone should start supporting django/genshi/mako templates already.  Maybe in 3.1, guys?)
&lt;p&gt;
Following &lt;a href="http://spyced.blogspot.com/2007/01/komodo-4-released-new-free-version.html"&gt;ActiveState's lead&lt;/a&gt;, Wingware has also released a completely free version, Wing IDE 101.  The main difference is in what each leaves out as an incentive to upgrade: Komodo Edit omits debugging, while Wing IDE 101 includes the debugger but omits code completion.  Wingware also continues to offer the low-cost Personal edition.
&lt;/p&gt;&lt;p&gt;
But the really big difference between Wing IDE 101 and Komodo Edit is that you can freely use Komodo Edit for paying work.  Wing IDE 101, like Wing IDE Personal, has a no-commercial-use clause.  (&lt;a href="http://www.activestate.com/Products/komodo_edit/edit_vs_ide.plex"&gt;Komodo versions compared&lt;/a&gt;; &lt;a href="http://wingware.com/wingide/features"&gt;Wing versions compared&lt;/a&gt;.)  I'm still of the opinion that at $180, Wing Professional will pay for itself in short order, but for the hobbyist, Komodo Edit is very compelling.  I've been using it myself for Tcl and XML editing for several months now and it's a nice little IDE.
&lt;/p&gt;&lt;p&gt;
Too bad Komodo's emacs bindings continue to suck balls -- I mean, it's one thing to not implement fancy things like a minibuffer or kill ring, but if you can't even get C-W (cut) right, there's not much hope.  Users contributed much-improved Emacs bindings to the ActiveState bug tracker way back in the version 3 timeframe.  I guess ActiveState just doesn't care.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-2533623753774416064?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/2533623753774416064/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=2533623753774416064' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2533623753774416064'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/2533623753774416064'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/10/wing-ide-3-wing-ide-101-released.html' title='Wing IDE 3, Wing IDE 101 released'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-7731651901095751729</id><published>2007-09-21T22:03:00.000-07:00</published><updated>2007-09-21T22:11:05.953-07:00</updated><title type='text'>That wasn't the pigeonhole I expected</title><content type='html'>&lt;p&gt;
I went to the &lt;a href=http://csaa.byu.edu/&gt;BYU CS alumni&lt;/a&gt; dinner tonight.  At one point they briefly put everyone's name and position on a projector, one at a time.  (At five seconds apiece it wasn't as tedious as it sounds.)
&lt;p&gt;
When it was my turn, it announced "Jonathan Ellis, System Administrator."
&lt;p&gt;
What the hell?
&lt;p&gt;
It turns out that when I RSVP'd I said I was a "python kung-fu master &amp; sysadmin of last resort."  (In the sense that, if you really can't find a better sysadmin, I know enough to be dangerous.)
&lt;p&gt;
Don't bother trying to be clever around bureaucrats.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-7731651901095751729?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/7731651901095751729/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=7731651901095751729' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7731651901095751729'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7731651901095751729'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/09/that-wasnt-pigeonhole-i-expected.html' title='That wasn&apos;t the pigeonhole I expected'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-8891274440725368905</id><published>2007-09-08T18:18:00.000-07:00</published><updated>2007-09-08T18:26:43.215-07:00</updated><title type='text'>Utah Open Source Conference 2007</title><content type='html'>&lt;p&gt;
The first &lt;a href=http://www.utosc.org/&gt;Utah Open Source Conference&lt;/a&gt; finished today.  I heard that they had close to 300 attendees -- not bad at all for a freshman effort.
&lt;p&gt;
I reprised presentations that I've given before, on SQLAlchemy and distributed source control.  My slides are on the &lt;a href=http://www.utosc.org/presentations/&gt;presentations&lt;/a&gt; page (although if you've seen my slides from either before, there's not much new there -- I got lucky, SA 0.4 isn't stable yet so I stuck with 0.3.10).
&lt;p&gt;
I had to work Friday so I missed a lot of presentations, but of the ones I saw my favorite was on &lt;a href=http://ganglia.sourceforge.net/&gt;Ganglia&lt;/a&gt;, which I hadn't heard of before but which looks quite useful for anyone running a bunch of servers who takes uptime and QoS seriously.  (This was actually Brad Nicholes's third presentation of the conference -- he must have been busy!)
&lt;p&gt;
Afterwards I went to the board games BoF and played Mag Blast.  Fun little game.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-8891274440725368905?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/8891274440725368905/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=8891274440725368905' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8891274440725368905'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8891274440725368905'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/09/utah-open-source-conference-2007.html' title='Utah Open Source Conference 2007'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5874329068534156378</id><published>2007-09-05T18:55:00.000-07:00</published><updated>2007-09-05T20:31:20.136-07:00</updated><title type='text'>What it means to "know Python"</title><content type='html'>&lt;p&gt;
Since Adam Barr &lt;a href=http://www.proudlyserving.com/archives/2007/08/knowing_a_langu.html&gt;replied&lt;/a&gt; to &lt;a href=http://spyced.blogspot.com/2007/07/brief-reaction-to-find-bug.html&gt;my post on his book&lt;/a&gt;, I'd like to elaborate a little on what I said.
&lt;p&gt;
Adam wrote,
&lt;blockquote&gt;
[F]or me, "knowing" Python means you understand how slices work, the difference between a list and a tuple, the syntax for defining a dictionary, that indenting thing you do for blocks, and all that. It's not about knowing that there is a sort() function.
&lt;/blockquote&gt;
&lt;p&gt;
In Python, reinventing sort and split is like a C programmer starting a project by writing his own malloc.  It just isn't something you see very often.  Similarly, I just don't think you can credibly argue that a C programmer who doesn't know how to use malloc really knows C.  At some level, libraries do matter.
&lt;p&gt;
On the other hand, I wouldn't claim that you must know all eleventy jillion methods that the Java library exposes in one way or another to say you know Java.  
&lt;p&gt;
What is the middle ground here?
&lt;p&gt;
I think the answer is something along the lines of, "you have to get enough practice actually using the language to be able to write idiomatic code."  That's necessarily going to involve picking up some library knowledge along the way.
&lt;p&gt;
This made me think.  What &lt;i&gt;are&lt;/i&gt; the most commonly used Python modules?  I decided to scan the &lt;a href=http://www.activestate.com/ASPN/Python/Cookbook/&gt;Python Cookbook's&lt;/a&gt; code base and find out.  This is a fairly large sample (over 2000 recipes), and further attractive in that most of the scripts there are reasonably standalone, so they don't import many non-standard modules.  The downside is that some of the code dates back at least to the very ancient Python 1.5.
&lt;p&gt;
In 2000+ source files and almost 4000 imports of stdlib modules, here are the frequency counts of imported modules.

&lt;p&gt;
Is this a reasonable list?  I obviously think I qualify as knowing Python well enough to blog about it.  Of the modules above the 80% line, _winreg, win32con, and win32api are platform-specific; new is deprecated, string isn't officially deprecated but should be, and __future__ isn't really a module per se.  I believe I've used all of the rest but xmlrpclib at some point, although my line of comfort-without-docs would be only about the 60% mark.  I think anyone who programs professionally will quickly get to know well at least the modules up to the 50% line.

&lt;table&gt;
&lt;tr&gt;&lt;td&gt;sys&lt;/td&gt;&lt;td&gt;473&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;os&lt;/td&gt;&lt;td&gt;302&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=2&gt;&lt;b&gt;24%&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;time&lt;/td&gt;&lt;td&gt;210&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;re&lt;/td&gt;&lt;td&gt;145&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=2&gt;&lt;b&gt;35%&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;string&lt;/td&gt;&lt;td&gt;140&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;random&lt;/td&gt;&lt;td&gt;103&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;threading&lt;/td&gt;&lt;td&gt;66&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;socket&lt;/td&gt;&lt;td&gt;57&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;os.path&lt;/td&gt;&lt;td&gt;52&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;types&lt;/td&gt;&lt;td&gt;50&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Tkinter&lt;/td&gt;&lt;td&gt;47&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=2&gt;&lt;b&gt;50%&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;math&lt;/td&gt;&lt;td&gt;43&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;win32com.client&lt;/td&gt;&lt;td&gt;42&lt;/td&gt;&lt;/tr&gt; 
&lt;tr&gt;&lt;td&gt;__future__&lt;/td&gt;&lt;td&gt;41&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;traceback&lt;/td&gt;&lt;td&gt;40&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;itertools&lt;/td&gt;&lt;td&gt;38&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;doctest&lt;/td&gt;&lt;td&gt;37&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;urllib&lt;/td&gt;&lt;td&gt;35&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;cStringIO&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;struct&lt;/td&gt;&lt;td&gt;32&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=2&gt;&lt;b&gt;60%&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;win32api&lt;/td&gt;&lt;td&gt;31&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;getopt&lt;/td&gt;&lt;td&gt;29&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;thread&lt;/td&gt;&lt;td&gt;29&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;ctypes&lt;/td&gt;&lt;td&gt;28&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;StringIO&lt;/td&gt;&lt;td&gt;28&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;inspect&lt;/td&gt;&lt;td&gt;26&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;win32con&lt;/td&gt;&lt;td&gt;25&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;copy&lt;/td&gt;&lt;td&gt;25&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;cPickle&lt;/td&gt;&lt;td&gt;25&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;operator&lt;/td&gt;&lt;td&gt;24&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;datetime&lt;/td&gt;&lt;td&gt;23&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;cgi&lt;/td&gt;&lt;td&gt;22&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=2&gt;&lt;b&gt;70%&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Queue&lt;/td&gt;&lt;td&gt;22&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;urllib2&lt;/td&gt;&lt;td&gt;20&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;md5&lt;/td&gt;&lt;td&gt;20&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;base64&lt;/td&gt;&lt;td&gt;20&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;xmlrpclib&lt;/td&gt;&lt;td&gt;19&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;sets&lt;/td&gt;&lt;td&gt;19&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;optparse&lt;/td&gt;&lt;td&gt;19&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;logging&lt;/td&gt;&lt;td&gt;18&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;weakref&lt;/td&gt;&lt;td&gt;18&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;shutil&lt;/td&gt;&lt;td&gt;17&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;unittest&lt;/td&gt;&lt;td&gt;17&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;pprint&lt;/td&gt;&lt;td&gt;16&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;urlparse&lt;/td&gt;&lt;td&gt;15&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;getpass&lt;/td&gt;&lt;td&gt;15&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;httplib&lt;/td&gt;&lt;td&gt;15&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;pickle&lt;/td&gt;&lt;td&gt;15&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;_winreg&lt;/td&gt;&lt;td&gt;14&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;UserDict&lt;/td&gt;&lt;td&gt;13&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;signal&lt;/td&gt;&lt;td&gt;13&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=2&gt;&lt;b&gt;80%&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
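&lt;p&gt;
The bold percentage rows in the table are cumulative shares of all stdlib imports.  The script further down relies on the compiler module, which only exists in Python 2, so here is a minimal sketch of the same counting idea for Python 3 using the ast module.  The function names count_imports and cumulative_share and the sample string are my own illustration, not part of the original scan, and this version does not filter out non-stdlib modules.

```python
import ast
from collections import Counter

def count_imports(source):
    """Count module names appearing in import and from-import statements."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a, b as c" yields one alias per module; the "as" part is ignored
            for alias in node.names:
                counts[alias.name] += 1
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from x import y" counts x once; purely relative imports are skipped
            counts[node.module] += 1
    return counts

def cumulative_share(counts):
    """Return (module, count, running fraction of all imports), most common first."""
    total = sum(counts.values())
    running = 0
    rows = []
    for module, n in counts.most_common():
        running += n
        rows.append((module, n, running / total))
    return rows

# hypothetical three-line "recipe" standing in for a real source file
sample = "import os, sys\nimport sys as system\nfrom os.path import join\n"
counts = count_imports(sample)
for module, n, share in cumulative_share(counts):
    print('%s\t%d\t%.0f%%' % (module, n, share * 100))
```

Running count_imports over every file and adding the Counters together would give the per-module totals; the running fraction is what the 24%, 35%, ... rows report.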

&lt;p&gt;
For those interested, a tarball of the recipes I scanned is &lt;a href=http://www.utahpython.org/jellis/recipes.tar.bz2&gt;here&lt;/a&gt;, so you don't need to scrape the Cookbook site yourself.  The import scanning code is simple enough:

&lt;pre class=code&gt;
import os, re, compiler
from collections import defaultdict

# define an AST visitor that only cares about "import" and "from [x import y]" nodes
count_by_module = defaultdict(lambda: 0)
class ImportVisitor:
    def visitImport(self, t):
        for m in t.names:
            if not isinstance(m, basestring):
                m = m[0] # strip off "as" part
            count_by_module[m] += 1
    def visitFrom(self, t):
        count_by_module[t.modname] += 1

# parse
for fname in os.listdir('recipes'):
    try:
        ast = compiler.parseFile('recipes/%s' % fname)
    except SyntaxError:
        continue
    compiler.walk(ast, ImportVisitor())
    print 'parsed ' + fname

# some raw stats, for posterity
counts = count_by_module.items()
total = sum(n for module, n in counts)
print '%d/%d total/unique imports' % (total, len(counts))

# strip out non-stdlib modules
for module in count_by_module.keys():
    try:
        __import__(module)
    except (ImportError, ValueError):
        del count_by_module[module]
        
# post-stripped stats
counts = count_by_module.items()
total = sum(n for module, n in counts)
print '%d/%d total/unique imports in stdlib' % (total, len(counts))
counts.sort(key=lambda (module, n): n)

# results
subtotal = 0
for module, n in reversed(counts):
    subtotal += n
    print '%s\t%d' % (module, n)
    print '%f' % (float(subtotal) / total)
&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5874329068534156378?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5874329068534156378/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5874329068534156378' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5874329068534156378'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5874329068534156378'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/09/what-it-means-to-know-python.html' title='What it means to &quot;know Python&quot;'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6511514365065097430</id><published>2007-09-04T07:22:00.000-07:00</published><updated>2007-09-04T13:16:54.298-07:00</updated><title type='text'>Merging two subversion repositories</title><content type='html'>&lt;blockquote&gt;
&lt;b&gt;Update:&lt;/b&gt; an anonymous commenter pointed out that yes, there is a (much!) better way to do this with svnadmin load --parent-dir, which is covered in the docs under "repository migration."  All I can say in my defense is that it wasn't something google thought pertinent.  So, for google's benefit: &lt;a href=http://svnbook.red-bean.com/en/1.4/svn.reposadmin.maint.html#svn.reposadmin.maint.migrate&gt;how to merge subversion repositories&lt;/a&gt;.  Thanks for the pointer, anonymous!
&lt;/blockquote&gt;
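&lt;p&gt;
For reference, the dump-and-load route from the update can be sketched roughly as follows.  This is a sketch under assumptions, not something I ran for this migration: the helper names and repository paths are hypothetical, it must run on a machine with filesystem access to both repositories, and the parent directory should already exist in the target repository (for example via svn mkdir) before loading.

```python
import subprocess

def build_commands(source_repo, target_repo, parent_dir):
    """Build the svnadmin dump and load argument lists (hypothetical helper)."""
    dump_cmd = ["svnadmin", "dump", "--quiet", source_repo]
    load_cmd = ["svnadmin", "load", "--parent-dir", parent_dir, target_repo]
    return dump_cmd, load_cmd

def merge_repository(source_repo, target_repo, parent_dir):
    """Pipe a dump of source_repo into target_repo under parent_dir."""
    dump_cmd, load_cmd = build_commands(source_repo, target_repo, parent_dir)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    load = subprocess.Popen(load_cmd, stdin=dump.stdout)
    dump.stdout.close()  # let dump see a broken pipe if load exits early
    load_status = load.wait()
    dump_status = dump.wait()
    if dump_status != 0 or load_status != 0:
        raise RuntimeError("svnadmin dump/load failed (%s, %s)"
                           % (dump_status, load_status))

# usage, with made-up paths:
# merge_repository("/var/svn/team-a", "/var/svn/team-b", "team-a")
```

Unlike the script below, this keeps the original authors and dates without needing matching unix users.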
&lt;p&gt;
I needed to merge team A's svn repository into team B's.  I wanted to preserve the history of team A's commits as much as reasonably possible.  I would have thought that someone had written a tool to do this, but I couldn't find one, so I wrote this.  (Of course, now that I'm posting this, I fully expect someone to point me to a better, pre-existing implementation that I missed.)
&lt;p&gt;
The approach is to take a working copy of repository B, add a directory for A's code, and for each revision in A's repository, apply that to the working copy and commit it.  This would be easy if svn merge would allow applying diffs from repository A into a working copy of repository B, but it does not.  I can't think of a technical reason for this.  (In fact, I seem to remember that early versions of the svn client &lt;span style="font-style:italic;"&gt;did&lt;/span&gt; allow this, with dire warnings, but I could be mistaken and I don't have a 1.1 client around anymore.)
&lt;p&gt;
So I tried instead to use "svn diff |patch -p0", which worked great up until the first commit with a binary file.  Oops.  For the final version I ended up having to create a working copy for A, update to each revision there, then rsync to the right point in working copy B and call the "svnaddremove" script to mark files added or deleted.  (This is suboptimal since we can get the exact changed paths from svn, and just copy those files over, but rsync is fast enough as long as your working copies stay in cache.  The update and commit steps both consistently took longer than rsync in my timing.)
&lt;p&gt;
My script does not try to be intelligent about copies or moves that svn knows about.  Team A did not use branches or tags much so I didn't put the effort in to deal with those the "right" way (which would be to also issue a cp/mv on B's working copy to preserve history).  It also uses unix users to commit revisions with the same name as the original commit.  Doing this obviously requires at least access to the repository server to add the right users.  I used something like "svn log -q |grep ^r |awk '{print $3}' |sort |uniq |xargs -n 1 useradd" (useradd takes the name as an argument, hence the xargs).
&lt;p&gt;
Final note: the perl script in svnaddremove is a long way of writing "awk {print $2}", except that it preserves filenames with spaces in them.  There is probably a much more clever way of doing this, too.
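&lt;p&gt;
One possibly less clever alternative is to do the status parsing in Python.  The sketch below assumes, like the perl does, that for unversioned (?) and missing (!) entries only the first status column is filled in, so everything after the first run of whitespace is the path.  The function name and sample output are made up for illustration.

```python
def paths_with_status(status_output, flag):
    """Return the paths from "svn st" output whose first column equals flag."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith(flag):
            # split only once, so spaces inside the filename are preserved
            parts = line.split(None, 1)
            if len(parts) == 2:
                paths.append(parts[1])
    return paths

# hypothetical "svn st" output
sample = "?       new file.txt\n!       gone.py\nM       changed.py\n"
print(paths_with_status(sample, "?"))
print(paths_with_status(sample, "!"))
```

Each returned path could then be passed to svn add or svn rm as a single argument, which sidesteps the xargs quoting entirely.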
&lt;p&gt;
Here, then, is the merge script:
&lt;pre class=code&gt;
#!/usr/bin/python

# usage: svnimport &amp;lt;source wc path&amp;gt; &amp;lt;target wc path&amp;gt; &amp;lt;revstart&amp;gt; &amp;lt;revend&amp;gt;
# e.g. svnimport liberte-source trunk/liberte 1 2000

from subprocess import Popen, PIPE
try:
    from xml.etree import cElementTree as ET
except ImportError:
    from elementtree import ElementTree as ET
import sys, time

def system(*args):
    p = Popen(args, stdout=PIPE, stderr=PIPE)
    out, err = p.communicate()
    if err:
        # err is a string; raising a bare string fails on newer Pythons
        raise RuntimeError(err)
    return out

# super-minimal log scraper
# for a better one see hgsvn's svnclient.py, http://cheeseshop.python.org/pypi/hgsvn
def parse_date(svn_date):
    date = svn_date.split('.', 2)[0]
    return time.strftime("%Y-%m-%d", time.strptime(date, "%Y-%m-%dT%H:%M:%S"))
def parse_svn_log_xml(xml):
    tree = ET.fromstring(xml)
    for entry in tree.findall('logentry'):
        d = {}
        d['revision'] = int(entry.get('revision'))
        author = entry.find('author')
        d['author'] = author is not None and author.text or None
        d['message'] = entry.find('msg').text or ""
        d['date'] = parse_date(entry.find('date').text)
        yield d
        
def edited_message(entry):
    msg = entry['message'].strip().replace('\r\n', '\n')
    addendum = '[original revision %s committed %s]' % (entry['revision'], entry['date'])
    if msg:
        return msg + '\n' + addendum
    return addendum

sourcepath, targetpath, revstart, revstop = sys.argv[1:]
# rsync foo bar and rsync foo/ bar/ are very different!
if not sourcepath.endswith('/'):
    sourcepath += '/'
if not targetpath.endswith('/'):
    targetpath += '/'

xml = system('svn', 'log', sourcepath, '--xml', '-r', '%s:%s' % (int(revstart) + 1, revstop))
for entry in parse_svn_log_xml(xml):
    revno = entry['revision']
    print 'merging revision %d by %s' % (revno, entry['author'])
    # merge in the revision
    system('svn', 'up', '-r', str(revno), sourcepath)
    print '\trsync'
    system('rsync', '-a', '--exclude=.svn', '--delete', sourcepath, targetpath)
    system('/tmp/svnaddremove', targetpath) # svn should add this.  hg already did.
    # commit as the correct author, if available
    author = entry['author']
    print '\tchown'
    system('chown', '-R', author, targetpath)
    quoted_message = edited_message(entry).replace('"', "'")
    print '\tci'
    system('su', author, '-c', 'svn ci %s -m "%s"' % (targetpath, quoted_message))
&lt;/pre&gt;
&lt;p&gt;
And here is svnaddremove:
&lt;p&gt;
&lt;pre class=code&gt;
#!/bin/bash

# odd, xargs is invoking svn add/rm w/ no args when grep returns no lines.
# fix that with the ifs.
# (don't use grep -q or svn gets pissed about broken pipe.)

if svn st $1 | grep ^\? &gt; /dev/null; then
  svn st $1 | perl -ne 'chomp; @Fld = split(q{ }, $_, -1); if (/^\?/) { shift @Fld; print join(q{ }, @Fld) . "\n"; }' | xargs -n 1 -i svn add "{}"
fi
if svn st $1 | grep ^\! &gt; /dev/null; then
  svn st $1 | perl -ne 'chomp; @Fld = split(q{ }, $_, -1); if (/^\!/) { shift @Fld; print join(q{ }, @Fld) . "\n"; }' | xargs -n 1 -i svn rm "{}"
fi
&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6511514365065097430?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6511514365065097430/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6511514365065097430' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6511514365065097430'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6511514365065097430'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/09/merging-two-subversion-repositories.html' title='Merging two subversion repositories'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-8869823747287486010</id><published>2007-07-30T18:51:00.000-07:00</published><updated>2007-07-30T19:17:55.490-07:00</updated><title type='text'>A brief reaction to "Find the Bug"</title><content type='html'>&lt;p&gt;
I picked up a copy of Adam Barr's &lt;span style="font-style:italic;"&gt;Find the Bug&lt;/span&gt;, which is a cool concept for a book.  (5 languages, 50 programs, 50 bugs; see if you can spot them.)
&lt;p&gt;
I found the bug in the first program, in C, then skipped to the Python chapter.  The first two programs were not too bad, as pedagogical exercises go (although iterating through substrings instead of a.startswith(b) in the 2nd was painful).  The third, though, was "Alphabetize words," 25 sloc to perform the equivalent of

&lt;pre class=code&gt;
def alphabetize(buffer):
  L = buffer.split(' ')
  L.sort()
  return L
&lt;/pre&gt;

&lt;p&gt;
... doing everything about the hardest way possible.
&lt;p&gt;
Now, it's pretty hard to introduce a non-obvious bug into my version of this function, so it wouldn't be appropriate for Mr. Barr's book when written this way.  But the right thing to do is to make the task more difficult, not dumb Python down to the level of C!  It's very very painful to read Python written like that.
&lt;p&gt;
(Actually it's painful to read any language written at such a low level of expressivity, which is why I prefer not to use languages that really can't do any better.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-8869823747287486010?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/8869823747287486010/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=8869823747287486010' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8869823747287486010'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8869823747287486010'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/07/brief-reaction-to-find-bug.html' title='A brief reaction to &quot;Find the Bug&quot;'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6299502710047336270</id><published>2007-07-23T22:18:00.000-07:00</published><updated>2007-07-24T11:43:30.850-07:00</updated><title type='text'>Final version of OSCON SQLAlchemy slides</title><content type='html'>&lt;p&gt;
&lt;a href=http://utahpython.org/jellis/sa-tutorial-oscon.pdf&gt;http://utahpython.org/jellis/sa-tutorial-oscon.pdf&lt;/a&gt;
&lt;p&gt;
Also the code snippets:
&lt;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=http://utahpython.org/jellis/test_tables.py&gt;http://utahpython.org/jellis/test_tables.py&lt;/a&gt;
&lt;li&gt;&lt;a href=http://utahpython.org/jellis/test_code.py&gt;http://utahpython.org/jellis/test_code.py&lt;/a&gt;
&lt;/ul&gt;
&lt;p&gt;
This is what I'll be using in my tutorial tomorrow.
&lt;p&gt;
Update: I forgot to "svn up" on my web server.  So &lt;i&gt;now&lt;/i&gt; the final version is up.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6299502710047336270?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6299502710047336270/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6299502710047336270' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6299502710047336270'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6299502710047336270'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/07/final-version-of-oscon-sqlalchemy.html' title='Final version of OSCON SQLAlchemy slides'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-8023919917779999401</id><published>2007-07-09T12:28:00.000-07:00</published><updated>2007-07-09T12:42:05.245-07:00</updated><title type='text'>PEP rss feed is live</title><content type='html'>&lt;p&gt;
After I complained that &lt;a href=http://spyced.blogspot.com/2007/05/its-time-for-python-development-to-open.html&gt;python.org could use a PEP rss feed&lt;/a&gt;, David Goodger invited me to volunteer to write one.  So I did.  (With Martin v. Löwis doing the integration with the site build script.  Thanks Martin!)
&lt;p&gt;
The feed is live at &lt;a href=http://www.python.org/dev/peps/peps.rss&gt;http://www.python.org/dev/peps/peps.rss&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-8023919917779999401?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/8023919917779999401/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=8023919917779999401' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8023919917779999401'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8023919917779999401'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/07/pep-rss-feed-is-live.html' title='PEP rss feed is live'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6900566307457306414</id><published>2007-06-16T05:45:00.000-07:00</published><updated>2009-11-21T05:43:12.537-08:00</updated><title type='text'>Opera 9.2 is a pretty good browser</title><content type='html'>&lt;p&gt;I've been trying Opera 9.2 for a week, and I'm pleased with it enough that it's going to continue to be my main browser.  The main selling points for me are

&lt;ul&gt;
&lt;li&gt;MDI weirdness is mostly hidden now, I hated earlier Opera UIs
&lt;li&gt;20-30% less memory use; even after &lt;a href=http://internetducttape.com/2006/12/02/how-to-fix-the-firefox-memory-leak-firefox-hack&gt;poking about in the guts of about:config to force FF's memory cache to the same 10MB&lt;/a&gt; that I gave Opera (which exposes this option right in the UI), Opera consistently uses less memory for the same workload.  (Without adding this option to FF, it would max out around 400MB instead of 150MB.)
&lt;li&gt;feels snappier; opera seems quicker to start rendering something useful on slow-loading sites like 1up.com, although total render time is about the same.  It's also instantaneous to open a new tab, which consistently takes around 1s on FF after I've been using it a while.  I open and close tabs frequently.
&lt;li&gt;UI takes up less space: I know it's possible to re-skin FF, but I'd have to google it to find out how.  Opera makes it easy.  I'm using the Fresh skin for Opera which condenses the File menu and nav bar to about half as much height as FF uses.
&lt;li&gt;built-in AutoFill, so I save even more space by not needing Google Toolbar
&lt;li&gt;javascript/DOM support is finally close enough to FF that most sites don't have to specifically code for Opera.  Heavy ajax use is an exception of course; I still have to use FF for Google Docs.  (Gmail and Maps work fine though, probably due to some effort by Google.)
&lt;li&gt;Download manager goes in another tab by default instead of a separate window.  I didn't realize how much FF's behavior annoyed me before.
&lt;/ul&gt;

&lt;p&gt;
Downsides:

&lt;ul&gt;
&lt;li&gt;Occasional rendering problems, such as when customizing a laptop on hp.com.  Blogger.com doesn't redirect to my dashboard when I'm already logged in to my google account.  
&lt;li&gt;Very very slow navigating large pages, such as a slashdot comment thread with 400 comments.  Isearch is even slower on large pages and can lock up the UI for minutes if you invoke it injudiciously.
&lt;li&gt;Sometimes ignores a site's instructions to not cache a dynamic page
&lt;li&gt;hard to tell which tab is active in the default theme.  (Fresh fixes this.)
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6900566307457306414?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6900566307457306414/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6900566307457306414' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6900566307457306414'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6900566307457306414'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/06/opera-92-is-pretty-good-browser.html' title='Opera 9.2 is a pretty good browser'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5052718261580712185</id><published>2007-06-14T18:59:00.000-07:00</published><updated>2007-06-14T19:10:52.391-07:00</updated><title type='text'>A workaround for the sys.excepthook bug</title><content type='html'>&lt;p&gt;
About two years ago I reported the bug &lt;a href=https://sourceforge.net/tracker/?func=detail&amp;atid=105470&amp;aid=1230540&amp;group_id=5470&gt;sys.excepthook doesn't work in threads&lt;/a&gt;.  Then just recently someone asked in &lt;a href=irc://irc.freenode.net/#utahpython&gt;#utahpython&lt;/a&gt; if I had a workaround.  Here it is (also added as a comment to the bug report) -- all we do is monkeypatch Thread.run to run the excepthook manually if there is an uncaught exception:

&lt;pre class=code&gt;
def install_thread_excepthook():
    """
    Workaround for sys.excepthook thread bug
    (https://sourceforge.net/tracker/?func=detail&amp;atid=105470&amp;aid=1230540&amp;group_id=5470).
    Call once from __main__ before creating any threads.
    If using psyco, call psyco.cannotcompile(threading.Thread.run)
    since this replaces a new-style class method.
    """
    import sys, threading
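    run_old = threading.Thread.run  # keep a reference to the original method
    def run(*args, **kwargs):
        try:
            run_old(*args, **kwargs)
        except (KeyboardInterrupt, SystemExit):
            # let interpreter-exit signals propagate as usual
            raise
        except:
            # anything else goes to the hook, as it would in the main thread
            sys.excepthook(*sys.exc_info())
    threading.Thread.run = run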
&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5052718261580712185?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5052718261580712185/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5052718261580712185' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5052718261580712185'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5052718261580712185'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/06/workaround-for-sysexcepthook-bug.html' title='A workaround for the sys.excepthook bug'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-398225770664993328</id><published>2007-05-31T17:13:00.000-07:00</published><updated>2007-05-31T19:09:24.837-07:00</updated><title type='text'>How DOS 1.0 cost me an hour of scratching my head</title><content type='html'>&lt;p&gt;
A couple months ago, I migrated my &lt;a href=http://carnageblender.com&gt;text rpg Carnage Blender&lt;/a&gt; to a new server, with Ubuntu 6.06 on the new box.
&lt;p&gt;
For an unknown reason, ftstrpnm on the new box wouldn't generate the pngs I used in my captchas.  It was easier to just check the images from the old machine into my svn repository than to debug this, so I did.
&lt;p&gt;
The downside was that my working copy on my Windows laptop stopped being able to update from the repository.  It would get to "words/con.png," and error out.  Google, for once, didn't turn up anything useful.
&lt;p&gt;
Today I got motivated.  I tried all kinds of ways to get this to work.  A new checkout had the same problem on Windows, but worked fine on Linux.  The svn command line client for Windows didn't work any better than Tortoise -- instead of "Error: Can't open file '...words\.svn\text-base\con.png.svn-base': Access is denied", it barfed con.png to stdout, and died.  This was a clue, but I didn't realize that until later.
&lt;p&gt;
Puzzled, I tried scp-ing con.png directly.  No dice.  Maybe it was a problem with my ssh server or client instead of subversion.  So I tarred up my Linux working copy and untarred on Windows.  Still it crapped out on con.png.  I gzipped just con.png and tried to scp that over.  That didn't work either.
&lt;p&gt;
I started experimenting with the filename itself.  I could scp the .gz just fine if I renamed it to c.png.gz first.  But "touch con.png" on windows failed, as did "touch con.txt."  Finally I googled [windows filenames con] and found &lt;a href=http://blogs.msdn.com/oldnewthing/archive/2003/10/22/55388.aspx&gt;the answer&lt;/a&gt; at the top of the results, unfortunately from before I started reading oldnewthing or it might have jogged my memory.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-398225770664993328?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/398225770664993328/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=398225770664993328' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/398225770664993328'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/398225770664993328'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/05/how-dos-10-cost-me-hour-of-scratching.html' title='How DOS 1.0 cost me an hour of scratching my head'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6685255251246486842</id><published>2007-05-31T04:54:00.000-07:00</published><updated>2007-05-31T05:19:40.827-07:00</updated><title type='text'>It's time for python development to open up a little</title><content 
type='html'>&lt;p&gt;I found out from &lt;a href=http://sayspy.blogspot.com/2007/05/abstract-base-classes-pep-accepted.html&gt;Brett Cannon's blog&lt;/a&gt; that &lt;a href=http://www.python.org/dev/peps/pep-3119/&gt;an abstract base class (ABC) PEP&lt;/a&gt; has been accepted.
&lt;p&gt;I don't like this PEP.  It's a very big (and more importantly, inelegant) change to Python's style.  But my real complaint is that as big as this change is, and as much as I try to stay current with Python (subscribing to 30+ blogs), I didn't have a chance to get involved in the discussion until after the PEP was already approved.
&lt;p&gt;Python is big enough now that there should be some mechanism for feedback from the community before the priesthood of python-dev writes something in stone.  Currently, if you want to know about PEPs before they are approved, you have to subscribe to both python-dev and python-3000 (which isn't linked from either &lt;a href=http://www.python.org/community/lists/&gt;the mailing lists page&lt;/a&gt; or &lt;a href=http://python.org/dev/&gt;the dev page&lt;/a&gt;, btw).  I really don't care about &lt;a href=http://mail.python.org/pipermail/python-dev/2007-May/thread.html&gt;the vast majority of these lists' traffic&lt;/a&gt; but PEPs, at least some of them, are important.  
&lt;p&gt;
If &lt;a href=http://python.org/dev/summary/&gt;the python-dev summaries&lt;/a&gt; ever got updated they might be a solution, but even at their best I don't remember them ever being less than a month or so behind.  And a summary every two weeks is probably too coarse-grained anyway.
&lt;p&gt;
I think what python.org really needs is a PEP rss feed.  A friend thought that they already had one, but neither he nor I could find it.  So if it exists, it's well-hidden.  If it doesn't exist, it should.  Please?
&lt;p&gt;
(And if it's easier for whoever's in charge of such things to give me access to the server and repository than to do it himself, then yes, I'm volunteering.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6685255251246486842?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6685255251246486842/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6685255251246486842' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6685255251246486842'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6685255251246486842'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/05/its-time-for-python-development-to-open.html' title='It&apos;s time for python development to open up a little'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-342186496160632237</id><published>2007-04-18T19:27:00.000-07:00</published><updated>2007-04-18T19:46:19.953-07:00</updated><title type='text'>Best Python book for beginners</title><content type='html'>&lt;p&gt;
It's really surprisingly difficult for someone who has been programming for a long time to write about programming at a level appropriate for real beginners.  The first time I taught a class full of beginners at Neumont, I tried to take things as slow as possible.  Then I spent the next week covering the material from the first day even slower.
&lt;p&gt;
So when the UGIC asked me to recommend a book for the participants in the Introduction to Python workshop, I looked at all the ones I could find, but they all either assumed too much existing knowledge or covered material that would just confuse a beginner.  Often both.  But then &lt;a href=http://www.michaelbernstein.com/&gt;Michael Bernstein&lt;/a&gt; pointed me to &lt;i&gt;&lt;a href=http://pythonfood.com/&gt;Python for Dummies&lt;/a&gt;&lt;/i&gt;.
&lt;p&gt;
If you're looking to teach beginners, or you're a beginner yourself, &lt;i&gt;Python for Dummies&lt;/i&gt; is by far the best option.  There's a few sections that are strikingly inappropriate for a book at its level (new-style classes!?) but it's still much, much better than any of the other books on the market in this respect.  As a bonus, it's also one of the few that covers Python 2.5.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-342186496160632237?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/342186496160632237/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=342186496160632237' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/342186496160632237'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/342186496160632237'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/04/best-python-book-for-beginners.html' title='Best Python book for beginners'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5671383738994733650</id><published>2007-04-18T10:51:00.000-07:00</published><updated>2007-04-18T13:32:37.184-07:00</updated><title type='text'>Introduction to Python slides</title><content type='html'>&lt;p&gt;
Here are the &lt;a href=http://utahpython.org/jellis/python99.pdf&gt;slides from my introduction to python&lt;/a&gt; at the &lt;a href=http://spyced.blogspot.com/2007/03/introduction-to-python-at-ugic.html&gt;UGIC conference&lt;/a&gt; today.
&lt;p&gt;
This presentation was meant for people with little to no programming experience.  So I deliberately kept it pretty basic, and in fact in 90 minutes we only covered up to about slide 20 in the pdf.  I also added an exercise before moving on to slide 10.  ("Read 3 integers into a list, and print the sum.")
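&lt;p&gt;
For anyone trying that exercise at home, one possible solution is sketched below.  It uses Python 3 syntax rather than the Python 2 of the workshop, and the name read_and_sum and the prompt text are illustrative choices, not taken from the slides.  Canned replies stand in for the keyboard so the snippet runs unattended; interactively you would just call read_and_sum().

```python
def read_and_sum(read=input, count=3):
    """Read `count` integers into a list, then return the list and its sum."""
    numbers = []
    for _ in range(count):
        # int() converts the text the user typed into a number
        numbers.append(int(read("Enter an integer: ")))
    return numbers, sum(numbers)


# Canned replies instead of real keyboard input (an assumption for this demo).
replies = iter(["4", "5", "6"])
numbers, total = read_and_sum(read=lambda prompt: next(replies))
print(numbers, total)
```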
&lt;p&gt;
There were 17? people there (which was the room's capacity), so it was very nice to have Kevin Bell also answering questions individually during the exercises.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5671383738994733650?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5671383738994733650/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5671383738994733650' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5671383738994733650'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5671383738994733650'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/04/introduction-to-python-slides.html' title='Introduction to Python slides'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-4127007856294858167</id><published>2007-04-16T13:19:00.000-07:00</published><updated>2007-04-16T13:27:19.121-07:00</updated><title type='text'>Mercurial presentation slides</title><content type='html'>&lt;p&gt;
Thursday I presented on distributed source control and Mercurial to the utah python ug.  &lt;a href=http://utahpython.org/data/dscm-hg.pdf&gt;Here are my slides.&lt;/a&gt;
&lt;p&gt;
Then on Friday, Mozilla announced that they're &lt;a href=http://weblogs.mozillazine.org/preed/2007/04/version_control_system_shootou_1.html&gt;moving  from CVS to Mercurial&lt;/a&gt;, joining OpenSolaris and Xen &lt;a href=http://www.selenic.com/mercurial/wiki/index.cgi/ProjectsUsingMercurial&gt;and others&lt;/a&gt; on hg.
&lt;p&gt;
It's exciting to see what is still a small and elegant tool gain traction like this, even though in some ways hg (and dscm in general really) is still in the early adopter stage.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-4127007856294858167?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/4127007856294858167/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=4127007856294858167' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4127007856294858167'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4127007856294858167'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/04/mercurial-presentation-slides.html' title='Mercurial presentation slides'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-8770707297224759062</id><published>2007-04-10T14:26:00.000-07:00</published><updated>2007-04-10T14:43:06.460-07:00</updated><title type='text'>Mozy code deathmatch</title><content type='html'>&lt;p&gt;
My employer, the creator of Mozy, is running a &lt;a href=http://mozy.com/contest&gt;programming contest&lt;/a&gt; this Saturday.  9 languages are allowed.  The first 2 rounds are online; the finals are in American Fork (Utah), but if you make it that far you're guaranteed to win some money.
&lt;p&gt;
(We did this last year too; this year the prize money is doubled to $20k.  Not to mention how we are super-experienced contest organizers now!)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-8770707297224759062?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/8770707297224759062/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=8770707297224759062' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8770707297224759062'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8770707297224759062'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/04/mozy-code-deathmatch.html' title='Mozy code deathmatch'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-8372876703014212086</id><published>2007-04-02T09:22:00.000-07:00</published><updated>2007-04-02T12:56:08.655-07:00</updated><title type='text'>New mailing list for utah python user group</title><content type='html'>&lt;p&gt;
Since I neglected to archive the old list when moving utahpython.org to a new server, &lt;a href="http://utahpython.org/"&gt;the utah python user group&lt;/a&gt; has &lt;a href="http://groups.google.com/group/utahpython"&gt;a new mailing list&lt;/a&gt; courtesy of Google Groups.  (At least this way we're not dependent anymore on my incompetent sysadminning.)
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-8372876703014212086?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/8372876703014212086/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=8372876703014212086' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8372876703014212086'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8372876703014212086'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/04/new-mailing-list-for-utah-python-user.html' title='New mailing list for utah python user group'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5660715023857980619</id><published>2007-03-29T22:23:00.000-07:00</published><updated>2007-03-29T22:47:06.241-07:00</updated><title type='text'>One thing I don't hate about Python</title><content type='html'>&lt;p&gt;Sure, some things about Python bug me.  But that's not what this is about.  I wanted to react to &lt;a href="http://www.jacobian.org/writing/2007/mar/04/hate-python/"&gt;Jacob Kaplan-Moss's gripes&lt;/a&gt; instead of promulgating my own.  Specifically, his problem with Python's interfaces, or lack thereof.
&lt;p&gt;
I think I can keep this brief: interfaces are a hack that Java uses because Gosling et al. thought multiple inheritance was too confusing and/or dangerous.  (I believe I've read something recently where Gosling said that this was one decision he might make differently if he were re-designing Java now with the benefit of hindsight, but I can't find the source.  Anyone remember seeing that?)
&lt;p&gt;
Python &lt;span style="font-style:italic;"&gt;has&lt;/span&gt; MI.  It doesn't &lt;span style="font-style:italic;"&gt;need&lt;/span&gt; interfaces.  I'm a little baffled that someone on the Django core team would cite this as a problem with Python.

&lt;p&gt;
Jacob's precise objection is,
&lt;blockquote&gt;
I shouldn’t need to care about the difference between something that pretends to be a list and something that really is a list.
&lt;/blockquote&gt;

&lt;p&gt;That's just it!  You don't!  But of course what Jacob really means is, "It should be easy to discover what methods a library expects to find on MY object that pretends to be a list."  Which seems reasonable.  And sure, good documentation is always welcome.  
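&lt;p&gt;
To make the "pretends to be a list" point concrete, here is a minimal sketch.  Both names are hypothetical, invented for illustration: Countdown is a list-like class, and total stands in for a library function that only ever iterates over its argument.

```python
class Countdown:
    """Hypothetical list-like object: implements only what the caller uses."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, index):
        # Raising IndexError past the end is what lets a for-loop terminate.
        if index not in range(self.start):
            raise IndexError(index)
        return self.start - index


def total(items):
    """Hypothetical library function: needs iteration only, not a real list."""
    result = 0
    for item in items:
        result += item
    return result


print(total([3, 2, 1]))      # a real list
print(total(Countdown(3)))   # something that merely pretends to be one
```

&lt;p&gt;
Countdown leaves out append, sort and the rest of the list API because this caller never touches them, and nothing complains.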

&lt;p&gt;
But when you cross the line to an Interface, at least the kind of Interface where Python itself would raise an error if I ignored the recommendation and left a method out (&lt;span style="font-style:italic;"&gt;because I knew it wasn't necessary&lt;/span&gt;), that's bondage &amp; discipline.  &lt;a href=http://spyced.blogspot.com/2005/06/anders-heljsberg-doesnt-grok-python.html&gt;That's not Python.&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5660715023857980619?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5660715023857980619/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5660715023857980619' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5660715023857980619'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5660715023857980619'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/03/one-thing-i-dont-hate-about-python.html' title='One thing I don&apos;t hate about Python'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-8985150261212593277</id><published>2007-03-02T12:07:00.000-08:00</published><updated>2007-03-02T12:14:05.617-08:00</updated><title type='text'>Introduction to Python at UGIC conference</title><content type='html'>&lt;p&gt;
I'll be giving a (very!) introductory Python workshop at the &lt;a href=http://www.ugic.info/&gt;Utah Geographic Information Council&lt;/a&gt; conference in April.    After my 90 minutes, Kevin Bell -- also of the &lt;a href=http://utahpython&gt;utah python user group&lt;/a&gt; -- will present on specific GIS applications.
&lt;p&gt;
(Apparently Python is particularly big in GIS these days because one of the big vendors, ESRI, &lt;a href=http://www.google.com/search?q=site%3Aesri.com+python&gt;takes Python pretty seriously&lt;/a&gt;.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-8985150261212593277?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/8985150261212593277/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=8985150261212593277' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8985150261212593277'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8985150261212593277'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/03/introduction-to-python-at-ugic.html' title='Introduction to Python at UGIC conference'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-6625731871459717923</id><published>2007-02-24T06:25:00.000-08:00</published><updated>2007-02-24T06:30:42.051-08:00</updated><title type='text'>PyCon web frameworks panel notes</title><content type='html'>&lt;p&gt;
I represented Spyce for the web frameworks panel.  It was pretty cool looking out at the standing-room-only crowd, even though let's face it, most people were not there because of Spyce. :)

&lt;p&gt;
&lt;a href=http://www.b-list.org/weblog/2007/02/23/pycon-2007-web-frameworks-panel&gt;James Bennett&lt;/a&gt; and &lt;a href=http://panela.blog-city.com/web_framework_panel_notes_pycon.htm&gt;Matt Harrison&lt;/a&gt; have notes posted online.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-6625731871459717923?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/6625731871459717923/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=6625731871459717923' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6625731871459717923'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/6625731871459717923'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/02/pycon-web-frameworks-panel-notes.html' title='PyCon web frameworks panel notes'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5716813119517771229</id><published>2007-02-23T15:27:00.000-08:00</published><updated>2007-02-25T13:31:27.811-08:00</updated><title type='text'>PyCon SqlSoup slides</title><content type='html'>&lt;p&gt;
I uploaded my slides for my Sunday SqlSoup talk, so they're linked in the &lt;a href=http://us.pycon.org/apps07/schedule/&gt;schedule&lt;/a&gt; now.  I also uploaded them &lt;a href=http://utahpython.org/jellis/sqlsoup.pdf&gt;here&lt;/a&gt;.
&lt;p&gt;
Update: the profile.py module I showed is &lt;a href=http://utahpython.org/jellis/profile.py&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5716813119517771229?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5716813119517771229/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5716813119517771229' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5716813119517771229'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5716813119517771229'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/02/pycon-sqlsoup-slides.html' title='PyCon SqlSoup slides'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-1656125113518184002</id><published>2007-02-23T09:27:00.000-08:00</published><updated>2007-02-23T09:31:19.026-08:00</updated><title type='text'>PyCon open-space talk on Spyce after the web frameworks panel</title><content type='html'>&lt;p&gt;
I'll demonstrate writing a simple app with Spyce, including the most painless Ajax you have ever seen.  Come check it out at 3:40 in the Bent Tree II room.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-1656125113518184002?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/1656125113518184002/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=1656125113518184002' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1656125113518184002'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1656125113518184002'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/02/pycon-open-space-talk-on-spyce-after.html' title='PyCon open-space talk on Spyce after the web frameworks panel'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-7169690319230237268</id><published>2007-02-22T16:35:00.000-08:00</published><updated>2007-02-22T16:48:29.379-08:00</updated><title type='text'>PyCon SQLAlchemy tutorial slides</title><content type='html'>&lt;p&gt;
My SQLAlchemy tutorial went pretty well for the most part.  The pace was fast, but most people kept up pretty well.  If I did it again I would add more of an intro to ORMs in general for people who had never used one, but over half the attendees had used SQLObject or Django's ORM, or had already tried SQLAlchemy.  I would also paste more code from my slides into the samples download to save people typing during the exercises (I had some, but I would do more next time).

&lt;p&gt;
I think most people liked it; the main exception was one fellow who was in way way over his head and visibly pissed about it.  (I used a list comprehension at one point and he had no idea what it was.)

&lt;p&gt;
&lt;a href=http://utahpython.org/jellis/sqlalchemy-tutorial.pdf&gt;The slides are here.&lt;/a&gt;  (The .py files referred to in the slides have also been moved to the jellis/ subdirectory.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-7169690319230237268?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/7169690319230237268/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=7169690319230237268' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7169690319230237268'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/7169690319230237268'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/02/pycon-sqlalchemy-tutorial-slides.html' title='PyCon SQLAlchemy tutorial slides'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-8215698896515145109</id><published>2007-02-20T10:18:00.000-08:00</published><updated>2007-02-20T10:22:06.537-08:00</updated><title type='text'>Spyce at PyCon</title><content type='html'>&lt;p&gt;
I'll be representing Spyce as a late addition to the Web Frameworks panel.  I'm also planning a lightning talk on Ajax in Spyce 2.2 (which will be released as soon as I finish getting the docs in shape) and an open-space Introduction to Spyce.  
&lt;p&gt;
See you there!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-8215698896515145109?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/8215698896515145109/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=8215698896515145109' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8215698896515145109'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/8215698896515145109'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/02/spyce-at-pycon.html' title='Spyce at PyCon'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-4634162944929990570</id><published>2007-02-15T18:03:00.000-08:00</published><updated>2007-02-15T18:05:14.468-08:00</updated><title type='text'>Best Google Tech Talks?</title><content type='html'>&lt;p&gt;
My wife got me a PSP for Valentine's Day, so I'm looking for videos to put on it.  Since Google makes it so easy to get a PSP-compatible version, I thought I'd start with theirs...  &lt;a href=http://video.google.com/videosearch?q=google+techtalks&gt;Recommendations?&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-4634162944929990570?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/4634162944929990570/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=4634162944929990570' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4634162944929990570'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/4634162944929990570'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/02/best-google-tech-talks.html' title='Best Google Tech Talks?'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5229981016261509460</id><published>2007-02-14T13:01:00.000-08:00</published><updated>2007-02-14T13:08:53.016-08:00</updated><title type='text'>SQLAlchemy slides</title><content type='html'>&lt;p&gt;
I presented on SQLAlchemy at the Utah Python User Group last Thursday; slides are linked &lt;a href=http://utahpython.org/meetings.spy&gt;here&lt;/a&gt;.  
&lt;p&gt;
In retrospect, for a shorter presentation like this I should probably spend more time talking about the ORM features, and less about the SQL layer.  Although the SQL layer is useful on its own, and essential for doing advanced mapping, I don't think it has the sex appeal that the ORM has.
&lt;p&gt;
(Although I do think the first part, about why ORMs should allow you to take advantage of your database's strengths rather than being limited to a MySQL 3 feature set, was useful.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5229981016261509460?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5229981016261509460/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5229981016261509460' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5229981016261509460'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5229981016261509460'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/02/sqlalchemy-slides.html' title='SQLAlchemy slides'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-5273610274503113883</id><published>2007-01-25T11:25:00.000-08:00</published><updated>2007-01-25T16:10:26.515-08:00</updated><title type='text'>Komodo 4 released; new free version</title><content type='html'>&lt;p&gt;
ActiveState has released &lt;a href=http://www.activestate.com/products/komodo_ide&gt;Komodo IDE 4&lt;/a&gt;.  Perhaps more  interesting, if you're not already a Komodo user, is the release of &lt;a href=http://www.activestate.com/products/komodo_edit&gt;Komodo Edit&lt;/a&gt;, which is very similar to the old Komodo IDE Personal edition, only instead of costing around $30, Komodo Edit is free.  The mental difference between "free" and "$30" is much more than the relatively small amount of money; it will be interesting to see what happens in the IDE space now.
&lt;p&gt;
After a brief evaluation I would say Edit is perhaps the strongest contender for "best free python IDE."  The only serious alternative is PyDev, which on its Eclipse foundation provides features like svn integration that Edit doesn't.  PyDev also includes a debugger, another feature ActiveState would like to see you upgrade to the full IDE for.  But Komodo is stronger in other areas such as call tips and, well, not being based on Eclipse.  I also think its code completion is better, although this impression is preliminary.
&lt;p&gt;
It's also worth noting that so far, Edit doesn't sport the "Non-commercial and educational use only" restrictions that Komodo Personal had.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-5273610274503113883?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/5273610274503113883/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=5273610274503113883' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5273610274503113883'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/5273610274503113883'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/01/komodo-4-released-new-free-version.html' title='Komodo 4 released; new free version'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-1685372079284721497</id><published>2007-01-22T15:09:00.000-08:00</published><updated>2007-01-22T15:10:02.369-08:00</updated><title type='text'>PyCon SQLAlchemy tutorial full</title><content type='html'>See you there!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-1685372079284721497?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://spyced.blogspot.com/feeds/1685372079284721497/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=1685372079284721497' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1685372079284721497'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/1685372079284721497'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/01/pycon-sqlalchemy-tutorial-full.html' title='PyCon SQLAlchemy tutorial full'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-700449018978300636</id><published>2007-01-17T17:55:00.000-08:00</published><updated>2007-02-26T09:05:01.802-08:00</updated><title type='text'>Caution: upgrading to new version of blogger may increase spam</title><content type='html'>&lt;p&gt;
I was pretty happy with the old version of blogger, but I upgraded today so I can use the new API against my own blog.  So far I have 4 spam comments (captcha is still on) versus about that number for the entire life of my blog under the old blogger.  Bleh.
&lt;p&gt;
Could just be a coincidence.  I hope so.
&lt;p&gt;
(Update Feb 26: A month later, I've had just one more spam comment.  So it probably really was just coincidence.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-700449018978300636?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/700449018978300636/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=700449018978300636' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/700449018978300636'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/700449018978300636'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/01/caution-upgrading-to-new-version-of.html' title='Caution: upgrading to new version of blogger may increase spam'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-116892076281976583</id><published>2007-01-15T20:05:00.000-08:00</published><updated>2007-01-22T14:50:57.539-08:00</updated><title type='text'>Abstract of "Advanced PostgreSQL, part 1"</title><content type='html'>&lt;p&gt;
In December, Fujitsu made available a video of &lt;a href="http://video.google.com.au/videoplay?docid=5745755015991749390"&gt;Gavin Sherry speaking on Advanced PostgreSQL&lt;/a&gt;.  (Where's part 2, guys?)  Here are some of the topics Gavin addresses, and the approximate point at which each can be found in the video.
&lt;/p&gt;&lt;dl&gt;
&lt;dt&gt;[start]&lt;/dt&gt;
&lt;dd&gt;
wal_buffers: "at least 64"; when it's ok to turn fsync off [not very often]; how hard disk rpm limits write-based transaction rate, even with WAL
&lt;/dd&gt;
&lt;dt&gt;00:12:&lt;/dt&gt;
&lt;dd&gt;
wal_sync_method = fdatasync is worth checking out on Linux
&lt;/dd&gt;
&lt;dt&gt;00:13:&lt;/dt&gt;
&lt;dd&gt;
FSM [free space map], MVCC, and vacuum; how to determine appropriate FSM size; why this is important to avoid VACUUM FULL
&lt;/dd&gt;
&lt;dt&gt;00:22:&lt;/dt&gt;
&lt;dd&gt;
vacuum_cost_delay
&lt;/dd&gt;
&lt;dt&gt;00:26:&lt;/dt&gt;
&lt;dd&gt;
background writer
&lt;/dd&gt;
&lt;dt&gt;00:30:&lt;/dt&gt;
&lt;dd&gt;
history of buffer replacement strategies
&lt;/dd&gt;
&lt;dt&gt;00:37:&lt;/dt&gt;
&lt;dd&gt;
scenarios where bgwriter is not useful
&lt;/dd&gt;
&lt;dt&gt;00:41:&lt;/dt&gt;
&lt;dd&gt;
how random_page_cost affects planner's use of indexes
&lt;/dd&gt;
&lt;dt&gt;00:47:&lt;/dt&gt;
&lt;dd&gt;
effective_cache_size
&lt;/dd&gt;
&lt;dt&gt;00:49:&lt;/dt&gt;
&lt;dd&gt;
logging; how to configure syslog to not hose your performance
&lt;/dd&gt;
&lt;dt&gt;00:52:&lt;/dt&gt;
&lt;dd&gt;
linux file system configuration
&lt;/dd&gt;
&lt;dt&gt;00:58:&lt;/dt&gt;
&lt;dd&gt;
solaris fs config
&lt;/dd&gt;
&lt;dt&gt;1:02:&lt;/dt&gt;
&lt;dd&gt;
raid; reliability; sata/scsi; battery-backed cache ("for $100, you can triple the write throughput of your system")
&lt;/dd&gt;
&lt;dt&gt;1:08:&lt;/dt&gt;
&lt;dd&gt;
tablespaces
&lt;/dd&gt;
&lt;dt&gt;1:12:&lt;/dt&gt;
&lt;dd&gt;
increasing pgsql_tmp performance for queries that exceed work_mem and how to tell if this is worth worrying about
&lt;/dd&gt;
&lt;dt&gt;1:15:40&lt;/dt&gt;
&lt;dd&gt;
cpu considerations
&lt;/dd&gt;
&lt;/dl&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-116892076281976583?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/116892076281976583/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=116892076281976583' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/116892076281976583'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/116892076281976583'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/01/abstract-of-advanced-postgresql-part-1.html' title='Abstract of &quot;Advanced PostgreSQL, part 1&quot;'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-116866858262003451</id><published>2007-01-12T22:04:00.000-08:00</published><updated>2009-01-17T14:31:24.320-08:00</updated><title type='text'>Why SQLAlchemy impresses me</title><content type='html'>&lt;p&gt;One of the reasons ORM tools have a spotted reputation is that it's really, really easy to write a dumb ORM that works fine for simple queries but performs like molasses once you start throwing real data at it.
&lt;p&gt;
Let me give an example of a situation where, to my knowledge, only SQLAlchemy of the Python (or Ruby) ORMs is really able to handle things elegantly, without gross hacks like &lt;a href=http://railsexpress.de/blog/articles/2005/11/06/the-case-for-piggy-backed-attributes&gt;"piggy backing."&lt;/a&gt;
&lt;p&gt;
Often you'll see a one-to-many relationship where you're not always interested in &lt;i&gt;all&lt;/i&gt; of the -many side.  For instance, you might have a users table, each associated with many orders.  In SA you'd first define the Table objects, then create a mapper that's responsible for doing The Right Thing when you write "user.orders."  
&lt;p&gt;
(I'm skipping &lt;a href=http://www.sqlalchemy.org/docs/tutorial.myt#tutorial_gettingstarted_connecting&gt;connecting to the database&lt;/a&gt; for the sake of brevity, but that's pretty simple.  I'm also avoiding specifying columns for the Tables by assuming they're in the database already and telling SA to autoload them.  Besides keeping this code shorter, &lt;a href=http://spyced.blogspot.com/2006/02/why-schema-definition-belongs-in.html&gt;that's the way I prefer to work&lt;/a&gt; in real projects.)

&lt;pre class=code&gt;
users = Table('users', metadata, autoload=True)
orders = Table('orders', metadata, autoload=True)

class User(object): pass
class Order(object): pass

mapper(User, users, 
       properties={
           'orders':relation(mapper(Order, orders), order_by=orders.c.order_id),
       })
&lt;/pre&gt;

&lt;p&gt;
That "properties" dict says that you want your User class to provide an "orders" attribute, mapped to the orders table.  If you are using a sane database, SQLAlchemy will automatically use the foreign keys it finds between the two tables; you don't need to explicitly specify that it needs to join on "orders.user_id = users.user_id."

&lt;p&gt;
We can thus write

&lt;pre class=code&gt;
for user in session.query(User).select():
    print user.orders
&lt;/pre&gt;

&lt;p&gt;
So far this is nothing special: most ORMs can do this much.  Most can also specify whether to do eager loading for the orders -- where all the data is pulled out via joins in the first select() -- or lazy loading, where orders are loaded via a separate query each time the attribute is accessed.  Either of these can be "the right way" for performance, depending on the use case.
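
&lt;p&gt;
To make the trade-off concrete, here is a minimal sketch of the two loading strategies written as plain SQL against an in-memory SQLite database, not SQLAlchemy itself.  The sample rows and the helper names (make_db, lazy_load, eager_load) are invented for illustration; the column names are assumed to match the users and orders tables used in this post.

```python
import sqlite3


def make_db():
    """Build a throwaway database with two users and three orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             user_id INTEGER REFERENCES users(user_id),
                             description TEXT);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO orders VALUES (1, 1, 'book'), (2, 1, 'lamp'), (3, 2, 'desk');
    """)
    return conn


def lazy_load(conn):
    """One query for the users, then one more query per user (N + 1 total)."""
    queries = 1
    result = {}
    users = conn.execute(
        "SELECT user_id, user_name FROM users ORDER BY user_id").fetchall()
    for user_id, name in users:
        rows = conn.execute(
            "SELECT description FROM orders WHERE user_id = ? ORDER BY order_id",
            (user_id,)).fetchall()
        queries += 1
        result[name] = [row[0] for row in rows]
    return result, queries


def eager_load(conn):
    """A single LEFT OUTER JOIN pulls users and their orders together."""
    rows = conn.execute(
        "SELECT u.user_name, o.description FROM users u "
        "LEFT OUTER JOIN orders o ON o.user_id = u.user_id "
        "ORDER BY u.user_id, o.order_id").fetchall()
    result = {}
    for name, description in rows:
        orders = result.setdefault(name, [])
        if description is not None:  # users without orders get an empty list
            orders.append(description)
    return result, 1
```

Both functions return the same data; the difference is the number of round trips.  Lazy loading wins when you only touch the orders of a few users, and eager loading wins when you are going to walk all of them anyway.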

&lt;p&gt;
The tricky part is, what if I want to generate a list of all users &lt;i&gt;and the most recent order for each&lt;/i&gt;?  The naive way is to write

&lt;pre class=code&gt;
class User(object):
    @property
    def max_order(self):
        # orders are sorted by id, so the last one is the most recent
        return self.orders[-1]

for user in session.query(User).select():
    print user, user.max_order
&lt;/pre&gt;

&lt;p&gt;
This works, but it requires loading all the orders when we are really only interested in one.  If we have a lot of orders, this can be painful.

&lt;p&gt;
One solution in SA is to create a new relation that knows how to load just the most recent order.  Our new mapper will look like this:

&lt;pre class=code&gt;
mapper(User, users, 
       properties={
           'orders':relation(mapper(Order, orders), order_by=orders.c.order_id),
           'max_order':relation(mapper(Order, max_orders, non_primary=True), uselist=False, viewonly=True),
       })
&lt;/pre&gt;

&lt;p&gt;
("non_primary" means the second mapper does not define persistence for Orders; you can only have one primary mapper at a time.  "viewonly" means you can't assign to this relation directly.)
&lt;p&gt;
Now we have to define "max_orders."  To do this, we'll leverage SQLAlchemy's ability to map not just Tables, but any Selectable:

&lt;pre class=code&gt;
max_orders_by_user = select([func.max(orders.c.order_id).label('order_id')],
                            group_by=[orders.c.user_id]).alias('max_orders_by_user')
max_orders = orders.select(orders.c.order_id==max_orders_by_user.c.order_id).alias('max_orders')
&lt;/pre&gt;

&lt;p&gt;
"max_orders_by_user" is a subselect whose rows are the max order_id for each user_id.  Then we use that to define max_orders as the entire order row joined to that subselect on order_id.

&lt;p&gt;
We could define this as eager-by-default in the mapper, but in this scenario we only want it eager on a per-query basis.  That looks like this:

&lt;pre class=code&gt;
q = session.query(User).options(eagerload('max_order'))
for user in q.select():
    print user, user.max_order
&lt;/pre&gt;

For fun, here's the SQL generated:
&lt;pre class=code&gt;
SELECT users.user_name AS users_user_name, users.user_id AS users_user_id,
    anon_760c.order_id AS anon_760c_order_id, anon_760c.user_id AS anon_760c_user_id,
    anon_760c.description AS anon_760c_description, 
    anon_760c.isopen AS anon_760c_isopen
FROM users LEFT OUTER JOIN (
    SELECT orders.order_id AS order_id, orders.user_id AS user_id, 
        orders.description AS description, orders.isopen AS isopen
    FROM orders, (
        SELECT max(orders.order_id) AS order_id
        FROM orders GROUP BY orders.user_id) AS max_orders_by_user
    WHERE orders.order_id = max_orders_by_user.order_id) AS anon_760c 
ON users.user_id = anon_760c.user_id 
ORDER BY users.oid, anon_760c.oid
&lt;/pre&gt;
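
&lt;p&gt;
If you want to poke at that query shape without setting up SQLAlchemy, the sketch below runs a hand-simplified version of it against an in-memory SQLite database.  This is not the SA-generated statement verbatim: the anonymous alias is renamed to max_orders, the isopen and oid columns are dropped, and the sample rows are made up.

```python
import sqlite3

# Simplified form of the query above: each user LEFT OUTER JOINed to the
# single order row whose order_id is the maximum for that user.
LATEST_ORDER_SQL = """
SELECT users.user_name, max_orders.order_id, max_orders.description
FROM users LEFT OUTER JOIN (
    SELECT orders.order_id AS order_id, orders.user_id AS user_id,
        orders.description AS description
    FROM orders, (
        SELECT max(orders.order_id) AS order_id
        FROM orders GROUP BY orders.user_id) AS max_orders_by_user
    WHERE orders.order_id = max_orders_by_user.order_id) AS max_orders
ON users.user_id = max_orders.user_id
ORDER BY users.user_id
"""


def latest_orders(conn):
    """Return (user_name, order_id, description) for every user."""
    return conn.execute(LATEST_ORDER_SQL).fetchall()


def demo():
    """Load made-up sample data and run the query once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             user_id INTEGER REFERENCES users(user_id),
                             description TEXT);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO orders VALUES (1, 1, 'book'), (2, 2, 'desk'), (3, 1, 'lamp');
    """)
    rows = latest_orders(conn)
    conn.close()
    return rows
```

Because of the outer join, a user with no orders (carol in the sample data) still shows up, with None in the order columns.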

&lt;p&gt;
In SQLAlchemy, easy things are easy; hard things take some effort up-front, but once you have your relations defined, it's almost magical how it pulls complex queries together for you.

&lt;p&gt;
.................
&lt;p&gt;
I'm giving a &lt;a href="http://us.pycon.org/TX2007/TutorialsPM3outline"&gt;tutorial on Advanced Databases with SQLAlchemy&lt;/a&gt; at PyCon in February.  Feel free to let me know if there is anything you'd like me to cover specifically.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-116866858262003451?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/116866858262003451/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=116866858262003451' title='18 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/116866858262003451'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/116866858262003451'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/01/why-sqlalchemy-impresses-me.html' title='Why SQLAlchemy impresses me'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>18</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-116863912484234571</id><published>2007-01-12T13:49:00.000-08:00</published><updated>2007-01-12T13:59:46.533-08:00</updated><title type='text'>MySQL backend performance</title><content type='html'>&lt;p&gt;
Vadim Tkachenko posted &lt;a href=http://www.mysqlperformanceblog.com/2007/01/08/innodb-vs-myisam-vs-falcon-benchmarks-part-1/&gt;an interesting benchmark&lt;/a&gt; of the MyISAM, InnoDB, and Falcon storage engines.  (Falcon is the new backend that MySQL started developing after Oracle bought Innobase, the company behind InnoDB.)  For me the interesting part is not the alpha code -- Falcon is competitive for some queries but gets absolutely crushed on others -- but that InnoDB is around 30% faster than MyISAM.  And these are pure selects, supposedly where MyISAM is best.
&lt;p&gt;
Of course this is a small benchmark and YMMV, but this is encouraging to me because it suggests that if I ever have to use MySQL, I can use a backend with transactions, real foreign key support, etc., without sucking too badly performance-wise.
&lt;p&gt;
(It also suggests that people who responded to the post on &lt;a href=http://spyced.blogspot.com/2006/12/benchmark-postgresql-beats-stuffing.html&gt;postgresql  crushing mysql&lt;/a&gt; in a different benchmark by saying, "well, if they wanted speed they should have used MyISAM," might want to reconsider their advice.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-116863912484234571?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/116863912484234571/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=116863912484234571' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/116863912484234571'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/116863912484234571'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/01/mysql-backend-performance.html' title='MySQL backend performance'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-116848893976711351</id><published>2007-01-10T12:43:00.000-08:00</published><updated>2007-01-10T20:15:39.896-08:00</updated><title type='text'>Fun with three-valued logic</title><content type='html'>&lt;p&gt;
I thought I was pretty used to SQL's &lt;a href=http://www-cs-students.stanford.edu/~wlam/compsci/sqlnulls&gt;three-valued logic&lt;/a&gt; by now, but this still caused me a minute of scratching my head:

&lt;pre class=code&gt;
# select count(*) from _t;
 count
-------
  1306
(1 row)

# select count(*) from _t2;
 count
-------
 19497
(1 row)
&lt;/pre&gt;

Both _t and _t2 are temporary tables of a single column I created with SELECT DISTINCT.

&lt;pre class=code&gt;
# select count(*) from _t where userhash in (select userhash from _t2);
 count
-------
   982
(1 row)

# select count(*) from _t where userhash not in (select userhash from _t2);
 count
-------
     0
(1 row)
&lt;/pre&gt;

&lt;p&gt;
Hmm, 982 + 0 != 1306...

&lt;p&gt;
Turns out there was a null in _t2.  When X matches nothing else in the set, X IN {set containing null} evaluates to null, not false, and negating null still gives null, so the NOT IN query matches no rows at all.  (The rule of thumb is that a comparison against null yields null.)
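
&lt;p&gt;
The effect is easy to reproduce.  Here is a minimal sketch using Python's sqlite3 module as a stand-in for the PostgreSQL session above (SQLite follows the same three-valued rules for IN and NOT IN); the helper name and the sample values are made up.  It also runs a NOT EXISTS version of the query, which is one common way to sidestep the problem.

```python
import sqlite3


def count_in_and_not_in(t_values, t2_values):
    """Return (IN count, NOT IN count, NOT EXISTS count) for _t against _t2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE _t (userhash TEXT)")
    conn.execute("CREATE TABLE _t2 (userhash TEXT)")
    conn.executemany("INSERT INTO _t VALUES (?)", [(v,) for v in t_values])
    conn.executemany("INSERT INTO _t2 VALUES (?)", [(v,) for v in t2_values])

    in_count = conn.execute(
        "SELECT count(*) FROM _t "
        "WHERE userhash IN (SELECT userhash FROM _t2)").fetchone()[0]
    # With a NULL in _t2 this predicate is never true: it is either
    # false (a match exists) or NULL (no match, but NULL might be one).
    not_in_count = conn.execute(
        "SELECT count(*) FROM _t "
        "WHERE userhash NOT IN (SELECT userhash FROM _t2)").fetchone()[0]
    # NOT EXISTS only asks whether a matching row was found, so the
    # NULL row in _t2 is simply never matched.
    not_exists_count = conn.execute(
        "SELECT count(*) FROM _t WHERE NOT EXISTS "
        "(SELECT 1 FROM _t2 WHERE _t2.userhash = _t.userhash)").fetchone()[0]
    conn.close()
    return in_count, not_in_count, not_exists_count
```

Calling count_in_and_not_in(['a', 'b', 'c'], ['a', None]) gives (1, 0, 2): the NOT IN count collapses to zero, while NOT EXISTS reports the two unmatched rows.  Drop the None from the second list and both negative counts come back as 2.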

&lt;p&gt;
.................
&lt;p&gt;
I'm giving a &lt;a href=http://us.pycon.org/TX2007/TutorialsPM3outline&gt;tutorial on Advanced Databases with SQLAlchemy&lt;/a&gt; at PyCon in February.  Feel free to let me know if there is anything you'd like me to cover specifically.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-116848893976711351?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/116848893976711351/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=116848893976711351' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/116848893976711351'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/116848893976711351'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/01/fun-with-three-valued-logic.html' title='Fun with three-valued logic'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11683713.post-116777873136010717</id><published>2007-01-02T14:49:00.000-08:00</published><updated>2007-01-02T14:58:51.386-08:00</updated><title type='text'>Good advice for Tortoise SVN users</title><content type='html'>&lt;p&gt;My thinkpad R52's screen died a couple days ago.  I decided that this time I was going to be a man and install Linux on my new machine: all our servers run Debian, and "apt-get install" is just so convenient vs manual package installation on Windows.  
And it looks like qemu is a good enough "poor man's vmware" that I could still test stuff in IE when necessary.
&lt;p&gt;
Alas, it was not to be.  My new laptop is an HP dv9005, and although Ubuntu's live CD mode ran fine, once it installed itself to the hard disk and loaded X it did strange and colorful things to the LCD.  Things that didn't resemble an actual desktop.  When I told it to start in recovery mode instead, it didn't even finish booting.
&lt;p&gt;
That was all the time I had to screw around, so I reinstalled Windows to start getting work done again.  Which brings me (finally!) to &lt;a href=http://progblog.wordpress.com/2006/12/28/optimising-tortoisesvn/&gt;this advice on tortoisesvn&lt;/a&gt;: it &lt;i&gt;really&lt;/i&gt; puts teh snappy back in the tortoise.  Thanks annonymous progblogger!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11683713-116777873136010717?l=spyced.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://spyced.blogspot.com/feeds/116777873136010717/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11683713&amp;postID=116777873136010717' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/116777873136010717'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11683713/posts/default/116777873136010717'/><link rel='alternate' type='text/html' href='http://spyced.blogspot.com/2007/01/good-advice-for-tortoise-svn-users.html' title='Good advice for Tortoise SVN users'/><author><name>Jonathan Ellis</name><uri>http://www.blogger.com/profile/11003648392946638242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_bwSkwFkEnF0/SjPKugMQT5I/AAAAAAAAAHU/q4vVUcXN9gw/S220/jbellis-3.jpg'/></author><thr:total>0</thr:total></entry></feed>
