Since Adam Barr replied to my post on his book, I'd like to elaborate a little on what I said.
Adam wrote,
[F]or me, "knowing" Python means you understand how slices work, the difference between a list and a tuple, the syntax for defining a dictionary, that indenting thing you do for blocks, and all that. It's not about knowing that there is a sort() function.
In Python, reinventing sort and split is like a C programmer starting a project by writing his own malloc. It just isn't something you see very often. Similarly, I just don't think you can credibly argue that a C programmer who doesn't know how to use malloc really knows C. At some level, libraries do matter.
On the other hand, I wouldn't claim that you must know all eleventy jillion methods that the Java library exposes in one way or another to say you know Java.
What is the middle ground here?
I think the answer is something along the lines of, "you have to get enough practice actually using the language to be able to write idiomatic code." That's necessarily going to involve picking up some library knowledge along the way.
This made me think. What are the most commonly used Python modules? I decided to scan the Python Cookbook's code base and find out. This is a fairly large sample (over 2000 recipes), and it is further attractive in that most of the scripts there are reasonably standalone, so they don't import lots of non-standard modules. The downside is that some of the code dates back at least to the ancient Python 1.5.
Across 2000+ source files and almost 4000 imports of stdlib modules, here are the frequency counts of imported modules (table below). The percentage rows mark the cumulative share of all imports reached at that point.
Is this a reasonable list? I obviously think I qualify as knowing Python well enough to blog about it. Of the modules above the 80% line, _winreg, win32con, and win32api are platform-specific; new is deprecated; string isn't officially deprecated but should be; and __future__ isn't really a module per se. I believe I've used all of the rest except xmlrpclib at some point, although my line of comfort-without-docs would be only about the 60% mark. I think anyone who programs professionally will quickly come to know well at least the modules up to the 50% line.
Module (or cumulative share) | Import count |
sys | 473 |
os | 302 |
24% | |
time | 210 |
re | 145 |
35% | |
string | 140 |
random | 103 |
threading | 66 |
socket | 57 |
os.path | 52 |
types | 50 |
Tkinter | 47 |
50% | |
math | 43 |
win32com.client | 42 |
__future__ | 41 |
traceback | 40 |
itertools | 38 |
doctest | 37 |
urllib | 35 |
cStringIO | 33 |
struct | 32 |
60% | |
win32api | 31 |
getopt | 29 |
thread | 29 |
ctypes | 28 |
StringIO | 28 |
inspect | 26 |
win32con | 25 |
copy | 25 |
cPickle | 25 |
operator | 24 |
datetime | 23 |
cgi | 22 |
70% | |
Queue | 22 |
urllib2 | 20 |
md5 | 20 |
base64 | 20 |
xmlrpclib | 19 |
sets | 19 |
optparse | 19 |
logging | 18 |
weakref | 18 |
shutil | 17 |
unittest | 17 |
pprint | 16 |
urlparse | 15 |
getpass | 15 |
httplib | 15 |
pickle | 15 |
_winreg | 14 |
UserDict | 13 |
signal | 13 |
80% | |
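The percentage rows are running totals: each one is the sum of the counts so far divided by the total number of imports. A minimal sketch of that arithmetic follows, using only the first four rows of the table as sample data. The function name `cumulative_shares` is hypothetical, and because the total here covers just those four rows, the shares it prints will not match the table's markers.

```python
def cumulative_shares(counts):
    """Return (module, count, cumulative fraction) rows, most-imported first."""
    total = sum(counts.values())
    rows, subtotal = [], 0
    for module, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        subtotal += n
        rows.append((module, n, subtotal / total))
    return rows


# Sample data: the first four rows of the table only (not the full result set).
sample = {"sys": 473, "os": 302, "time": 210, "re": 145}

for module, n, share in cumulative_shares(sample):
    print("%s\t%d\t%.0f%%" % (module, n, share * 100))
```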
For those interested, a tarball of the recipes I scanned is here, so you don't need to scrape the Cookbook site yourself. The import scanning code is simple enough:
    import os, re, compiler
    from collections import defaultdict

    # define an AST visitor that only cares about "import" and "from [x import y]" nodes
    count_by_module = defaultdict(lambda: 0)
    class ImportVisitor:
        def visitImport(self, t):
            for m in t.names:
                if not isinstance(m, basestring):
                    m = m[0] # strip off "as" part
                count_by_module[m] += 1
        def visitFrom(self, t):
            count_by_module[t.modname] += 1

    # parse
    for fname in os.listdir('recipes'):
        try:
            ast = compiler.parseFile('recipes/%s' % fname)
        except SyntaxError:
            continue
        compiler.walk(ast, ImportVisitor())
        print 'parsed ' + fname

    # some raw stats, for posterity
    counts = count_by_module.items()
    total = sum(n for module, n in counts)
    print '%d/%d total/unique imports' % (total, len(counts))

    # strip out non-stdlib modules
    for module in count_by_module.keys():
        try:
            __import__(module)
        except (ImportError, ValueError):
            del count_by_module[module]

    # post-stripped stats
    counts = count_by_module.items()
    total = sum(n for module, n in counts)
    print '%d/%d total/unique imports in stdlib' % (total, len(counts))
    counts.sort(key=lambda (module, n): n)

    # results
    subtotal = 0
    for module, n in reversed(counts):
        subtotal += n
        print '%s\t%d' % (module, n)
        print '%f' % (float(subtotal) / total)
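The script above is Python 2 and relies on the compiler package, which no longer exists in Python 3. For readers on a current interpreter, here is a rough sketch of just the counting step using the standard ast module instead. This is my assumption of an equivalent, not the code that produced the table. The name `count_imports` is hypothetical, and the stdlib filtering step is left out.

```python
import ast
from collections import Counter


def count_imports(source):
    """Count imported module names in one Python source string."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a, b as c" counts a and b; the "as" alias is ignored
            for alias in node.names:
                counts[alias.name] += 1
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from x import y, z" counts x once; "from . import y" is skipped
            counts[node.module] += 1
    return counts


sample = "import os, sys as system\nfrom os.path import join\nimport os\n"
print(count_imports(sample).most_common())
```

As in the original, a source file that fails to parse raises SyntaxError from `ast.parse`, so a caller looping over many recipes would want to catch that and move on to the next file.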
Comments
Funny, I was once asked to implement sort in my language of choice during a job interview. I said I'd use Python, and my implementation looked something like this: "sort()".
The interviewer then revealed that he really wanted me to implement a sorting algorithm. I told him I hadn't done that since first-year CS classes but that I'd implement a bubble sort. I honestly couldn't remember quicksort: I haven't needed it for years, so I pushed it out of my brain to make room for stuff I actually use. (I also said that if I really needed to implement a sort in real [working] conditions, I would do it completely differently.) Ah, useless interview questions...
"""Module does useful importing as in
import os_sys_log
and never fails"""
?
Not Found
The requested URL /group/utahpythonjellis/recipes.tar.bz2 was not found on this server.