(Today's questions are very CPython-specific, but that hasn't stopped me before. :)
I spent some time today looking for the source of a bug that caused my program to leak memory. A C module ultimately proved to be at fault; before figuring that out, though, I suspected that something was hanging on to data read over a socket longer than it should have. I decided to check this by summing the length of all string objects:
>>> import gc
>>> sum([len(o) for o in gc.get_objects() if isinstance(o, str)])
0
No string objects? Can't be. Let's try this:
>>> a = 'asdfjkl;'
>>> len([o for o in gc.get_objects() if isinstance(o, str)])
0
So:
- (Easy) Why don't string objects show up for get_objects()?
- (Harder) How can you get a list of live string objects in the interpreter?
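For the easy question, a quick check on a newer CPython shows that the collector simply isn't tracking strings. This assumes Python 2.7/3.1 or later, where `gc.is_tracked()` exists; it postdates the 2.4 interpreter discussed here. The collector only puts objects that can hold references to other objects, and so take part in cycles, on its lists.

```python
import gc

a = 'asdfjkl;'

# Atomic objects like str and int can't form reference cycles, so CPython
# never puts them on the collector's lists.
print(gc.is_tracked(a))     # False
print(gc.is_tracked(42))    # False
print(gc.is_tracked([a]))   # True: a list can refer to other objects

# get_objects() only returns tracked objects, so `a` is never among them.
print(any(o is a for o in gc.get_objects()))  # False
```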
Comments
2) I think gc.get_referrers('') should do it.
I don't know how that interacts with threads, though, and it smells like a great place for Heisenberg's Uncertainty Principle to jump up and bite you.
When I tried it in a clean Python shell I got an awful lot of strings back... Don't try it in IPython!
If gc tracks (most) container objects, then to find integers or other untracked objects you could perhaps walk gc.get_objects() yourself, although that always makes me a bit queasy. Maybe the inspect module could add an object-tree walking function to simplify things and warn about common pitfalls.
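A sketch of that walk, looking only one reference away from each tracked container and tallying the untracked objects it finds by type. The helper name `untracked_census` is hypothetical, and `gc.is_tracked()` needs Python 2.7/3.1 or later. As the later comments point out, a single hop doesn't find everything.

```python
import gc
from collections import Counter

def untracked_census():
    """Hypothetical helper: count distinct untracked objects (ints, strs, ...)
    that tracked containers refer to directly. One hop only, so objects that
    are reachable solely through untracked containers are missed."""
    found = {}  # id -> object; holding the object keeps its id from being reused
    for container in gc.get_objects():
        for ref in gc.get_referents(container):
            if not gc.is_tracked(ref):
                found[id(ref)] = ref
    return Counter(type(obj).__name__ for obj in found.values())
```

Something like `print(untracked_census().most_common(5))` gives a rough picture of which atomic types dominate.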
>>> import sys
>>> from twisted.python import reflect
>>> reflect.findInstances(sys.modules, str)
[lots of output]
findInstances is implemented in terms of objgrep, which takes the starting object you give it (sys.modules in this case) and recursively follows all of its references until it can find no new objects. It doesn't return the objects it finds, of course: it returns strings describing the reference path to each object it finds. Since the visitor is arbitrary and user-specified, changing this is a simple exercise left for the reader.
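One way to do that exercise without Twisted is a stdlib-only, objgrep-style walker that returns (path, object) pairs instead of just paths. This is a hypothetical sketch, not Twisted's implementation or API. It follows only dict keys and values, list and tuple items, and instance `__dict__`s, so anything reachable only through other containers (sets, closures, C-level objects) is missed.

```python
def find_instances(start, cls, max_depth=100):
    """Hypothetical objgrep-style walker (not Twisted's API): follow dict keys
    and values, list/tuple items, and instance __dict__s from `start`, and
    return (path, obj) pairs for each instance of `cls` found."""
    found = []
    seen = set()       # ids of objects already visited
    keep_alive = []    # hold visited objects so their ids can't be reused
    stack = [(start, "start", 0)]
    while stack:
        obj, path, depth = stack.pop()
        if id(obj) in seen or depth > max_depth:
            continue
        seen.add(id(obj))
        keep_alive.append(obj)
        if isinstance(obj, cls):
            found.append((path, obj))
        if isinstance(obj, dict):
            # Snapshot the items in case the dict changes while we walk.
            for key, value in list(obj.items()):
                stack.append((key, f"{path}.keys()", depth + 1))
                stack.append((value, f"{path}[{key!r}]", depth + 1))
        elif isinstance(obj, (list, tuple)):
            for i, item in enumerate(list(obj)):
                stack.append((item, f"{path}[{i}]", depth + 1))
        else:
            # Plain instances and modules keep their attributes in a dict;
            # class __dict__s are mappingproxy objects and are skipped.
            attrs = getattr(obj, "__dict__", None)
            if isinstance(attrs, dict):
                stack.append((attrs, f"{path}.__dict__", depth + 1))
    return found
```

Calling `find_instances(sys.modules, str)` mirrors the `reflect.findInstances` call above. Because each object is reported only once, a string shared by many containers shows up under whichever path the walk reaches first.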
The short answer is that you need to recursively expand the list from gc.get_objects() with gc.get_referents(). Make sure to expand each object only once to avoid duplicates. (You have to do it recursively because gc.get_objects() doesn't return all container objects.)
Actual code for this does not fit in the margin of this comment, so you can find it here (at the bottom).
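A minimal sketch of that recursive expansion in Python 3 (the linked code isn't reproduced here, and `all_objects` is a hypothetical name):

```python
import gc

def all_objects():
    """Hypothetical helper: everything reachable from the collector's lists,
    including untracked objects such as str and int."""
    result = []    # visited objects; holding them keeps their ids valid
    seen = set()   # ids of objects already expanded
    pending = gc.get_objects()
    while pending:
        obj = pending.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        result.append(obj)
        # Recurse: untracked containers (and the atomic objects inside them)
        # only show up as referents of something else.
        pending.extend(gc.get_referents(obj))
    return result
```

Keeping every visited object in `result` matters: it keeps those objects alive, so an `id()` stored in `seen` can't be recycled for a different object mid-walk. For the original question you would then filter with `isinstance(o, str)`; on a large process, expect the walk to be slow and memory-hungry.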
It's clearly not that get_objects() returns only the objects that contain other objects, because it doesn't return all such objects. (Otherwise one wouldn't need recursive expansion of its results, and one does.)
I took a quick look at the 2.4 CPython source code for the gc module, but couldn't see anything obvious to explain it. (get_objects() returns all the objects on the three gc generation lists, but I couldn't work out when objects did or didn't end up on them.)
If you really need every live dynamically allocated object, it turns out you need a debug build of CPython.
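The hook in such builds is `sys.getobjects()`, which is described in CPython's Misc/SpecialBuilds.txt rather than in the main library docs. It only exists when the interpreter is compiled with Py_TRACE_REFS; since Python 3.8 a debug build no longer implies that, and you need `./configure --with-trace-refs`. A hedged sketch, where `live_objects` is a hypothetical wrapper name:

```python
import sys

def live_objects(of_type=None):
    """Hypothetical wrapper: return every live object via sys.getobjects(),
    which only exists in CPython builds compiled with Py_TRACE_REFS."""
    getobjects = getattr(sys, "getobjects", None)
    if getobjects is None:
        raise RuntimeError("this interpreter was not built with Py_TRACE_REFS")
    # A limit of 0 means "no limit"; the optional second argument restricts
    # the result to objects of exactly that type.
    if of_type is None:
        return getobjects(0)
    return getobjects(0, of_type)
```

On such a build, `live_objects(str)` answers the harder question directly; on a normal build it raises instead of silently returning an incomplete list.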
It would be really cool if the blog author could reply directly beneath individual comments.