Python refcounts *and* it has a real mark+sweep collector for collecting cycles....

silentbicycle · on Dec 16, 2009

The reference counting causes the problem here. Once several threads can increment or decrement the same reference count concurrently, those updates are no longer deterministic, so you need locks, or all hell breaks loose. Adding a mark&sweep GC isn't going to fix that.
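To make the failure concrete, here is a toy Python model. The `Obj`/`LockedObj` classes and the `racy_interleaving`/`hammer` helpers are hypothetical, not CPython internals. An incref is really a load, an add and a store, and if two threads interleave those steps, one increment is lost. `racy_interleaving` replays that interleaving by hand so the result is deterministic, and `LockedObj` shows the lock-based fix. In GIL-enabled CPython, it is the GIL that keeps real refcount updates serialized.

```python
import threading

class Obj:
    """Hypothetical object carrying a manual reference count (illustration only)."""
    def __init__(self):
        self.refcount = 1

def racy_interleaving(obj):
    # Replay two "threads" each doing incref as separate load / add / store
    # steps, interleaved so that both load before either one stores.
    a = obj.refcount      # thread A loads 1
    b = obj.refcount      # thread B loads 1
    obj.refcount = a + 1  # thread A stores 2
    obj.refcount = b + 1  # thread B stores 2: A's increment is lost
    return obj.refcount   # should be 3 after two increments, but is 2

class LockedObj:
    """Same toy refcount, but every update happens under a lock."""
    def __init__(self):
        self.refcount = 1
        self._lock = threading.Lock()

    def incref(self):
        with self._lock:
            self.refcount += 1

    def decref(self):
        with self._lock:
            self.refcount -= 1

def hammer(obj, n_threads=4, n_iters=1000):
    # Each thread does balanced incref/decref pairs, so the count
    # must end where it started if no update is ever lost.
    def work():
        for _ in range(n_iters):
            obj.incref()
            obj.decref()
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.refcount
```

`racy_interleaving(Obj())` returns 2 rather than 3, while `hammer(LockedObj())` always returns the starting count of 1.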

While I'm not familiar with the specifics of Python's GC, a mark&sweep phase is usually added to reference counting so that garbage which contains references to itself (a cycle) but has no external references will eventually be collected. In other words, it plugs the worst memory leaks caused by reference counting. (_Garbage Collection_ by Richard Jones and Rafael Lins is an excellent resource on GC details, btw. There's also a decent overview in the O'Reilly OCaml book: http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora082.... )
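Here is a small demonstration using CPython's standard `gc` and `weakref` modules. The `Node` class and the `make_cycle`/`demo` helpers are made up for illustration. Two objects that point at each other keep each other's refcount at 1, so plain refcounting never frees them. With automatic collection switched off, the cycle survives until `gc.collect()` runs the cycle detector. This relies on CPython's refcounting semantics; PyPy, for example, doesn't refcount at all.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a -> b -> a: each keeps the other's refcount at 1
    return weakref.ref(a)    # a weak reference does not bump the refcount

def demo():
    was_enabled = gc.isenabled()
    gc.disable()             # only refcounting runs now; no automatic cycle collection
    try:
        probe = make_cycle()
        leaked = probe() is not None    # cycle still alive after the locals went away
        gc.collect()                    # run the cycle detector explicitly
        reclaimed = probe() is None     # weakref cleared: the cycle was freed
    finally:
        if was_enabled:
            gc.enable()
    return leaked, reclaimed

print(demo())  # (True, True)
```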

How to do multiprocessor/multithreaded GC well is still an area of active research. In the meantime, one simpler solution is to have several independent VM states, each running in its own thread (or process) and communicating via message passing. Lua makes this easy, but its VM is considerably lighter than Python's.
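A minimal sketch of the share-nothing, message-passing shape in Python, using only `threading` and `queue` from the standard library. The `counter_worker`/`count_words` names are hypothetical. Caveat: CPython threads still share one interpreter and one heap, so this models only the messaging discipline. For genuinely independent VM states you would run each worker in its own process (for example with `multiprocessing.Process` and `multiprocessing.Queue`), keeping the same structure.

```python
import queue
import threading

def counter_worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    # Private state: no other thread ever touches this dict directly,
    # so no locks are needed around it.
    counts: dict[str, int] = {}
    while True:
        msg = inbox.get()
        if msg is None:           # sentinel: shut down and send back the final state
            outbox.put(counts)
            return
        counts[msg] = counts.get(msg, 0) + 1

def count_words(words: list[str]) -> dict[str, int]:
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    worker = threading.Thread(target=counter_worker, args=(inbox, outbox))
    worker.start()
    for w in words:
        inbox.put(w)              # communicate only by sending messages
    inbox.put(None)
    result = outbox.get()
    worker.join()
    return result

print(count_words(["a", "b", "a"]))  # {'a': 2, 'b': 1}
```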

kingkilr · on Dec 16, 2009

The mark and sweep collector is a little more complicated than that. It doesn't use a single "seen" bit the way a normal GC header would; instead, it uses the refcount itself in very, very clever ways.
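A simplified model of how CPython's cycle detector uses the refcount. This is a Python sketch, not the real C code, and the `Obj`, `link` and `find_cyclic_garbage` names are hypothetical. Each tracked object gets a scratch copy of its refcount (called `gc_refs` in CPython's source). Every reference from one tracked object to another is then subtracted from that copy. Whatever stays above zero must be referenced from outside the tracked set, so it is a root. Everything reachable from a root is alive, and the rest is cyclic garbage. The real collector also deals with generations, finalizers and weakrefs, and it moves objects between linked lists rather than using a set; all of that is omitted here.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # eq=False keeps identity-based hashing, so objects can go in sets
class Obj:
    name: str
    refcount: int = 0                                # total references held to this object
    refs: list["Obj"] = field(default_factory=list)  # references this object holds

def link(src, dst):
    src.refs.append(dst)
    dst.refcount += 1

def find_cyclic_garbage(objects):
    tracked = set(objects)
    # Step 1: copy each refcount into a scratch value (gc_refs).
    gc_refs = {o: o.refcount for o in objects}
    # Step 2: subtract references that come from inside the tracked set.
    for o in objects:
        for r in o.refs:
            if r in tracked:
                gc_refs[r] -= 1
    # Step 3: anything still positive is referenced from outside: a root.
    stack = [o for o in objects if gc_refs[o] > 0]
    reachable = set(stack)
    # Step 4: everything reachable from a root is alive too.
    while stack:
        o = stack.pop()
        for r in o.refs:
            if r in tracked and r not in reachable:
                reachable.add(r)
                stack.append(r)
    return [o.name for o in objects if o not in reachable]

a, b, c, d = (Obj(n) for n in "abcd")
link(a, b); link(b, a)   # a <-> b: a cycle with no outside references
link(c, d); link(d, c)   # c <-> d: another cycle...
c.refcount += 1          # ...but c is also held by something outside the set
print(find_cyclic_garbage([a, b, c, d]))  # ['a', 'b']
```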