
Instead of dealing with all this complexity, I don't understand why a simpler approach isn't used: a single interpreter per thread, with a very good message-passing strategy between the interpreters.


You could already achieve that with OS processes and IPC. The whole point of multi-threading is to be able to write compact, shared-memory code with minimal use of synchronization primitives, while sharing as much code and data as possible.

One interpreter per thread means all side effects have to be migrated to the other threads to keep a consistent view of memory: guess what you'd need to do that? Yep, a global lock (except this time it's across all interpreters, instead of just one).


If you restricted shared memory to objects explicitly declared as shared, you wouldn't need a GIL. You'd simply need per-object locks.

For scientific computing purposes, you can often accomplish this with multiple processes and a numpy array allocated by shmget/shmat. But I'm not sure how to share complex objects in this way.
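
For what it's worth, here is a minimal sketch of that approach using the multiprocessing module's shared-memory Array instead of raw shmget/shmat (assumes numpy is available; the names are just illustrative):

    import multiprocessing as mp
    import numpy as np

    def worker(shared, shape):
        # Re-wrap the shared buffer as a numpy array; no copy is made.
        arr = np.frombuffer(shared.get_obj(), dtype=np.float64).reshape(shape)
        arr *= 2  # visible to the parent; use shared.get_lock() if several
                  # processes write concurrently

    if __name__ == '__main__':
        shape = (1000,)
        shared = mp.Array('d', int(np.prod(shape)))  # backed by shared memory
        arr = np.frombuffer(shared.get_obj(), dtype=np.float64).reshape(shape)
        arr[:] = 1.0
        p = mp.Process(target=worker, args=(shared, shape))
        p.start()
        p.join()
        print(arr[0])  # 2.0: the child's write is visible here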


I'm not quite sure if I'm right here (and I'd appreciate it if another HN reader corrected me).

But I think that's how the Queue object in Python 2.6 works. The Queue instance is locked, but you're free to do whatever you like within the threads that are consuming the queue.

The reason I'm not sure is that having a single object being locked seems to contradict the GIL concept...
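
For reference, a minimal sketch of that pattern (Python 2.6-era names; the locking lives entirely inside the Queue, and the threads themselves run unsynchronized):

    import threading
    import Queue  # renamed to `queue` in Python 3

    q = Queue.Queue()  # all locking is internal to the Queue object

    def producer():
        for i in range(10):
            q.put(i)
        q.put(None)  # sentinel telling the consumer to stop

    def consumer():
        while True:
            item = q.get()  # blocks until an item is available
            if item is None:
                break
            print(item)

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()

Locking one object like this doesn't actually contradict the GIL: the GIL serializes bytecode execution inside the interpreter, while the Queue's internal lock only coordinates access to that one object (and lets get() block without busy-waiting).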


...and if we moved away from the annoying and awkward shared-memory model, this would become even less of a problem.


The whole point of the "smart message passing" strategy is to move objects from one thread to another very quickly (that is, zero-copy), and with very local locking requirements.
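
As a rough illustration of what "zero-copy" means here (a hypothetical sketch with ordinary CPython threads, which already share one interpreter; the per-interpreter design would additionally have to enforce the ownership hand-off):

    import threading
    import Queue  # `queue` in Python 3

    big = bytearray(10 * 1024 * 1024)  # a 10 MB payload
    q = Queue.Queue()                  # the only lock involved is the queue's own

    def receiver():
        obj = q.get()        # receives a reference, not a copy
        print(obj is big)    # True: no payload bytes were copied

    t = threading.Thread(target=receiver)
    t.start()
    q.put(big)  # hands over the reference; the proposed design would also
                # transfer ownership so only one interpreter touches it
    t.join()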


Like Perl's ithreads implementation?

I like Perl's ithreads setup a lot. You explicitly need to mark data as shared between threads; otherwise all variables/objects are local to the thread. Queues, for example, are implemented as shared @arrays with a locking primitive around the manipulations, mostly hidden behind the Queue API (Queue->enqueue and Queue->dequeue).

The interpreter code is still shared among all the threads, but each has thread-local storage separate from the shared arena. I've found explicitly needing to mark data as shared to be a big boon to development; I think it has helped reduce the number of bugs related to shared state.


The last time I used ithreads, it made Perl crash and burn. This was on a program with about 30k lines of code. On top of that, creating a single thread took 15 seconds because Perl spent a ridiculous amount of time copying the entire state to the new thread. Eventually I used fork() and multiple processes, because that's much faster than ithreads. ithreads are pretty useless.


Then you must have used them a really long time ago. The last time I did, I had no trouble decoding audio streams in one thread and pushing the raw audio samples into a Queue, pulling them out of the Queue in another thread and sending them to the audio device, and doing network IO in yet another thread. This code is at http://github.com/thwarted/thundaural (no longer maintained) and was written in 2004 and 2005 against perl 5.6 and perl 5.8. It's hardly a complex or large threaded application, nor an example of fantastic code, but I never experienced "crash and burn" or prohibitively long thread start-up times.

I can't produce a significantly long start-up time when perl copies 240MB+ of local state using http://gist.github.com/258016. Much larger than that and the overhead pushes my 4GB machine into swap (so perl's not very efficient with its data overhead, it seems). It is slower than if the structure is shared between the threads (you can uncomment a line to compare), which shows that perl is copying the state to the other interpreters. But this just means you need to use the same model with perl threads that you'd use when you do multiprocessing with fork and long-lived child processes that don't exec: spawn threads early, before your heap balloons, to avoid all that spawn-time copying (which is good practice anyway, even when COW is used for fork). Every model has its trade-offs, and it's good to know those trade-offs going in, rather than treating the flavor of the day as a silver bullet.

And even if perl's implementation isn't stable/good, it doesn't mean the model of explicit sharing is a bad one. And it most definitely doesn't mean that using a separate interpreter in each thread with global sharing is bad at all.


This is how Tcl works.


Yes, though IIRC Tcl lacks a zero-copy object-passing mechanism.


Tcl has reference counting and copy on write to minimize copies when passing objects to functions. Not sure if this is the case for slave interpreters (Tcl's term), but I would be very surprised if it were not.

http://tmml.sourceforge.net/doc/tcl/Object.html

http://www.equi4.com/moam/strength


Lua, too, though there are a couple extensions to add threads.


You mean like the multiprocessing package? http://docs.python.org/library/multiprocessing.html
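
Roughly, multiprocessing gives you the "interpreter per worker plus message passing" shape, just with processes; a minimal sketch:

    import multiprocessing as mp

    def worker(inbox, outbox):
        # Runs in a completely separate interpreter (a child process).
        for item in iter(inbox.get, None):  # None is the stop sentinel
            outbox.put(item * item)

    if __name__ == '__main__':
        inbox, outbox = mp.Queue(), mp.Queue()
        p = mp.Process(target=worker, args=(inbox, outbox))
        p.start()
        for i in range(5):
            inbox.put(i)
        inbox.put(None)
        p.join()
        while not outbox.empty():
            print(outbox.get())

Note that every item is pickled and copied across the process boundary, which is exactly the copy the "zero-copy" proposal above wants to avoid.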


No, because IIRC the Python multiprocessing package uses fork rather than threads.


In Linux they're identical :-)


Noo... fork() gives you a separate address space, whereas threads would require that you share an address space with the parent.

vfork() and clone() can almost give you what you want, but Linux threading uses something different (NPTL) nowadays.
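
The difference is easy to see from Python itself (a small sketch; os.fork is Unix-only): a thread mutates the parent's address space, while a forked child only changes its own copy.

    import os
    import threading

    counter = [0]

    def bump():
        counter[0] += 1

    # A thread shares the address space: the change is visible afterwards.
    t = threading.Thread(target=bump)
    t.start(); t.join()
    print(counter[0])  # 1

    # A forked child gets its own (copy-on-write) address space:
    # its change never shows up in the parent.
    pid = os.fork()
    if pid == 0:  # child
        bump()
        os._exit(0)
    os.waitpid(pid, 0)
    print(counter[0])  # still 1 in the parent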


You're correct. My position was that in Linux, threads weigh just as much as processes; the forking speed is identical, and extremely fast with COW, compared to other OSes where processes and threads are completely different beasts.

Even with NPTL, both processes and threads have task_structs.


NPTL uses clone(2). fork(2) is implemented in terms of clone(2) also.


On platforms without fork it uses threads (e.g. Windows). Why do you care whether it's using fork() anyhow?


That's a question you should be asking the multiprocessing module guys ;-)



