How about teaching programmers to program without using threads?
edit: sure, downmod me. It's crazy talk! How could programmers possibly do without threads, concurrency issues, and all the other blocking problems? Hardware should handle multiple cores. Not programmers.
There is probably more traction in what you are saying than some of the commenters suggest. For compute-bound tasks, it is often productive to split computations into long-running processes.
There are two reasons that the discussion goes beyond what you suggest, I think. One is detailed in http://www.tbray.org/ongoing/When/200x/2009/09/27/Concur-dot... where the Wide Finder project explores relatively easy ways to split a log-searching task into effective threads on a multi-core machine.
This is really a hard problem, as evidenced by Tim's long series of articles detailing various forays into Clojure and other languages.
Your comment "Hardware should handle multiple cores" reflects the opposite of what I think chip manufacturers are thinking. They run into the performance barrier, so they build a chip with more CPUs on it and hand the problem off to the compiler team and the rest of the software world.
I would take it another step further and challenge hardware manufacturers to look at the broader problem. There was an article recently noting that for Lisp, the effective performance gain over a decade or two was about a factor of 50, whereas for C-family programs it was several orders of magnitude. To me this implies that hardware isn't going in the direction that supports higher-level computing.
Remember when the 360 instruction set came out? The 7094 people looked at it with some sense of disappointment. And where are the nice instruction sets, as exemplified by the PDP-10 and its family?
Perhaps this implies smarter cores, so that we don't need so many of them.
But in today's world, it seems that the languages that work well with multiple threads have the required constructs built into the language itself; libraries don't do the trick. The clean channels of Go and the constructs in Clojure point the way. Maybe the GIL-fix approach is truly doomed.
I think you're being downvoted because people disagree with what you're saying. In my experience you need multiple threads when you want multiple things to happen at the same time, e.g. if I have a client/server architecture and one client instructs the server to perform a long-running task, then I don't want the server to appear frozen to all my other clients, which it would if the server ran in a single thread (a rough sketch of the kind of thing I mean is below). I don't really see how you can get around this. Do you have a solution?
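Sketch (Python just for illustration; do_long_task is a made-up placeholder): one thread per client, so one slow request doesn't freeze everything else.

    import socket
    import threading

    def do_long_task(data):
        return data  # stand-in for work that may take minutes

    def handle_client(conn):
        data = conn.recv(4096)              # read this client's request
        conn.sendall(do_long_task(data))    # only this thread waits on the slow work
        conn.close()

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("localhost", 8888))
    server.listen()

    while True:
        conn, _ = server.accept()
        # Each client gets its own thread, so the accept loop (and every
        # other client) stays responsive while one request grinds away.
        threading.Thread(target=handle_client, args=(conn,)).start()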
>> "you need multiple threads when you want multiple things to happen at the same time"
Computers don't work that way. Unless you have many CPUs, nothing happens at the same time.
Rewrite your 'long running task' to do things bit by bit.
By effectively doing your own timeslicing, you remove the need for any locking or concurrency issues. Once you get into the habit of programming like this, you wouldn't believe how much easier things are.
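A rough sketch of what I mean, in Python purely for illustration (the helpers are placeholders):

    # Cooperative timeslicing in a single thread: the long job is a generator
    # that yields after each small chunk, so the main loop can interleave it
    # with everything else. No locks, no shared-state races.
    def long_running_task(items):
        total = 0
        for item in items:
            total += item * item   # stand-in for one small slice of real work
            yield                  # hand control back to the main loop
        print("task finished, total =", total)

    def poll_other_work():
        pass  # stand-in: check sockets, update the UI, service timers, etc.

    task = long_running_task(range(1_000_000))
    while True:
        try:
            next(task)         # advance the big job by one small chunk
        except StopIteration:
            break
        poll_other_work()      # everything else stays responsive in between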
FWIW this is how Mibbit backend works - thousands of connections handled in a single thread.
JavaScript doesn't have threads (thankfully). There is no need for threads. They look like magic, but IMHO they cause more issues than they solve. The mapping of 'work' onto physical CPUs should be done silently by the hardware (if you have more than one CPU).
Windows 3 and MacOS classic apps used to get wedged every so often because cooperative multitasking is so easy to get slightly wrong.
Writing everything in continuation-passing style is like writing your own filesystem or parser state machine. It's a little more efficient per core, it's complicated enough that the challenge of getting it right might be gratifying, but it's rarely the best use of our time when we're surrounded with ridiculously powerful hardware. Even netbooks have multiple cores now.
> Windows 3 and MacOS classic apps used to get wedged every so often because cooperative multitasking is so easy to get slightly wrong
This is a different issue: multitasking of processes on the same CPU. This is an area that needs to be handled by the OS, because you can't guarantee that other processes will be cooperative.
Inside your own process, however, you can do whatever you want by dividing tasks into small chunks, without the need for threads.
The only place where I think we need threads is when dealing with libraries that we can't control. For example, UI libraries, networking libraries, math libraries, etc.
I think his point is that when you "divide tasks into small chunks", you are no longer writing code in the way that is most convenient for you but in a way that gets better performance from a simple compiler. It means more code spread over more functions, which increases the probability of bugs.
I don't think it's a problem that needs to be solved for most applications. Certainly for average websites/webapps, they're just passing data around. CPU usage shouldn't be high at all unless something really computationally expensive needs to happen, like speech recognition or video encoding.
I'll let the scientists who actually need CPU power figure that one out.
Most of the websites people here are working on could be run on a 386 and still have cycles to spare.
I think the idea that every piece of software will run in parallel in the future is nonsense. Hardware vendors are just trying to create a need where there is none.
Clearly, someone will find applications where this power is needed (graphics, simulation, robotics), but there is no way that MS Word will run in parallel on more than a few processors. The biggest change in multicore is in enabling new applications, not in changing the way current applications are developed.
I did not say every piece of software will run in parallel, or that it should. The subject is the runtime environment of a programming language. If Python is going to be used in these new applications you brought up, it would help if they were able to remove the GIL.
I've heard that argument since we started having dual CPUs 5-10 years ago. I don't really buy it. Netbooks are so popular mainly because we don't need that much CPU power on our local thin clients to the web.
This kind of argument sounds so familiar. Decades ago, when people worked on supercomputers, hundreds of millions of dollars were thrown at "parallel compilers" and "shared memory machines" that aimed to reduce the complexity of parallel computing for programmers. But it just didn't work. If a programmer isn't aware of the underlying architecture of the parallel machine, performance suffers heavily. That's why we have threads, message passing, and NUMA today.
For one, languages like Erlang, Haskell and Clojure already make concurrent programming reasonably approachable; and second, if people were to switch to proper dataflow languages*, parallelization would be implicit and automagically done for the programmer.
* What I really want is a proper dataflow/imperative hybrid that lets me choose the right tool for the job...
I appreciate that nothing happens at the same time on a single core but CPU time is shared between threads so it 'appears' as if more than one thing happens at once. A good example of this is a web browser on a single core machine - the browser does not freeze up while it is downloading data. That is because the CPU time is shared between the UI thread and the other worker threads.
An alternative (better) model would be simply to have a single thread with a main loop: async networking and periodic UI updates, all in the same thread.
    while(true) {
        networking.check(); // Check if any sockets are ready for read/write/connect
        ui.update();        // Update the UI a bit if needed
    }
The only case where this would be a terrible idea is if you don't have control of all the code, or need to interface with things that may block/crash/etc.
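To make it concrete, here's a rough Python sketch of that model using the standard selectors module (the echo handling is just a placeholder for real protocol logic, and a real server would also buffer its writes):

    # One thread, one loop: readiness-based I/O multiplexing over many sockets.
    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(server):
        conn, _ = server.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, handle)

    def handle(conn):
        data = conn.recv(4096)
        if data:
            conn.sendall(data)     # placeholder: echo instead of real work
        else:
            sel.unregister(conn)
            conn.close()

    server = socket.socket()
    server.bind(("localhost", 7777))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:
        for key, _ in sel.select(timeout=0.01):
            key.data(key.fileobj)  # call the callback stored at register()
        # periodic UI updates, timers, etc. would go here, in the same thread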
No modern web browser controls all of the code, since it must execute arbitrary JavaScript - which can block and crash.
Everyone doing something doesn't make them right. But when all major instances of an application are implemented differently from how you think is best, perhaps you don't understand the problem as well as you think you do. Chrome, I think, has the best browser architecture, and it looks like IE and Firefox will adopt something similar. I think they use separate processes to manage tabs instead of threads, but it's still parallel.
Execute a single JavaScript instruction at a time? I doubt the performance of that would be acceptable. But if you're aware of any browsers doing that, I'd like to know.
If I were to write a browser right now, that's how I'd do it. You would more likely execute a few js instructions per loop, depending on what else you have to do in that loop also - network check, ui update, etc.
Why would there be a performance hit in doing js instructions one by one though ;) A loop isn't expensive.
A loop isn't expensive, but swapping your interpreter's code out of registers and CPU cache for each js instruction may be. Anyway, this method doesn't really get you much in the way of JIT compiling your JS either.
Code is code. If you can explain that a bit more I'd be interested.
Obviously you wouldn't update the UI every time you execute a js instruction. That would be insane. I just put 1 js instruction in the loop to have the minimum unit, in case anything else needs to be updated very quickly at the same time - eg some animation etc
CPUs (or cores in a CPU) have a pretty small instruction cache. On a Core2, I believe it's 32K cache per core. It's somewhat expensive (slow) to fetch instructions from RAM to L1, but once code is in L1, reading from it is immediate, or close to it. If you have a two-core machine where one core can do most of your js dispatch logic, and the other core can do the inner loop of your rendering logic, then you can have much better performance than constantly swapping out your logic on the cache of a single core.
I'm really not sure how useful it is to cache-optimise a browser. You need to be able to fit a significant amount of logic into 32K in order to take advantage of cache. When I used to do realtime image processing (AR), I could get an order of magnitude speedup by just getting my logic to fit into 32K; image processing code can literally go from 3fps to 30fps just by tightening code to the point where it can fit in L1 cache. I don't know if it's possible to fit significant amounts of rendering logic or js logic into 32K, but if it is, then dedicating a core to each of those functions could give a significant speedup.
The JavaScript VM will have a significant amount of state associated with it. Executing a virtual instruction will require accessing that state. If that data is not in the CPU's cache, it will cause cache misses, which stall code progression.
If you then use that data in the cache for a while, then the cost of the cache miss will be amortized. But what you're proposing is going back and forth quickly between the JavaScript VM and the rest of the browser code. The browser code will also need to bring its data into the cache, which will kick out the JavaScript VM's data.
Since you're proposing that the JavaScript VM should do a very small amount of work at each time, and it will likely need to bring all of its data back into the cache each time, you will see a lot of CPU stalls.
I can't reply to your comment directly, so I'll reply to this one. Pretty much everything I do at work involves communicating with resources which may block/crash, so it wouldn't really be practical to put everything in a single while loop. It sounds a little bit to me like you're re-inventing the wheel: multi-threading means you don't have to go to the pains of breaking your long-running tasks up into pieces; the infrastructure takes care of that for you. I'm not trying to convert you, it sounds like you're perfectly happy and successful working in a single-threaded environment. It just sounds like you're doing quite a lot of work to avoid multiple threads, which really aren't that difficult to manage.
One of the big pains of my work day is our accounting system which, while being great at what it does well, can be abysmally slow to respond to queries (the simple question of how much of X we own right now takes it around three minutes to answer...). We know from experience, though, that internally it can deal with up to three requests at a time without slowing down. If I only had one thread, then three requests (which block) would take me nine minutes to process; with three threads I can get all the results back in three minutes.
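In code it's nothing fancier than this rough Python sketch (query_accounting_system is a placeholder for whatever blocking call the accounting system actually exposes):

    from concurrent.futures import ThreadPoolExecutor
    import time

    def query_accounting_system(question):
        time.sleep(3)  # placeholder: the real call blocks for ~3 minutes
        return "answer to " + question

    questions = ["holdings of X", "holdings of Y", "holdings of Z"]

    # Each query blocks in its own thread, so total wall time is roughly one
    # query (~3 minutes) instead of three back to back (~9 minutes).
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(query_accounting_system, questions))
    print(results)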
Unfortunately that's not practical; it's third-party software, and at least the last time we checked there wasn't a better alternative with the same functionality.
Not always true. There are multiple levels of parallelism in modern CPUs, and cores are only one of them, e.g. instruction-level parallelism, hyperthreading, etc.
Anyway, there's more to concurrent programming than doing multiple things at once in parallel - there's also the use of threads to turn a synchronous blocking call into an asynchronous one, like you said.
> By effectively doing your own timeslicing, you remove the need for any locking or concurrency issues
Native threads do this timeslicing for you; that is why they were invented (so that programmers didn't have to mess around with crazy micro-management of what they were doing).
Can you imagine writing Chrome or Firefox using a single thread (it would be hard, and probably unusable)? :)
> Native threads do this timeslicing for you; that is why they were invented
But they don't solve the fundamental problem of how to organize your code to take advantage of this. It is really easy to create a threaded application that has lower performance than a single-threaded application that divides work into chunks. And, after all the research in this area, there is no clear way to use threads without spending a lot of time making sure that they work.
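Python itself gives an easy demonstration: under CPython's GIL, splitting CPU-bound work across threads buys nothing and often loses a little. A rough sketch (timings will vary by machine):

    import threading
    import time

    N = 5_000_000

    def count(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    # The same work done in four chunks on one thread...
    start = time.time()
    for _ in range(4):
        count(N // 4)
    print("one thread:  ", time.time() - start)

    # ...and split across four threads: no faster under the GIL, often slower.
    start = time.time()
    threads = [threading.Thread(target=count, args=(N // 4,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("four threads:", time.time() - start)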
At least on Windows they are both multi-threaded. If you look in Task Manager then firefox.exe will have multiple threads (32 on my machine) and each chrome.exe process will have multiple threads (anywhere between 3 and 21 on the 15 tabs I have open).
> FWIW this is how Mibbit backend works - thousands of connections handled in a single thread.
You might as well not be running an OS and have Mibbit run straight on the hardware. All it's doing is handling network connections and some other data processing right? And don't look at me like I'm crazy, your OS is using valuable CPU time!
The problem is that you're also preventing yourself from being able to take advantage of multiple cores. Of course, it would be nice if hardware would divide those tasks up between processors automatically. But then they'd be threads.
It's not so much that I disagree with him/her. In fact, I would love it if we could do away with memory sharing. It's just that it seems a bit utopian (especially the argument that the hardware should do it for you).
My understanding is that he/she wanted to have the hardware handle concurrency automatically. Unfortunately, I don't see any way to get around having software dealing with concurrency.
I may be misunderstanding your question, but it seems as though your "server" could be written as a one-request server, and then you'd just start another one to listen when a request came in. If starting a new process is too slow, use a pool. Some versions of Apache use this method. It's not a panacea, but it does provide a way to avoid threaded code entirely.
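A rough sketch of the process-per-request idea using Python's standard library (Unix only; the handler is just a placeholder, and Apache's prefork model keeps a pool of worker processes rather than forking per connection):

    import socketserver

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            request = self.rfile.readline()            # one request per connection
            self.wfile.write(do_long_running_task(request))

    def do_long_running_task(data):
        return data.upper()  # stand-in for the real work

    if __name__ == "__main__":
        # Each connection is handled in a freshly forked process, so a slow
        # request can't freeze the listener or other clients; no threads needed.
        with socketserver.ForkingTCPServer(("localhost", 9999), Handler) as srv:
            srv.serve_forever()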
OK, but surely there are cases where it would be useful (I would argue necessary) to have a single process with multiple threads - when dealing with GUIs for example.