How about teaching programmers to program without using threads?
edit: sure, downmod me. It's crazy talk! How could programmers possibly do without threads, concurrency issues, and all the other blocking problems? Hardware should handle multiple cores. Not programmers.
There is probably more traction in what you are saying than some of the commenters suggest. For compute-bound tasks, it is often productive to split computations into long-running processes.
There are two reasons that the discussion goes beyond what you suggest, I think. One is detailed in http://www.tbray.org/ongoing/When/200x/2009/09/27/Concur-dot... where the Wide Finder project explores relatively easy ways to split a log-searching task into effective threads on a multi-core machine.
This is really a hard problem, as evidenced by Tim's long series of articles detailing various forays into Clojure and other languages.
Your comment "Hardware should handle multiple cores" reflects the opposite of what I think chip manufacturers are thinking. They run into the performance barrier, so they build a chip with more CPUs on it and hand the problem off to the compiler team and the rest of the software world.
I would take it another step further and challenge hardware manufacturers to look at the broader problem. There was an article recently noting that for Lisp, the effective performance gain over a decade or two was about a factor of 50, whereas for C-family programs it was several orders of magnitude. To me this implies that hardware isn't going in the direction that supports higher-level computing.
Remember when the 360 instruction set came out? The 7094 people looked at it with some sense of disappointment. And where are the nice instruction sets, as exemplified by the PDP-10 and its family?
Perhaps this implies smarter cores, so that we don't need so many of them.
But in today's world, it seems that the languages that work well with multiple threads have the required constructs built into the language itself; libraries don't do the trick. The clean channels of Go and the constructs in Clojure point the way. Maybe the GIL-fix approach is truly doomed.
I think you're being downvoted because people disagree with what you're saying. In my experience you need multiple threads when you want multiple things to happen at the same time, e.g. if I have a client/server architecture and one client instructs the server to perform a long-running task, then I don't want the server to appear frozen to all my other clients, which it would if the server ran in a single thread (a rough sketch of the kind of thing I mean is below). I don't really see how you can get around this. Do you have a solution?
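Sketch (Python just for illustration; do_long_task is a made-up placeholder): one thread per client, so one slow request doesn't freeze everything else.

    import socket
    import threading

    def do_long_task(data):
        return data  # stand-in for work that may take minutes

    def handle_client(conn):
        data = conn.recv(4096)              # read this client's request
        conn.sendall(do_long_task(data))    # only this thread waits on the slow work
        conn.close()

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("localhost", 8888))
    server.listen()

    while True:
        conn, _ = server.accept()
        # Each client gets its own thread, so the accept loop (and every
        # other client) stays responsive while one request grinds away.
        threading.Thread(target=handle_client, args=(conn,)).start()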
>> "you need multiple threads when you want multiple things to happen at the same time"
Computers don't work that way. Unless you have many CPUs, nothing happens at the same time.
Rewrite your 'long running task' to do things bit by bit.
By effectively doing your own timeslicing, you remove the need for any locking or concurrency issues. Once you get into the habit of programming like this, you wouldn't believe how much easier things are.
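A rough sketch of what I mean, in Python purely for illustration (the helpers are placeholders):

    # Cooperative timeslicing in a single thread: the long job is a generator
    # that yields after each small chunk, so the main loop can interleave it
    # with everything else. No locks, no shared-state races.
    def long_running_task(items):
        total = 0
        for item in items:
            total += item * item   # stand-in for one small slice of real work
            yield                  # hand control back to the main loop
        print("task finished, total =", total)

    def poll_other_work():
        pass  # stand-in: check sockets, update the UI, service timers, etc.

    task = long_running_task(range(1_000_000))
    while True:
        try:
            next(task)         # advance the big job by one small chunk
        except StopIteration:
            break
        poll_other_work()      # everything else stays responsive in between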
FWIW this is how Mibbit backend works - thousands of connections handled in a single thread.
JavaScript doesn't have threads (thankfully). There is no need for threads. They look like magic, but IMHO they cause more issues than they solve. The mapping of 'work' onto physical CPUs should be done silently by the hardware (if you have more than one CPU).
Windows 3 and MacOS classic apps used to get wedged every so often because cooperative multitasking is so easy to get slightly wrong.
Writing everything in continuation-passing style is like writing your own filesystem or parser state machine. It's a little more efficient per core, it's complicated enough that the challenge of getting it right might be gratifying, but it's rarely the best use of our time when we're surrounded with ridiculously powerful hardware. Even netbooks have multiple cores now.
> Windows 3 and MacOS classic apps used to get wedged every so often because cooperative multitasking is so easy to get slightly wrong
This is a different issue: multitasking of processes on the same CPU. This is an area that needs to be handled by the OS, because you can't guarantee that other processes will be cooperative.
Inside your own process, however, you can do whatever you want by dividing tasks into small chunks, without the need for threads.
The only place where I think we need threads is when dealing with libraries that we can't control. For example, UI libraries, networking libraries, math libraries, etc.
I think his point is that when you "divide tasks into small chunks", you are no longer writing code in the way that is most convenient for you but in a way that gets better performance from a simple compiler. It means more code spread over more functions, which increases the probability of bugs.
I don't think it's a problem that needs to be solved for most applications. Certainly for average websites/webapps, they're just passing data around. CPU usage shouldn't be high at all unless something really computationally expensive needs to happen, like speech recognition or video encoding.
I'll let the scientists who actually need CPU power figure that one out.
Most of the websites people here are working on could be run on a 386 and still have cycles to spare.
I think the idea that every piece of software will run in parallel in the future is nonsense. Hardware vendors are just trying to create a need where there is none.
Clearly, someone will find applications where this power is needed (graphics, simulation, robotics), but there is no way that MS Word will run in parallel on more than a few processors. The biggest change in multicore is in enabling new applications, not in changing the way current applications are developed.
I did not say every piece of software will run in parallel, or that it should. The subject is the runtime environment of a programming language. If Python is going to be used in these new applications you brought up, it would help if they were able to remove the GIL.
I've heard that argument since we started having dual CPUs 5-10 years ago. I don't really buy it. Netbooks are so popular mainly because we don't need that much CPU power on our local thin clients to the web.
This kind of argument sounds so familiar. Decades ago, when people worked on supercomputers, hundreds of millions of dollars were thrown at "parallel compilers" and "shared memory machines" that aimed to reduce the complexity of parallel computing for programmers. But it just didn't work. If a programmer isn't aware of the underlying architecture of the parallel machine, performance suffers heavily. That's why we have threads, message passing, and NUMA today.
For one, languages like Erlang, Haskell and Clojure already make concurrent programming reasonably approachable; and second, if people were to switch to proper dataflow languages*, parallelization would be implicit and automagically done for the programmer.
* What I really want is a proper dataflow/imperative hybrid that lets me choose the right tool for the job...
I appreciate that nothing happens at the same time on a single core but CPU time is shared between threads so it 'appears' as if more than one thing happens at once. A good example of this is a web browser on a single core machine - the browser does not freeze up while it is downloading data. That is because the CPU time is shared between the UI thread and the other worker threads.
An alternative (better) model would be simply to have a single thread with a main loop: async networking and periodic UI updates, all in the same thread.
    while(true) {
        networking.check(); // Check if any sockets are ready for read/write/connect
        ui.update();        // Update the UI a bit if needed
    }
The only case where this would be a terrible idea is if you don't have control of all the code, or need to interface with things that may block/crash/etc.
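To make it concrete, here's a rough Python sketch of that model using the standard selectors module (the echo handling is just a placeholder for real protocol logic, and a real server would also buffer its writes):

    # One thread, one loop: readiness-based I/O multiplexing over many sockets.
    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(server):
        conn, _ = server.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, handle)

    def handle(conn):
        data = conn.recv(4096)
        if data:
            conn.sendall(data)     # placeholder: echo instead of real work
        else:
            sel.unregister(conn)
            conn.close()

    server = socket.socket()
    server.bind(("localhost", 7777))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:
        for key, _ in sel.select(timeout=0.01):
            key.data(key.fileobj)  # call the callback stored at register()
        # periodic UI updates, timers, etc. would go here, in the same thread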
No modern web browser controls all of the code, since it must execute arbitrary JavaScript - which can block and crash.
Everyone doing something doesn't make them right. But when all major instances of an application are implemented differently from how you think is best, perhaps you don't understand the problem as well as you think you do. Chrome, I think, has the best browser architecture, and it looks like IE and Firefox will adopt something similar. I think they use separate processes to manage tabs instead of threads, but it's still parallel.
Execute a single JavaScript instruction at a time? I doubt the performance of that would be acceptable. But if you're aware of any browsers doing that, I'd like to know.
If I were to write a browser right now, that's how I'd do it. You would more likely execute a few js instructions per loop, depending on what else you have to do in that loop also - network check, ui update, etc.
Why would there be a performance hit in doing js instructions one by one though ;) A loop isn't expensive.
A loop isn't expensive, but swapping your interpreter's code out of registers and CPU cache for each js instruction may be. Anyway, this method doesn't really get you much in the way of JIT compiling your JS either.
Code is code. If you can explain that a bit more I'd be interested.
Obviously you wouldn't update the UI every time you execute a js instruction. That would be insane. I just put 1 js instruction in the loop to have the minimum unit, in case anything else needs to be updated very quickly at the same time - eg some animation etc
CPUs (or cores in a CPU) have a pretty small instruction cache. On a Core2, I believe it's 32K cache per core. It's somewhat expensive (slow) to fetch instructions from RAM to L1, but once code is in L1, reading from it is immediate, or close to it. If you have a two-core machine where one core can do most of your js dispatch logic, and the other core can do the inner loop of your rendering logic, then you can have much better performance than constantly swapping out your logic on the cache of a single core.
I'm really not sure how useful it is to cache-optimise a browser. You need to be able to fit a significant amount of logic into 32K in order to take advantage of cache. When I used to do realtime image processing (AR), I could get an order of magnitude speedup by just getting my logic to fit into 32K; image processing code can literally go from 3fps to 30fps just by tightening code to the point where it can fit in L1 cache. I don't know if it's possible to fit significant amounts of rendering logic or js logic into 32K, but if it is, then dedicating a core to each of those functions could give a significant speedup.
The JavaScript VM will have a significant amount of state associated with it. Executing a virtual instruction will require accessing that state. If that data is not in the CPU's cache, it will cause cache misses, which stall code progression.
If you then use that data in the cache for a while, then the cost of the cache miss will be amortized. But what you're proposing is going back and forth quickly between the JavaScript VM and the rest of the browser code. The browser code will also need to bring its data into the cache, which will kick out the JavaScript VM's data.
Since you're proposing that the JavaScript VM should do a very small amount of work at each time, and it will likely need to bring all of its data back into the cache each time, you will see a lot of CPU stalls.
I can't reply to your comment directly, so I'll reply to this one. Pretty much everything I do at work involves communicating with resources which may block/crash, so it wouldn't really be practical to put everything in a single while loop. It sounds a little bit to me like you're re-inventing the wheel: multi-threading means you don't have to go to the pains of breaking your long-running tasks up into pieces; the infrastructure takes care of that for you. I'm not trying to convert you, it sounds like you're perfectly happy and successful working in a single-threaded environment. It just sounds like you're doing quite a lot of work to avoid multiple threads, which really aren't that difficult to manage.
One of the big pains of my work day is our accounting system which, while being great at what it does well, can be abysmally slow to respond to queries (the simple question of how much of X we own right now takes it around three minutes to answer...). We know from experience, though, that internally it can deal with up to three requests at a time without slowing down. If I only had one thread, then three requests (which block) would take me nine minutes to process; with three threads I can get all the results back in three minutes.
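In code it's nothing fancier than this rough Python sketch (query_accounting_system is a placeholder for whatever blocking call the accounting system actually exposes):

    from concurrent.futures import ThreadPoolExecutor
    import time

    def query_accounting_system(question):
        time.sleep(3)  # placeholder: the real call blocks for ~3 minutes
        return "answer to " + question

    questions = ["holdings of X", "holdings of Y", "holdings of Z"]

    # Each query blocks in its own thread, so total wall time is roughly one
    # query (~3 minutes) instead of three back to back (~9 minutes).
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(query_accounting_system, questions))
    print(results)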
Unfortunately that's not practical; it's third-party software, and at least the last time we checked there wasn't a better alternative with the same functionality.
Not always true. There are multiple levels of parallelism in modern CPUs, and cores are only one of them, e.g. instruction-level parallelism, hyperthreading, etc.
Anyway, there's more to concurrent programming than doing multiple things at once in parallel - there's also the use of threads to turn a synchronous blocking call into an asynchronous one, like you said.
> By effectively doing your own timeslicing, you remove the need for any locking or concurrency issues
Native threads do this timeslicing for you; that is why they were invented (so that programmers didn't have to mess around with crazy micro-management of what they were doing).
Can you imagine writing Chrome or Firefox using a single thread (it would be hard, and probably unusable)? :)
> Native threads do this timeslicing for you; that is why they were invented
But they don't solve the fundamental problem of how to organize your code to take advantage of this. It is really easy to create a threaded application that has lower performance than a single-threaded application that divides work into chunks. And, after all the research in this area, there is no clear way to use threads without spending a lot of time making sure that they work.
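Python itself gives an easy demonstration: under CPython's GIL, splitting CPU-bound work across threads buys nothing and often loses a little. A rough sketch (timings will vary by machine):

    import threading
    import time

    N = 5_000_000

    def count(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    # The same work done in four chunks on one thread...
    start = time.time()
    for _ in range(4):
        count(N // 4)
    print("one thread:  ", time.time() - start)

    # ...and split across four threads: no faster under the GIL, often slower.
    start = time.time()
    threads = [threading.Thread(target=count, args=(N // 4,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("four threads:", time.time() - start)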
At least on Windows they are both multi-threaded. If you look in Task Manager then firefox.exe will have multiple threads (32 on my machine) and each chrome.exe process will have multiple threads (anywhere between 3 and 21 on the 15 tabs I have open).
> FWIW this is how Mibbit backend works - thousands of connections handled in a single thread.
You might as well not be running an OS and have Mibbit run straight on the hardware. All it's doing is handling network connections and some other data processing right? And don't look at me like I'm crazy, your OS is using valuable CPU time!
The problem is that you're also preventing yourself from being able to take advantage of multiple cores. Of course, it would be nice if hardware would divide those tasks up between processors automatically. But then they'd be threads.
It's not so much that I disagree with him/her. In fact, I would love it if we could do away with memory sharing. It's just that it seems a bit utopian (especially the argument that the hardware should do it for you).
My understanding is that he/she wanted to have the hardware handle concurrency automatically. Unfortunately, I don't see any way to get around having software dealing with concurrency.
I may be misunderstanding your question, but it seems as though your "server" could be written as a one-request server, and then you'd just start another one to listen when a request came in. If starting a new process is too slow, use a pool. Some versions of Apache use this method. It's not a panacea, but it does provide a way to avoid threaded code entirely.
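A rough sketch of the process-per-request idea using Python's standard library (Unix only; the handler is just a placeholder, and Apache's prefork model keeps a pool of worker processes rather than forking per connection):

    import socketserver

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            request = self.rfile.readline()            # one request per connection
            self.wfile.write(do_long_running_task(request))

    def do_long_running_task(data):
        return data.upper()  # stand-in for the real work

    if __name__ == "__main__":
        # Each connection is handled in a freshly forked process, so a slow
        # request can't freeze the listener or other clients; no threads needed.
        with socketserver.ForkingTCPServer(("localhost", 9999), Handler) as srv:
            srv.serve_forever()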
OK, but surely there are cases where it would be useful (I would argue necessary) to have a single process with multiple threads - when dealing with GUIs for example.