CPUs (or cores in a CPU) have a pretty small instruction cache. On a Core 2, I believe it's 32K of L1 instruction cache per core. Fetching instructions from RAM into L1 is relatively slow, but once code is in L1, reading it is immediate, or close to it. If you have a two-core machine where one core handles most of your JS dispatch logic and the other runs the inner loop of your rendering logic, you can get much better performance than if both sets of code keep evicting each other from the cache of a single core.

I'm really not sure how useful it is to cache-optimise a browser. To take advantage of the cache, you need to fit a significant amount of logic into 32K. When I used to do realtime image processing (AR), I could get an order of magnitude speedup just by getting my logic to fit into 32K; image processing code can literally go from 3fps to 30fps once it's tightened enough to fit in L1 cache. I don't know whether it's possible to fit significant amounts of rendering logic or JS logic into 32K, but if it is, then dedicating a core to each of those functions could give a significant speedup.
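
To make "dedicating a core to each function" concrete, here's a rough sketch of pinning two workers to separate cores. It assumes Linux, where Python exposes `os.sched_setaffinity`; the names `pin_current_thread`, `run_pinned_workers` and the two placeholder workloads are made up for illustration. Python threads also share the interpreter lock, so this only shows the affinity mechanism. A real browser would do the equivalent from native threads.

```python
import os
import threading


def pin_current_thread(core):
    """Try to pin the calling thread to `core`; return True if it worked."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # the standard library only exposes this on some platforms (e.g. Linux)
    try:
        if core not in os.sched_getaffinity(0):
            return False  # that core is not available to us
        os.sched_setaffinity(0, {core})  # 0 targets the caller (the calling thread on Linux)
        return True
    except OSError:
        return False  # e.g. a sandbox that forbids changing affinity


def worker(name, core, work, results):
    # Record whether the pin succeeded, then run the (placeholder) workload.
    results[name] = pin_current_thread(core)
    work()


def run_pinned_workers():
    """Run hypothetical 'dispatch' and 'render' workers, one per core if possible."""
    if hasattr(os, "sched_getaffinity"):
        cores = sorted(os.sched_getaffinity(0))
    else:
        cores = [0]
    dispatch_core, render_core = cores[0], cores[-1]  # same core if only one exists

    results = {}
    threads = [
        # Placeholder workloads standing in for JS dispatch and the render inner loop.
        threading.Thread(target=worker, args=("dispatch", dispatch_core,
                                              lambda: sum(range(100_000)), results)),
        threading.Thread(target=worker, args=("render", render_core,
                                              lambda: sum(i * i for i in range(100_000)), results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pinned_workers())
```

Pinning only stops the scheduler from moving the two workloads onto the same core; whether each one then actually fits in 32K is a separate question that you'd have to measure.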