Profile Guided Optimization without Profiles: A Machine Learning Approach https:...

pizlonator · 2025-03-03T18:22:55 1741026175

Very cool!

I've been thinking about what it would look like for something like this to be done for the profiling that you get from ICs, not the profiling you get from branch weights or basic block counts.

They're quite different. Two big differences:

- My best estimate is that speculating on type state (i.e. what you get from ICs) is a value bet only if you're right about 99.9% of the time (or even 99.999% - depends on your compiler/runtime architecture). I think you can get profit from branch weights if they are right less than 99.9% of the time.

- Speculating on type state means having semantically rich profiling information. It's not just a bunch of numbers. You need the profiler to describe a type to you, like: "I expect this access to see objects with fields x, y, z (in that order) and it has a prototype that has fields a, b, c, which then has a prototype with fields e, f, g".

andyayers · 2025-03-03T19:31:20 1741030280

For the .NET JIT, at least, speculation on types seems beneficial even if we're only right maybe 30% of the time.

See eg https://github.com/dotnet/runtime/blob/main/docs/design/core...

(where this is presented as a puzzle)....

pizlonator · 2025-03-03T19:40:15 1741030815

Guarded devirtualization is different from the speculation that I'm talking about.

To me, speculation is where the fail path exits the optimized code.

To handle JS's dynamism, guarding is usually not worth it (though JSC has the ability to do that, if the profiling says that the fail path is probable). I believe that most of HotSpot's perf comes from speculation rather than guarded devirt.

titzer · 2025-03-04T15:36:35 1741102595

> To me, speculation is where the fail path exits the optimized code.

V8 is now doing profile-based guarded inlining for Wasm indirect calls. The guards don't deopt, so it's a form of biasing where the fail path does indeed go through the full indirect call. That means the fail path rejoins, and ultimately, downstream, you don't learn anything, e.g. that there were no aliasing side effects, or anything about the return type of the inlined code.

You can get some of the effect of speculation with tail duplication after biasing, but in order to get the full effect you'd have to tail-duplicate all the way to the end of a function, or even unroll another iteration of the loop. It's possible to do this if you're willing to spend a lot of code space by duplicating a lot of basic blocks.

But the expensive thing about speculation is the deopt path, which is a really expensive OSR transfer and usually throws away optimized code, too. So clearly biasing is a different tradeoff, and I wouldn't be surprised if biasing plus a little bit of tail duplication gets most of the benefit of deoptimization.

sitkack · 2025-03-04T15:40:23 1741102823

Would you mind deep linking to the V8 code that does this?

sitkack · 2025-03-03T19:39:43 1741030783

Which JIT would be the easiest to implement to log this information? A time series LLM should be able to analyze it and give predictions.

Looks like PYPY is the most extensible.

https://rpython.readthedocs.io/en/latest/logging.html

And that the JIT is rebuilt from rpython, so it is fairly open to extension.

pizlonator · 2025-03-03T20:19:30 1741033170

I know exactly how I would do that to JavaScriptCore, but that’s maybe mostly due to the fact that I designed most of the bits you’d have to instrument.

Not sure if it’s the easiest overall.

I’m easy to look up if you want to pick my brain about JSC

sitkack · 2025-03-03T21:52:47 1741038767

What a generous offer. I'll spend some time reading your papers first. Thank you.