I think this article overstates the importance of these problems even for scientific software. In the scientific code I've written, noise processes are often orders of magnitude larger than the errors discussed here, and I believe this applies to many (most?) simulations modelling the real world (i.e. physics, chemistry, ...). At the same time, enabling fast-math has often yielded a very significant (>10%) performance boost.
I find the discussion of -fassociative-math particularly interesting, because I assume that most writers of code that translates a mathematical formula into a simulation will not know which order of operations would be the most accurate, and will simply codify their derivation of the equation being simulated (which could have operations in any order). So if this switch changes your results, it probably means you should take a long hard look at the equations you're simulating and which ordering gives you the most correct results.
That said, I appreciate that the considerations might be quite different for libraries, and in particular for mathematical simulations.
I thought most languages have this? If you simply write a formula, operations are ordered according to the language specification. If you want a different ordering, you use parentheses.
Not sure how that interacts with this fast-math thing; I don't use C.
Imagine a function like Python’s `sum(list)`. In the abstract, Python should be able to add those values in any order it wants. Maybe it could spawn threads so that one thread sums the first half of the list while another sums the second half, and then return the sum of those intermediate values. You could imagine a clever `sum()` being many times faster, especially using SIMD instructions or a GPU or something.
But alas, you can’t optimize like that with common IEEE-754 floats and expect to get the same answer as with simple one-at-a-time addition. The result depends on the order in which you add the numbers. Order them differently and you may very well get a different answer.
That’s the kind of ordering we’re talking about here.
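A minimal sketch of that order dependence, using plain Python floats (binary64 doubles); the specific values are chosen just to make the effect visible:

```python
# Floating-point addition is not associative: near 1e16 the gap between
# adjacent doubles is 2.0, so adding 1.0 to 1e16 rounds straight back
# to 1e16 -- whether the 1.0 "counts" depends entirely on the grouping.
vals = [1e16, -1e16, 1.0]

left_to_right = (vals[0] + vals[1]) + vals[2]   # (1e16 - 1e16) + 1.0
reassociated = vals[0] + (vals[1] + vals[2])    # 1e16 + (-1e16 + 1.0)

print(left_to_right)  # 1.0
print(reassociated)   # 0.0 -- the 1.0 was absorbed before it could count
```

A parallel or SIMD `sum()` implicitly picks one of these groupings, which is exactly why it can disagree with the sequential loop.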
"precision" is an ambiguous term here. There's reproducibility (getting the same results every time), accuracy (getting as close as possible to the result computed with infinite precision), and the native format's precision.
ffast-math sacrifices both the first and the second for performance. Compilers usually sacrifice the first for the second by default, with things like automatic FMA contraction. This isn't a necessary trade-off; it's just easier.
There are very few cases where you actually need accuracy down to the ULP, though. No robot can do anything meaningful with femtometer precision, for example. Instead you choose a balance between reproducibility (relatively easy) and accuracy (extremely hard). In robotics, that will usually swing a bit towards reproducibility; CAD would swing more towards accuracy.
Interesting, I stand corrected. In most of the fields I'm aware of, one could easily work in 32-bit without any issues.
I find the robotics example quite surprising in particular. I think the precision of most input sensors is less than 16 bits. If your inputs have that much noise on them, how come you need so much precision in your calculations?
The precision isn't uniform across the range of possible inputs. This means you need a higher bit depth, even though "you aren't really using it", just to establish a good base precision that you can be sure of hitting at every range. The phrase "most sensors" is doing a lot of work here.
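That non-uniformity is easy to see with `math.ulp`, which reports the gap between adjacent doubles at a given magnitude (a small sketch; `math.ulp` requires Python 3.9+):

```python
import math

# The absolute precision of a double depends on its magnitude:
# the distance to the next representable value (the ULP) grows with it.
print(math.ulp(1.0))     # 2**-52, about 2.22e-16
print(math.ulp(1000.0))  # about 1.14e-13 -- already ~500x coarser
print(math.ulp(1e16))    # 2.0 -- whole integers start being skipped
```

So a fixed number of bits buys you very different absolute precision at different points in the input range, which is why you budget extra bits to guarantee a floor everywhere.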
It matters for reproducibility between software versions, right?
I work in audio software, and we have comparison tests that compare the audio output of a chain of audio effects against a previous result. If we make some small refactoring of the code and the compiler decides to reorganize the arithmetic operations, we might suddenly get a slightly different output. So of course we disable fast-math.
One thing we do enable though, is flushing denormals to zero. That is predictable behavior and it saves some execution time.
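Python doesn't expose the hardware FTZ flag, but the semantics can be sketched with a hypothetical helper (`ftz` and its threshold argument are my names for illustration, not a real API):

```python
import sys

def ftz(x, threshold=sys.float_info.min):
    """Sketch of flush-to-zero: treat any subnormal value as exactly 0.0.

    Real FTZ is a CPU control-register flag, not a per-value function;
    this just models the observable effect on a single double.
    """
    return 0.0 if -threshold < x < threshold else x

subnormal = sys.float_info.min / 2  # below the smallest normal double

print(subnormal > 0.0)  # True: IEEE-754 gradual underflow preserves it
print(ftz(subnormal))   # 0.0: what an FTZ-enabled effects chain computes
```

The trade is predictable: values that small are far below audibility, while subnormal arithmetic can be dramatically slower on some CPUs (a classic problem in decaying filter/reverb tails).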
Yeah that is the killer for me. I'm not particularly attached to IEEE semantics. Unfortunately the replacement is that your results can change between any two compiles, for nearly any reason. Even if you think you don't care about tiny precision variances: consider that if you ever score and rank things with an algorithm that involves floats, the resulting order can change.
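To make the ranking point concrete, here's a toy sketch where reassociating one score's summation flips the sort order (the items and score values are invented for illustration):

```python
# Item "A"'s score is a float sum whose value depends on grouping;
# item "B"'s score is fixed. A fast-math rebuild that reassociates
# A's sum silently changes which item ranks first.
terms = [1e16, -1e16, 1.0]

score_a_v1 = (terms[0] + terms[1]) + terms[2]  # 1.0 (left-to-right)
score_a_v2 = terms[0] + (terms[1] + terms[2])  # 0.0 (reassociated)
score_b = 0.5

rank_v1 = sorted(["A", "B"], key={"A": score_a_v1, "B": score_b}.get, reverse=True)
rank_v2 = sorted(["A", "B"], key={"A": score_a_v2, "B": score_b}.get, reverse=True)

print(rank_v1)  # ['A', 'B'] -- A wins with score 1.0
print(rank_v2)  # ['B', 'A'] -- after reassociation A scores 0.0
```

The numbers are extreme to keep the example short, but the same flip happens whenever two scores are close and cancellation is involved.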