That doesn’t look apples-to-apples to me. That’s thousands of samples of a two-year-old, low-end M1 vs. 2 samples of a brand-new mid-range Ryzen (AFAICT). (And the Ryzen still loses at single-core performance.)
It's an apples-to-apples comparison when it comes to the process node. AMD's latest CPUs are made on 5nm and 4nm nodes, something Apple was only able to do earlier with the M1/M2 because it booked all of TSMC's 5nm capacity.
It's only recently that other companies like AMD have been able to use TSMC's 5nm process.
Missed opportunity to call it apples-to-Apples ;-)
But I do wonder, given the other comments here about TDP and these days of thermally-limited performance, what the results are if both are locked to the same constant frequency.
Nah, besides crazy speculative behaviour, automatic overclocking is how modern chips are so fast compared to a few years ago.
And for battery life, it's often better to run really hot for a short amount of time than to run for an extended amount of time at lower clocks.
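A rough sketch of that "race to idle" argument, with purely illustrative numbers (not measurements), assuming the machine pays a fixed platform overhead (screen, RAM, uncore) for as long as it stays awake:

    # Race-to-idle toy model. All numbers are hypothetical.
    PLATFORM_W = 5.0                        # always-on cost while awake
    fast = {"cpu_w": 12.0, "seconds": 2.0}  # hot and quick
    slow = {"cpu_w": 3.0,  "seconds": 6.0}  # cool and slow (less CPU energy per task!)

    for name, run in (("race to idle", fast), ("slow and steady", slow)):
        joules = (run["cpu_w"] + PLATFORM_W) * run["seconds"]
        print(f"{name}: {joules:.0f} J")
    # fast: (12+5)*2 = 34 J, slow: (3+5)*6 = 48 J. Racing wins here because the
    # platform overhead is paid for fewer seconds, even though the CPU alone
    # burns more energy per task at high clocks.

Whether racing actually wins depends on how low the idle power really is and how super-linearly power grows with clock speed.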
Rembrandt is excellent... on Linux, if you throttle it.
The main issue is that AMD/Intel turbo so hard, while Apple clocks their M chips much more conservatively. Apple's designs are also much bigger and wider than AMD's (which means they are more expensive, but can afford to run at lower clocks).
Another is that Windows + OEM garbage + random background apps do so much useless processing in the background. And I'm not even a pro-linux "bloat" zealot... it really is just senseless and unacceptable out-of-the-box.
> Another is that Windows + OEM garbage + random background apps do so much useless processing in the background. And I'm not even a pro-linux "bloat" zealot... it really is just senseless and unacceptable out-of-the-box.
Modern macOS is nearly as bad. I upgraded a couple of years ago from a dual-core 2016 MacBook Pro. The machine - even freshly formatted - spent an obscene amount of CPU time doing useless things, like photoanalysisd (presumably looking for my face in my iPhoto library for the 100th time), or indexing my hard drive again for no reason.
The efficiency cores in my new M1 machine seem to hover at about 50% most of the time I'm using the computer. I've started to think of them as the silly corner for Apple's bored software engineers to play around in, so the random background processes they start don't get in the way of getting actual work done.
I wish I could figure out how to turn all this crap off. It's no wonder Linux on M1 chips is already benchmarking better than the same machines running macOS, at least on CPU-bound tasks.
(That said, OEM bloatware on windows is a whole other level of hurt.)
On the other hand, a low-frequency efficiency core is a good place for "bloat" to live. I think that's how Android/iOS remain usable too.
Windows bloat on AMD runs on the big 4GHz+ cores. And I suspect it does on Intel laptops with E-cores too, as Windows isn't integrated enough to know that the Adobe updater, Norton Antivirus, and the HP App Store are E-core tasks. And even if it did schedule them there, Intel runs its E-cores faster than Apple runs its efficiency cores anyway.
The advantage there is that Apple knows exactly what HW its software is running on and can take advantage of every power-saving opportunity, while on x86 that's much harder.
M can be wider because it’s easy to decode ARM in parallel. x86 parallel decode becomes exponentially harder with more width due to crazy instruction-length rules.
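A toy way to see the structural point (this is not real ARM or x86 decoding; length_of is a hypothetical stand-in for "decode enough of the instruction to learn its length"):

    # Where does instruction number n start in the byte stream?
    # Fixed 4-byte encoding (ARM-style): every boundary is known up front,
    # so N decoders can each grab their own instruction independently.
    def fixed_length_start(n, width=4):
        return n * width          # O(1), trivially parallel

    # Variable-length encoding (x86-style): instruction i's length is only
    # known after (partially) decoding it, so boundary n depends on every
    # boundary before it, a serial dependency the decoder has to break with
    # length-prediction/speculation hardware that grows with decode width.
    def variable_length_start(n, length_of):
        offset = 0
        for i in range(n):        # O(n), inherently sequential
            offset += length_of(offset)
        return offset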
Strongly agree. It's not moving the goalposts when the metric is useless. TDP means nothing nowadays because CPUs can significantly exceed them when turboing if they've got thermal headroom.
IMO, real energy consumption in joules over the course of a benchmark needs to be the standard when it comes to comparing efficiency.
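A sketch of what that could look like: sample whatever power counter the platform exposes (RAPL on Linux, powermetrics on macOS, or an external meter; the reader function below is just a placeholder) and integrate it over the run:

    import subprocess, time

    def read_package_power_watts():
        # Placeholder: hook up RAPL, powermetrics, or an external power meter here.
        raise NotImplementedError

    def benchmark_energy_joules(cmd, sample_period_s=0.1):
        """Run the benchmark command and integrate sampled power into joules."""
        proc = subprocess.Popen(cmd)
        joules, last = 0.0, time.monotonic()
        while proc.poll() is None:
            time.sleep(sample_period_s)
            now = time.monotonic()
            joules += read_package_power_watts() * (now - last)  # rectangle rule
            last = now
        return joules

Efficiency then becomes joules per completed benchmark, independent of whatever TDP number is printed on the box.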
I wish this was a more common benchmark for graphics cards - with newer graphics cards pushing higher and higher TDPs, it would be nice to have a way to look for "best performance while keeping power draw the same as the previous GPU".
Makes me wonder what the highest perf would look like out of an arbitrary hypothetical multi-socket Apple Silicon system, vs an arbitrary multi-socket x86 system; where the only constraints for both systems are that the boards have a fixed power budget. (I.e. "who wins in a map/reduce task: a board that spends 1000W to power 4 Xeons, or a board that spends 1000W to power 20 M2 Ultras?")
Too bad there are no bare/unsoldered Apple Silicon chips to try building such boards around. I'm sure, if there were, you'd find them all over AliExpress.
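With placeholder per-node numbers (purely illustrative, not benchmark figures), the arithmetic of that thought experiment is just:

    # Placeholder numbers for the 1000W thought experiment above; not measurements.
    BOARD_BUDGET_W = 1000.0
    nodes = {
        "Xeon":     {"watts": 250.0, "work_per_sec": 1.00},  # hypothetical
        "M2 Ultra": {"watts": 50.0,  "work_per_sec": 0.30},  # hypothetical
    }

    for name, n in nodes.items():
        count = int(BOARD_BUDGET_W // n["watts"])
        print(f'{name}: {count} sockets, {count * n["work_per_sec"]:.1f} work/sec aggregate')

For an embarrassingly parallel map phase the winner is decided purely by perf-per-watt; the reduce phase then depends on the interconnect, which is exactly where a hypothetical multi-socket Apple board is a complete unknown.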
I'd also be curious which of those two boards would have a higher BOM!
You could probably run a QDR+QSFP Infiniband card at around 32Gbps (minus overhead) through an external "GPU" enclosure. I don't see why MPI wouldn't work on Asahi Linux with such a setup once there's Thunderbolt support.
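For what it's worth, MPI doesn't care much what the link is as long as the runtime sees a network interface, and Thunderbolt networking presents one. A minimal mpi4py smoke test for such a two-node setup (host names hypothetical) would look like:

    # Minimal MPI smoke test (mpi4py). Assumes the Thunderbolt/InfiniBand link is
    # visible to the MPI runtime as an ordinary network interface; launch with e.g.
    #   mpirun -n 2 --host node-a,node-b python mpi_smoke.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Each rank contributes its rank number; the sum lands on rank 0.
    total = comm.reduce(rank, op=MPI.SUM, root=0)
    if rank == 0:
        print(f"reduce over {comm.Get_size()} ranks -> {total}")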
QDR InfiniBand is, like, 2007-level tech. Today we have NDR InfiniBand, where a typical 4x HCA gets you 400 Gb/s. Seems like a hypothetical Mac cluster would be severely limited by this compared to the typical x86-based server clusters.
I’m sure such a switch would be serving some pretty beefy nodes, though, right? Maybe the compute:communication ratio can be held constant with less powerful Mac mini nodes?
It is apparently possible to do networking over some Thunderbolt interfaces; would it be possible to connect the devices over Thunderbolt to one another directly? Four ports each, so form a mesh! I guess TB4 can go up to 40Gbps, although it sounds like there’s a bit of overhead when using it as a network, and I also have no idea if there’s some hub-like bottleneck inside the chip…
Newer mobile AMD APUs get close to the M1 Pro's power usage while exceeding the M1's performance[1]. Those same APUs get even closer when compared to the M2 Pro[2].
Isn't the M2 Pro more power efficient than the M1 Pro? At least it is according to [1]. So isn't it further away compared to the M2 Pro, not closer? Or are you saying that the M2 Pro is closer in performance to the Ryzen 9 than the M1 Pro, rather than in power usage?
I'm not really knowledgeable when it comes to this, so perhaps I'm missing something.
Other ARM SoC vendors do, absolutely, which is a big factor in why most other ARM SoCs are so far behind Apple's. But Intel & AMD less so; they tend to prioritize outright performance, since that's how they're nearly always compared and judged. Die size hasn't really been a constraint for them.
Yep. And it's a close race between Apple's ARM chips and the latest x86 chips from Intel and AMD. If Geekbench is to be believed, Apple's best chips are only about 10-15% behind the performance of the top x86 desktop class CPUs, despite only using a fraction of the power.
Apple's M_ Max CPU variants come with a very hefty price tag though.
> Apple's best chips are only about 10-15% behind the performance of the top x86 desktop class CPUs, despite only using a fraction of the power.
Power consumption scales non-linearly with clock speed. So you're comparing two variables that are dependent on each other. If you want a meaningful comparison, you have to align one of those variables. As in, either reduce x86 to M2 Pro/Ultra/Whatever's power budget and then compare performance, or align performance and then compare power.
This is especially true for the desktop-class CPUs, where outright performance is the name of the game at all costs. AMD & Intel are constantly throwing upwards of 50W at an extra 5% of performance, because that's what drives sales: outright performance.
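Roughly, dynamic power goes as C·V²·f, and since voltage has to rise with frequency near the top of the curve, power ends up growing close to cubically while performance grows at best linearly. A toy illustration (all constants are stand-ins, not a real V/f table for any CPU):

    # Toy CMOS dynamic-power model: P ~ C * V^2 * f, with voltage rising
    # roughly linearly with frequency near the top of the curve.
    def core_power_w(freq_ghz, cap=6.0, v0=0.6, v_per_ghz=0.12):
        volts = v0 + v_per_ghz * freq_ghz
        return cap * volts**2 * freq_ghz

    for f in (3.0, 4.0, 5.0, 5.5):
        p = core_power_w(f)
        print(f"{f} GHz: ~{p:5.1f} W, perf/W index: {f / p:.3f}")
    # Performance scales at best with f, power scales roughly with f^3, so the
    # last few hundred MHz of turbo cost far more watts than they return in
    # performance, which is why comparing chips at different operating points
    # says little about the underlying cores.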
It’s a mix of a simpler ISA, good core design, and small process nodes.