One noteworthy aspect of SPEC_rate=N benchmarks is that they spawn N independent instances of the workload. So, they're shared-nothing and place (in most cases) somewhat unnatural stress on the cache, interconnect, and memory subsystems.
For workloads like compiling, that's completely fine. But for workloads like rendering, it doesn't match how they'd normally scale on an N-core machine.
In an era where VMs and containers are practically de rigueur, on larger machines, I guess it's not so unnatural for multiple, intensive shared-nothing workloads to be running on a machine. Although, most VMs tend to have more than just one core, especially if they're provisioned to run things that need lots of compute and do scale.
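To make the shared-nothing point concrete, here's a rough sketch of what a rate-style run amounts to: launch N independent processes of the same workload, wait for all of them, and report throughput. The kernel and the names `run_copies` and `copies_per_second` are stand-ins I made up for illustration; this is not how the SPEC harness is actually implemented.

```python
import subprocess
import sys
import time

# Hypothetical stand-in kernel; a real rate run would launch a benchmark binary instead.
KERNEL = "print(sum((i * i) % 7 for i in range({n})))"


def run_copies(copies: int, n: int) -> tuple[list[int], float]:
    """Launch `copies` independent interpreter processes and wait for all of them.

    The copies share no data: each one is a separate process with its own
    address space, which is the shared-nothing property described above.
    """
    cmd = [sys.executable, "-c", KERNEL.format(n=n)]
    start = time.perf_counter()
    # Start every copy before waiting on any of them, so they run concurrently.
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for _ in range(copies)
    ]
    outputs = [int(p.communicate()[0]) for p in procs]
    elapsed = time.perf_counter() - start
    return outputs, elapsed


def copies_per_second(copies: int, elapsed: float) -> float:
    """Throughput-style metric: completed copies per second of wall time."""
    return copies / elapsed
```

Calling `run_copies(1, n)` and then `run_copies(N, n)` and comparing the two `copies_per_second` values gives a crude picture of throughput scaling. Because the copies never cooperate on one problem, it says nothing about how a single threaded job (a renderer, say) would scale across the same cores.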
Yeah, so I haven't done much with multiple copies. One copy is good for evaluating a single core, and sometimes I use two copies to get an idea of SMT performance.
SPECspeed could in theory be a more realistic look at multithreaded performance scaling (as opposed to rate with multiple copies, which is a pure throughput test), but SPECspeed is messy to evaluate because it includes workloads that are not threaded at all. I'd probably have to pick a subset of workloads.
A general comment on benchmarks: I wonder just how much pressure the respective company/organisation behind SPEC or Geekbench gets from various CPU manufacturers. After all, those numbers often decide who sells how many CPUs, and at what price.
Two questions: I would be curious to see just how much of a "Plus" the changes in the Arrow Lake 270K Plus made in the inter-tile traffic speed and latency. A number of tests have found the "Core 7 270K+" to be faster than the Core 9 285K.
And about the speeds shown for the tile-to-tile data rates of the 285K: My first thought was "what, that's all?" I would have expected speeds significantly above 6-8 GB/s.
Maybe the fine-tuning in the Plus models bumped those up?
Geekbench is made by Primate Labs, which is a privately held company. On the other hand, SPEC is an industry consortium of about a hundred hardware & software vendors, as well as governmental and academic institutions (according to Wikipedia). What that says about either's susceptibility to undue influence is unclear. I'd be curious to know more about SPEC's development process for new benchmarks, but not to the extent that I bothered to do any digging into the subject.
One interesting difference is that Geekbench typically tries to approximate a given application using what they consider to be some representative stand-in. In contrast, SPEC seems to use open source packages as more direct representations of certain workloads.
Regarding the die-to-die traffic on the 285K, that's just measuring the average rate from a single thread and the highest measured by this article seems to be 13.23 GB/s by Photo Filter. The article mentions SPEC2017's fotonik3d using more, but I couldn't find where that was measured on the 285K. When run on Core Ultra 7 155H (Meteor Lake) P-core, this article seemed to measure it at 28 GB/s: https://chipsandcheese.com/p/running-spec-cpu2017-at-chips-and-cheese
I would further add that it'd be pretty boring if most or all of these benchmarks were so memory-bound that they could drive such high die-to-die utilization, as that would probably mean rather low IPC and little stress on most of the cores' internal resources. Indeed, Redwood Cove manages only 0.87 IPC on fotonik3d! Not to mention that multi-thread scaling of such workloads would probably be quite poor.
I don't think it's worth speculating about undue influence; it's more useful to evaluate how good the suites are.
I don't want every test to be memory-bound. But I think there's a blind spot in both of these suites (SPEC CPU2017, GB6) in memory-bound, latency-sensitive workloads like games.
I think SPEC2026 came out two days ago.
https://www.spec.org/cpu2026/
I wrote this a week ago and I don't have access to SPEC2026 (yet).
The "Object Detection" test makes heavy use of VNNI. On Skylake it runs about 3-4x slower than on Rocket Lake, and is also slower than on the A76.
The "Photo Filter" test behaves the same way, though the slowdown is slightly smaller.
Is there any way to actually test this out? This is just my guess.
Thanks Chester!