One noteworthy aspect of SPEC_rate=N benchmarks is that they spawn N independent instances of the workload. So, they're shared-nothing and place (in most cases) somewhat unnatural stress on the cache, interconnect, and memory subsystems.
For workloads like compiling, that's completely fine. But for workloads like rendering, it doesn't match how they'd normally scale on an N-core machine.
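To make the shared-nothing model concrete, here's a minimal sketch of a SPECrate-style run: N independent copies of the same workload, no shared state, scored by aggregate throughput. The `workload` function is a hypothetical placeholder for a real benchmark binary; the copies only interact through shared caches and memory bandwidth, which is exactly the "unnatural stress" described above.

```python
# Sketch of a SPECrate-style throughput run (not the actual SPEC harness).
# workload() is a hypothetical stand-in for a real benchmark binary.
import time
from multiprocessing import Process, cpu_count

def workload(n_iters=200_000):
    # Placeholder compute kernel. Each copy works on its own private data,
    # so copies are shared-nothing at the software level and compete only
    # for caches, interconnect, and memory bandwidth.
    acc = 0
    for i in range(n_iters):
        acc += i * i
    return acc

def rate_run(n_copies):
    # Launch n_copies independent instances and time until all finish.
    procs = [Process(target=workload) for _ in range(n_copies)]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    elapsed = time.perf_counter() - start
    # A rate-style score scales with copies completed per unit time.
    return n_copies / elapsed

if __name__ == "__main__":
    for n in (1, 2, cpu_count()):
        print(f"{n} copies: {rate_run(n):.2f} copies/sec")
```

Contrast with a speed-style run, which would instead parallelize a single instance internally (e.g. via OpenMP) and measure its completion time.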
In an era where VMs and containers are practically de rigueur on larger machines, I guess it's not so unnatural for multiple intensive, shared-nothing workloads to be running on one machine. Although most VMs tend to have more than just one core, especially if they're provisioned to run things that need lots of compute and scale well.
Yeah, so I haven't done much with multiple copies. One copy is good for evaluating a single core, and sometimes I use two copies to get an idea of SMT performance.
SPECspeed could, in theory, be a more realistic look at multithreaded performance scaling (as opposed to rate with multiple copies being a pure throughput test), but SPECspeed is messy to evaluate because it includes workloads that are not threaded at all. I'd probably have to pick a subset of workloads.
I think SPEC2026 came out two days ago.
https://www.spec.org/cpu2026/
I wrote this a week ago and I don't have access to SPEC2026 (yet)
The "Object Detection" test makes heavy use of VNNI. On Skylake it runs about 3-4x slower than on Rocket Lake, and it's also slower than on the A76.
The "Photo Filter" test shows the same pattern, though the slowdown is slightly smaller.
Is there any way to actually test this out? This is just my guess.
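One low-effort way to at least check the hardware side of this guess: see whether the CPU advertises VNNI at all. On Linux, the feature bits show up in /proc/cpuinfo as `avx512_vnni` (Cascade Lake, Rocket Lake, and later) and `avx_vnni` on newer parts; Skylake lacks both. This is a Linux/x86-only sketch, and it only confirms availability; proving the benchmark actually executes VNNI instructions would need something like `perf stat` or an instruction-level tracer, which is beyond this snippet.

```python
# Linux/x86-only sketch: check whether this CPU advertises VNNI.
# Only confirms hardware support, not that a given binary uses it.
def cpu_flags(path="/proc/cpuinfo"):
    # Return the set of feature flags from the first "flags" line,
    # or an empty set if the file doesn't exist (non-Linux systems).
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

def has_vnni():
    # avx512_vnni: AVX-512 VNNI (Cascade Lake, Rocket Lake, ...)
    # avx_vnni: the 256-bit VEX-encoded variant on newer cores
    return bool(cpu_flags() & {"avx512_vnni", "avx_vnni"})

if __name__ == "__main__":
    print("VNNI available:", has_vnni())
```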