Discussion about this post

Schrödinger's Cat

"554.roms is the worst offender, and makes X925 execute more than twice as many instructions compared to Zen 5."

It would be interesting to know how heavily that test uses SVE2 and AVX. The main reason X925 needs so many more instructions could simply be its narrower vector width.

I'd be curious whether the same test would report a different instruction count on a Graviton 3 CPU, which has Neoverse V1 cores with a 256-bit wide SVE implementation. Maybe Arm can't keep getting by with 128-bit vectors when even Intel is going back to 512-bit.
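To put rough numbers on the vector-width argument, here is an idealized sketch. It assumes a hot loop that is fully vectorized, processes each FP64 element exactly once with full-width vector operations, and has no scalar tail, loop overhead, or gather/scatter. The function name `vector_instructions` and the one-million-element array are hypothetical, not taken from 554.roms. Under these assumptions 128-bit vectors need 4x as many operations as 512-bit vectors, so a measured ratio of about 2x would be consistent with only part of the workload being vectorized, or with the binary not using the full vector width.

```python
import math


def vector_instructions(num_elements: int, element_bits: int, vector_bits: int) -> int:
    """Count the vector ops needed to touch num_elements once.

    Idealized model: every element is handled by full-width vector ops,
    and any partial final vector still costs one op.
    """
    lanes = vector_bits // element_bits  # elements per vector register
    return math.ceil(num_elements / lanes)


n = 1_000_000  # hypothetical array of FP64 values
for width in (128, 256, 512):
    print(f"{width}-bit vectors: {vector_instructions(n, 64, width)} ops")
```

Real instruction counts also depend on the compiler, the loop structure, and how much of the benchmark is scalar code, so this only bounds the effect of vector width alone.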

Schrödinger's Cat

"That said, getting a high performance core is only one piece of the puzzle. Gaming workloads are very important in the consumer space, and benefit more from a strong memory subsystem than high core throughput."

Yes, and you compared a system with LPDDR5X against two desktop CPUs with regular DDR5 memory. LPDDR has higher latency than regular DDR memory, partly because it sends commands and addresses over a narrow shared command/address bus, so each command takes multiple cycles to transfer. This makes the GB10's rate-1 performance even more impressive, because it pays the LPDDR latency penalty without getting any real benefit from its 256-bit data path.
