Quite fascinating. Before Zen 5 launched, various sources, including articles here, seemed to at least vaguely imply that AMD's "two-ahead" branch predictor would be able to follow two branches per cycle even for a single thread, whereas post-launch it quickly became clear that that wasn't the case, and also that, as reiterated by this article, the op-cache only seems to be able to deliver six ops per cycle for one thread, which seems a bit at odds with the 8-wide renamer.
All taken together, I can't help but wonder if there wasn't something that turned out badly with Zen 5's front-end at a late stage, and they were forced to neuter it to prevent bugs. If true, and they manage to fix those problems with Zen 6, that could paint quite a positive picture for Zen 6 IPC improvements, not least coupled with the rumors that Zen 6 is using a new, lower-latency die-to-die interconnect (which they're already kind of using for Strix Halo, aren't they?).
Is there something intrinsic to video games that leads to low-IPC computations, or does low IPC simply follow from a lack of optimisation at the software development level?
Also, since Intel is backend latency constrained while AMD is front-end latency constrained, does that mean code needs to be optimised in different ways depending on which processor it will run on?
I assume games in general are pretty low IPC just because of how branch-y and control-flow dependent they are? I can imagine it's hard to have much ILP in game logic.
In my experience, most programs are kind of low-IPC "by default", and it's only really the ones with very regular instruction and data access patterns that achieve particularly high IPC. That's not a systematic and rigorous statement, just my experience from running `perf stat` on various different kinds of programs. Most "normal" programs in this sense generally seem to hit somewhere between 1-2 instructions per clock.
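If anyone wants to see the effect for themselves, here's a toy example (my own sketch, nothing from the article): two loops doing the same arithmetic, one as a serial dependency chain and one split across independent accumulators. `perf stat` reports the IPC gap directly.

```c
// ilp_demo.c -- toy example: same work, very different ILP.
// Build:  cc -O2 ilp_demo.c -o ilp_demo
// Run:    perf stat ./ilp_demo dep    (serial dependency chain, low IPC)
//         perf stat ./ilp_demo par    (independent accumulators, higher IPC)
#include <stdio.h>
#include <string.h>

#define N 400000000ULL

// Every step depends on the previous one: the core can only retire
// roughly one iteration per chain latency, so IPC stays low.
static unsigned long long dep_chain(void) {
    unsigned long long x = 1;
    for (unsigned long long i = 0; i < N; i++)
        x = x * 3 + 1;          // serial dependency through x
    return x;
}

// Four independent accumulators doing the same total work: the
// out-of-order core can overlap them, so IPC is several times higher.
static unsigned long long par_chains(void) {
    unsigned long long a = 1, b = 2, c = 3, d = 4;
    for (unsigned long long i = 0; i < N; i += 4) {
        a = a * 3 + 1;
        b = b * 3 + 1;
        c = c * 3 + 1;
        d = d * 3 + 1;
    }
    return a + b + c + d;
}

int main(int argc, char **argv) {
    unsigned long long r = (argc > 1 && strcmp(argv[1], "par") == 0)
                               ? par_chains()
                               : dep_chain();
    printf("%llu\n", r);        // keep the result live so nothing is optimized away
    return 0;
}
```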
There's a lot of scope for optimization in games; that's a fact.
If they're blowing out of instruction cache, one possible explanation could be too much inlining and loop-unrolling. In a sense, some of these games could actually be over-optimized.
Code size vs. straight-line speed is a very difficult tradeoff to make. Game programmers will nearly always prefer straight-line speed, even when there's not a lot to be gained by doing so.
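To make the tradeoff concrete, here's a rough sketch of the usual compromise (the function names are made up): keep the hot path tiny and force the rare, bulky path out of line so it doesn't sit in I-cache next to the hot code.

```c
// Illustrative only -- the function names here are hypothetical.
#include <stdio.h>

// GCC/Clang attributes: never inline this and place it with other
// rarely-executed ("cold") code, away from the hot path.
__attribute__((noinline, cold))
static void slow_path(int code) {
    fprintf(stderr, "rare error %d\n", code);   // stands in for a big error handler
}

static inline int fast_path(int x) {
    return x * 2 + 1;                           // small, always executed
}

int process(int x, int err) {
    if (__builtin_expect(err != 0, 0))          // hint: the error branch is unlikely
        slow_path(err);
    return fast_path(x);
}

int main(void) {
    printf("%d\n", process(20, 0));
    return 0;
}
```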
Thank you for another interesting write-up. It seems AMD has room to squeeze a little more performance out of its Zen 5.
Excellent write-up. Would love to see Factorio in future gaming benchmarks because it is unlike many other games due to its very high sensitivity to cache and relative lack of graphic intensity.
Do current branch predictors try to predict the confidence associated with the most likely outcome? If so, that could open the door to another dimension of optimizations, which is intelligently deciding whether to prefetch the less likely branch target.
Also, APX adds predication to many instructions, as a way to avoid cluttering up the branch predictor state. So, that could be another avenue where we might anticipate improvements on these sorts of low-IPC workloads.
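Compilers already do a narrow version of this with conditional moves. A toy sketch of the idea (my example, not the article's; APX would extend predication to far more instructions): the branchless form pays for a bit of extra work but never touches branch-predictor state.

```c
// Toy example: the same clamp written with a branch and branchlessly.
// Build with  cc -O2 -S select_demo.c  and compare the generated code.
#include <stdio.h>

// Branchy: performance depends on the predictor guessing right.
int clamp_branchy(int x, int limit) {
    if (x > limit)
        return limit;
    return x;
}

// Branchless: compute a mask and select with pure data flow,
// so there is no branch to predict (and no misprediction penalty).
int clamp_branchless(int x, int limit) {
    int m = -(x > limit);              // all-ones if x > limit, else 0
    return (limit & m) | (x & ~m);
}

int main(void) {
    printf("%d %d\n", clamp_branchy(7, 5), clamp_branchless(3, 5));
    return 0;
}
```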
I have not heard of that, besides schemes that use multiple predictors and a meta-predictor to track which sub-predictor has been doing better.
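For reference, a minimal toy model of what I mean (purely illustrative, not how any real core implements it): a bimodal predictor, a global-history predictor, and a meta-predictor of 2-bit counters that learns which one to trust per branch.

```c
// Toy tournament predictor sketch -- illustrative model only.
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TABLE 1024

static uint8_t bimodal[TABLE];   // 2-bit counters indexed by PC
static uint8_t gshare[TABLE];    // 2-bit counters indexed by PC ^ history
static uint8_t meta[TABLE];      // 2-bit counters: >= 2 means "trust gshare"
static uint32_t history;         // global taken/not-taken history

static void bump(uint8_t *c, bool up) {          // saturating 2-bit update
    if (up)  { if (*c < 3) (*c)++; }
    else     { if (*c > 0) (*c)--; }
}

bool predict(uint32_t pc) {
    uint32_t bi = pc % TABLE, gi = (pc ^ history) % TABLE;
    bool use_gshare = meta[bi] >= 2;
    return use_gshare ? gshare[gi] >= 2 : bimodal[bi] >= 2;
}

void update(uint32_t pc, bool taken) {
    uint32_t bi = pc % TABLE, gi = (pc ^ history) % TABLE;
    bool p_bi = bimodal[bi] >= 2, p_gs = gshare[gi] >= 2;
    // Train the meta-predictor only when the sub-predictors disagree.
    if (p_bi != p_gs)
        bump(&meta[bi], p_gs == taken);
    bump(&bimodal[bi], taken);
    bump(&gshare[gi], taken);
    history = (history << 1) | taken;
}

int main(void) {
    // Feed an alternating pattern at one PC; the history-based
    // sub-predictor should win it and the meta-predictor should notice.
    for (int i = 0; i < 100; i++)
        update(0x40, i & 1);
    printf("prediction for pc 0x40: %d\n", predict(0x40));
    return 0;
}
```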
Are you going to run the same IPC gaming tests on your 7950X3D? It'd be interesting to see the IPC limits given the big changes from Zen 4 to Zen 5.
If the L1 and L2 cache hit rates are already good, and the main limit to performance is low IPC, then how does the larger L3 cache on the X3D chips boost gaming performance? Given the popularity of X3D for gaming, it would be great to see a similar article explaining why X3D is so beneficial for gaming, but does little for most other workloads.
Thanks for the interesting article!
Low IPC just means the core doesn't manage to execute many instructions per cycle, which can have different root causes.
The benchmarks in this article show that the core is often frontend latency bound, meaning it is waiting on instructions. Since that can be caused by I-cache misses, I would guess a larger L3 reduces the average duration of a frontend stall.