Microbenchmarking AMD’s RDNA 3 Graphics…

Jan 7, 2023

Editor’s Note (6/14/2023): We have a new article that reevaluates the cache latency of Navi 31, so please refer to that article for some new latency data.

4 Comments

Mat

Dec 11, 2024

What's up with the dismally poor dual issue rate? Back in the VLIW5 days, AMD had an average packing rate of 3.4. With RDNA3, I get the impression using that second ALU is like shooting for the moon. The four read ports would restrict dual FMA operations, but I assume it's more than that.

Chester Lam

Dec 11, 2024

Keep in mind that dual issue is only required in wave32 mode. In wave64 mode, it'll naturally use the second ALU. Compiler optimization is an issue for wave32 mode, but another way of looking at it is that dual issue can situationally reduce instruction dispatch pressure in very compute-bound kernels, and it probably has low hardware cost.

Mat

Dec 11, 2024 (edited)

Wave64 removes any dependency issues, but I think the four read ports would still be an issue for FMA operations? What I really don't get is why not use wave64 for everything. It's less efficient, but it could do up to twice the work of wave32 (if that second ALU is rarely getting used in wave32).

Chester Lam

Dec 11, 2024

Wave32 is useful for reducing divergence costs and latency within a single thread, so it's great for smaller, less compute-bound dispatches like vertex shaders. For more parallel workloads like pixel shaders, AMD likes to use wave64.

Chips and Cheese
