Would the out-of-order memory access capability that AMD added to RDNA4 have any applications in a CDNA4-based design? I know UDNA is rumored to represent a fusion of CDNA and RDNA, implying that some design ideas will be kept from each. I know RDNA4's OOO abilities are unique, but I don't know if there would even be a theoretical benefit to adopting a similar approach for AI or HPC workloads.
Depends on whether AMD expects compute workloads to frequently have different waves going down different code paths with varying cache hit/miss behavior. I suspect a lot of compute workloads, especially ML ones, will be quite regular and won't see too much benefit from the "OOO" memory accesses.
Also RDNA4's OOO memory accesses are not unique (https://www.bing.com/search?pc=MOZI&form=MOZLBR&q=chipsandcheese+rdna4). It's resolving a false dependency between different waves when waiting for data to arrive from the memory subsystem. Nvidia and Intel don't have this false dependency even on older GPUs, so they always had "OOO" memory accesses.
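As a rough illustration, here's a hypothetical HIP/CUDA-style micro-pattern (my own sketch, not the article's actual benchmark) where the false cross-wave dependency would show up: one wave streams a DRAM-sized buffer and mostly misses, while the other waves loop over a small cache-resident buffer and mostly hit. On hardware that returns memory data to different waves strictly in request order, the hit-heavy waves can end up waiting behind the miss-heavy wave's outstanding loads.

```cpp
#include <hip/hip_runtime.h>

// Hypothetical kernel: waves in the same workgroup take different code paths
// with very different cache hit/miss behavior.
__global__ void divergent_wave_loads(const float* big, const float* small_buf,
                                     float* out, size_t big_len, size_t small_len)
{
    const int lane = threadIdx.x % warpSize;
    const int wave = threadIdx.x / warpSize;

    float acc = 0.0f;
    if (wave == 0) {
        // Miss-heavy path: stride once through a buffer far larger than the caches.
        for (size_t i = lane; i < big_len; i += warpSize)
            acc += big[i];
    } else {
        // Hit-heavy path: repeatedly walk a buffer that fits in cache.
        for (int rep = 0; rep < 1024; ++rep)
            for (size_t i = lane; i < small_len; i += warpSize)
                acc += small_buf[i];
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}
```

If every wave ran the same loop with the same hit rate, as in a lot of regular ML kernels, relaxing that ordering wouldn't buy much.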
Thank you for replying, Chester.
I stand corrected. After re-reviewing this article, I think the only explanation is that I paused reading the RDNA4 article partway through and then forgot I had done so. You clearly lay out that Nvidia and Intel GPUs are not affected by the same false dependency issue.
Apologies.
Miss you at ExtremeTech, Joel :-)
It was a great place, with great people.
I wonder if the FP6 advantage will be maintained against Rubin, too.
I don't understand the stochastic rounding note. For HPC, I think stochastic rounding could be very helpful in making smaller data types usable, but that would make sense for math operations or conversion to lower precision. Can anyone explain what stochastic rounding means when increasing precision?
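For reference, the down-conversion case I mean works like this: the rounding direction is chosen randomly with probability proportional to the discarded fraction, so the error averages out over many accumulations instead of biasing one way. A minimal sketch of FP32-to-BF16 stochastic rounding (my own illustration, not anything from the article; ignores NaN/Inf handling):

```cpp
#include <cstdint>
#include <cstring>

// Adding a random 16-bit value before truncation makes the result round up
// with probability (discarded fraction / 2^16), so the expected value of the
// rounded number equals the original value.
uint16_t fp32_to_bf16_stochastic(float x, uint16_t rand16)
{
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));     // reinterpret the float as raw bits
    bits += rand16;                            // random carry into the kept bits
    return static_cast<uint16_t>(bits >> 16);  // keep sign, exponent, top 7 mantissa bits
}
```

That makes sense going down in precision; I don't see what the equivalent operation would be when going up.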
The core (CU) count for MI300X is incorrect; it should be 304 instead of 288 in the first table.
Yep, my mistake when moving the article from Google Docs to Substack and WordPress! It has been fixed!