Discussion about this post

Fredrik Tolf

I'm interested in how much a fairly light AVX-512 workload (say, a small but highly optimized loop that runs for only a couple of microseconds) affects core behavior. If I recall correctly, a common criticism of Skylake-X was that the core's clock speed dropped sharply as soon as *any* AVX-512 instructions were executed. This led many developers to avoid AVX-512 entirely, since it left their code running worse than AVX2 code that should nominally be slower.

The "rapid switching" graph seems to indicate that this shouldn't be nearly as big a deal on Zen 5, since the core at least seems to recover immediately when small-ish AVX-512 sequences end. However, it also clearly shows IPC throttling starting immediately, although that could of course just be an artifact of measurement granularity. Do you think there might be a "maximum size" of AVX-512 workload below which the core doesn't throttle at all?

Adenilson Cavalcanti

Pretty awesome article, thanks a lot for sharing it!

A couple questions:

a) Is the same behavior expected on 5th-gen EPYC processors?

b) I wonder what the results would be for 6th-gen Xeon (e.g. Granite Rapids)?

c) Was the code used for the profiling published anywhere, e.g. on GitHub?

Overall, excellent article. Pure class, as expected from you guys.

:-)
