Discussion about this post

jozsef

Thank you very much!

It's good to see that someone pays proper attention to analyzing hardware in depth.

David. Hellyx

Interesting observation...

"CP is very slow on GFX12 and parsing the packet header is the main bottleneck. Using paired context regs reduce the number of packet headers and it should be more optimal.

It doesn't seem worth when only one context reg is emitted (one packet header and same number of DWORDS) or when consecutive context regs are emitted (would increase the number of DWORDS)."

https://www.phoronix.com/news/AMD-RDNA4-Paired-Context-Regs
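The trade-off described in the quote can be made concrete with a rough cost model. This is a simplified sketch, not the driver's actual code. It assumes a SET_CONTEXT_REG packet costs one header, one start-offset DWORD and one DWORD per consecutive register value, and that the paired packet costs one header plus an (offset, value) DWORD pair per register. The helper names are hypothetical.

```python
# Simplified, hypothetical cost model for emitting context registers.
# Assumptions (not taken from the driver source):
#   SET_CONTEXT_REG       = 1 header + 1 start offset + 1 value per consecutive reg
#   SET_CONTEXT_REG_PAIRS = 1 header + (offset, value) per register


def runs(offsets):
    """Split register offsets into runs of consecutive offsets."""
    result = []
    for off in sorted(offsets):
        if result and off == result[-1][-1] + 1:
            result[-1].append(off)  # extends the current consecutive run
        else:
            result.append([off])  # starts a new run
    return result


def cost_set_context_reg(offsets):
    """Return (headers, dwords) using one SET_CONTEXT_REG packet per run."""
    rs = runs(offsets)
    headers = len(rs)
    dwords = sum(2 + len(r) for r in rs)  # header + offset + values
    return headers, dwords


def cost_pairs(offsets):
    """Return (headers, dwords) using a single paired packet for all regs."""
    n = len(offsets)
    if n == 0:
        return 0, 0
    return 1, 1 + 2 * n  # header + (offset, value) per register


if __name__ == "__main__":
    cases = [
        ("one register", [0x100]),
        ("four consecutive", [0x100, 0x101, 0x102, 0x103]),
        ("four scattered", [0x100, 0x120, 0x140, 0x160]),
    ]
    for label, offs in cases:
        print(label, cost_set_context_reg(offs), cost_pairs(offs))
```

Under these assumptions a single register costs one header and 3 DWORDs either way. Four consecutive registers cost 6 DWORDs unpaired versus 9 paired. Four scattered registers drop from four headers and 12 DWORDs to one header and 9 DWORDs, which is the case where pairing pays off if header parsing is the bottleneck.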
