
Discussion about this post

Yukimasa Sugizaki

Thank you for your insightful post, as always!

As indicated in the PTX documentation (https://docs.nvidia.com/cuda/parallel-thread-execution/#integer-arithmetic-instructions-mad), IMAD and IMAD.WIDE correspond to 32-bit × 32-bit → 32-bit and 32-bit × 32-bit → 64-bit integer multiplication, respectively.

According to Table 7 in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions, widening multiplication delivers only half the throughput of its non-widening counterpart, so the compiler appears to favor the non-widening form for 32-bit address generation in the shared space.
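
To make the difference concrete, here is a minimal host-side C++ sketch of the two arithmetic shapes. It only models the integer semantics and is not actual PTX or SASS; the function names `mad_lo_u32` and `mad_wide_u32` are hypothetical and chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical model of a non-widening multiply-add:
// 32-bit x 32-bit + 32-bit -> 32-bit, keeping only the low 32 bits.
constexpr std::uint32_t mad_lo_u32(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(a) * b + c);
}

// Hypothetical model of a widening multiply-add:
// 32-bit x 32-bit + 64-bit -> 64-bit, keeping the full product.
constexpr std::uint64_t mad_wide_u32(std::uint32_t a, std::uint32_t b, std::uint64_t c) {
    return static_cast<std::uint64_t>(a) * b + c;
}
```

For a small offset such as `index * 4 + base` that fits in 32 bits, both forms give the same value, which is consistent with the non-widening form being sufficient for shared-space addresses. The results differ only once the product exceeds 32 bits.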

David. Hellyx

Funny how the XTX has FP64 performance equivalent to this massive and expensive GPU.
