2 Comments
Yukimasa Sugizaki

Thank you for your insightful post, as always!

As the PTX documentation for the corresponding mad instructions indicates (https://docs.nvidia.com/cuda/parallel-thread-execution/#integer-arithmetic-instructions-mad ), IMAD and IMAD.WIDE are 32-bit × 32-bit integer multiply-adds that produce a 32-bit and a 64-bit result, respectively (the mad.lo and mad.wide forms).

According to Table 7 in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions , the widening multiply delivers only half the throughput of its non-widening counterpart, so the compiler appears to favor the non-widening form when generating 32-bit addresses for the shared memory space.

David. Hellyx

Funny how the XTX has FP64 throughput equivalent to this massive and expensive GPU.
