As much as I love RDNA4’s ray tracing improvements, I’ve been reading about them for two weeks straight and I’m getting SO THIRSTY for some new C&C content!!!
Actual work has been pretty busy and stressful over the past couple weeks. There'll be more stuff soon though.
"Transforming a ray would involve multiplying both the origin and direction vector by a 3×3 rotation matrix, which naively requires 36 FLOPs (Floating Point Operations) per transform."
There's an error in your calculations. A naive 3x3 matrix-vector multiplication without FMA requires 15 operations, and with FMA it reduces down to 9 operations.
To multiply a matrix by a vector, you multiply corresponding elements in each row with the vector. For a 3x3 matrix, that's 3 multiply-accumulates per row, or 9 FMA operations. Each FMA operation counts as two FLOPs, because it's a multiply and an add, so 18 FLOPs per matrix-vector multiply.
Now you have to rotate both the origin and direction vector, not just one of them, so 18*2=36 FLOPs.
Right. I guess that he counted one addition per multiplication, instead of 2 additions per 3 multiplications. Should have been 30, not 36.
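For anyone following the counting argument above, here is a minimal sketch that tallies the three conventions being discussed. It assumes a plain 3x3 matrix-vector product with no translation term; the names `matvec_counts`, `VECTORS_PER_RAY`, and `ray_totals` are made up for illustration and do not come from the article or any library.

```python
def matvec_counts(n=3):
    """Operation counts for one naive n x n matrix-vector product."""
    muls = n * n            # one multiply per matrix element
    adds = n * (n - 1)      # n - 1 additions to sum each row's products
    fmas = n * n            # one FMA per element, if each row accumulates onto zero
    return {
        "separate_mul_add_ops": muls + adds,  # no FMA: multiplies and adds counted separately
        "fma_instructions": fmas,             # each FMA counted as a single operation
        "flops_fma_as_two": 2 * fmas,         # each FMA counted as two FLOPs
    }

# Assumption: a ray transform rotates two vectors, the origin and the direction.
VECTORS_PER_RAY = 2

per_vector = matvec_counts(3)
ray_totals = {name: count * VECTORS_PER_RAY for name, count in per_vector.items()}

for name, count in ray_totals.items():
    print(f"{name}: {per_vector[name]} per vector, {count} per ray")
```

Under these assumptions the 36 comes from counting every FMA as two FLOPs, while the 30 comes from counting 3 multiplies and 2 adds per row. Both are defensible; they just use different conventions.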
Do you think current games are effectively utilizing RDNA4's new ray tracing–focused features? Some titles seem so poorly optimized that RDNA3 and RDNA4 perform almost the same. Indiana Jones and Wukong are the two worst offenders.
Love the article, but I find it a bit disheartening how little NVIDIA's approach is mentioned in comparison. Even Intel got more attention. Is this because of a lack of solid info about NVIDIA's approach?
Yeah, lack of info.
Damn, that's really sad. Still, it was great to read about AMD's approach, even somewhat in isolation!
Great article (as always). I enjoyed the deep dive.
Note that the Final Words section starts with "RDNA 2 brought introduced AMD’s ...".
fixed
I’m not saying RT is worthless, but I tried out the Half Life RTX demo on Steam and I can’t say I was all that impressed.
Maybe it was because I was playing such an old game that I was so familiar with, but the RT just didn’t do anything for me.
I’ll try not to be old and curmudgeonly, but I think gameplay is always more substantive than video game gfx.