Could you analyze the out-of-order memory handling in more detail?
If I understand correctly, this is not out-of-order execution in the traditional CPU sense, because it doesn't exploit instruction-level parallelism within a thread, only memory-level parallelism across warps.
Am I reading that right?
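To make the question concrete, here is a toy Python sketch of the distinction I mean. It is only an illustration of my assumption, not a description of the actual hardware: the `Load` tuple, the `return_times` helper, and the latencies are all made up. In one mode every load returns in global issue order; in the other, loads only have to stay in order within their own wave, so a cache hit from wave B can overtake a miss from wave A, while wave A's own second load still waits behind its first.

```python
from collections import namedtuple

# Hypothetical load record: owning wave, issue cycle, raw memory latency.
Load = namedtuple("Load", "wave issue latency")

def return_times(loads, per_wave):
    """Return (wave, cycle) pairs saying when each load's data is handed back.

    per_wave=False: all loads return in global issue order.
    per_wave=True:  loads only stay ordered within their own wave
                    (assumed model of cross-wave out-of-order return).
    """
    last_return = {}
    results = []
    for ld in sorted(loads, key=lambda l: l.issue):
        key = ld.wave if per_wave else "all"
        ready = ld.issue + ld.latency             # when memory has the data
        done = max(ready, last_return.get(key, 0))  # wait for older loads in the same queue
        last_return[key] = done
        results.append((ld.wave, done))
    return results

# Wave A misses (500 cycles), wave B hits (100), then wave A hits (100).
loads = [Load("A", 0, 500), Load("B", 1, 100), Load("A", 2, 100)]
in_order = return_times(loads, per_wave=False)
cross_wave = return_times(loads, per_wave=True)
print(in_order)
print(cross_wave)
```

In this model wave B gets its data at cycle 101 instead of 500, but wave A gains nothing from its own fast second load, which is exactly the "no ILP within a thread" behaviour I am asking about.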
I'd love to see a detailed architecture comparison of RDNA3 vs RDNA4 vs Ada vs Blackwell.
I'd like to see more RDNA 4 architecture analysis, because IMHO RDNA 4 is the most interesting GPU architecture since GCN. I'm especially curious about RDNA 4's dynamic register allocation and out-of-order capabilities, in particular dynamic register allocation from the software perspective and the deadlocks that are mentioned in the RDNA 4 instruction set architecture PDF.
Thanks in advance!
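For what it's worth, this is the kind of deadlock I have in mind, as a toy Python sketch. It is not taken from the ISA document: the `stuck_waves` helper, the pool sizes, and the rule that a wave finishes and frees everything once its request is granted are all simplifying assumptions of mine. Each wave already holds some registers and wants more; if no pending request fits in the free pool, every wave waits forever.

```python
def stuck_waves(free, waves):
    """Return the names of waves that can never get their extra registers.

    free:  registers currently unallocated in the (hypothetical) shared pool.
    waves: dict of name -> (held, extra_needed).
    Assumption: a wave whose request is granted runs to completion and then
    releases everything it holds.
    """
    pending = dict(waves)
    progress = True
    while progress:
        progress = False
        for name, (held, extra) in list(pending.items()):
            if extra <= free:
                # Grant `extra`, let the wave finish, get back held + extra.
                free += held
                del pending[name]
                progress = True
    return sorted(pending)

# Two waves hold 48 registers each and both ask for 32 more.
deadlock = stuck_waves(0, {"A": (48, 32), "B": (48, 32)})
no_deadlock = stuck_waves(32, {"A": (48, 32), "B": (48, 32)})
print(deadlock, no_deadlock)
```

With an empty pool both waves are stuck; with 32 registers free, one wave can finish and its release unblocks the other. How real software is supposed to avoid the first case is what I'd like to see covered.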
Could you put up charts that compare it directly to the following dies: N33, N32, N31 and N21?
N33 - the only monolithic big RDNA3 GPU (except Viola, but that is on N4P and on a platform (PS5 Pro) where you can't do microbenchmarking and analysis)
N32 - the revised version of N31 (smaller caches per WGP and array than N31), and overall roughly the same size as N48
N31 - specifically the 7900 GRE, as it has almost the same number of transistors.
N21 - 2x the L3 cache, monolithic, and just for an overall overview of how RDNA developed over the last 3 generations.
The most interesting comparison would be N33 or N32 vs N48 when it comes to the caches.
So, do I see it wrong, or does this out-of-order memory handling not exploit instruction-level parallelism within a thread? Does anyone have an answer?