While the Ryzen and 9355P provide nice context, I don't understand why there's no comparison between NPS0, 1, 2, 4 and L3 as NUMA domain, all on the same EPYC 9575F.
Interesting nonetheless, and I expect the B200 results will be even more so.
Sounds like they were given access to a VM running on that hardware, not access to the physical hardware. Hard to test what you don't have access to. :)
The 220ns latency hit compared to NPS1 mode is brutal, but I was surprised how well it still performs in single-threaded SPEC runs. The caching setup must be doing some serious work to offset that penalty. Would love to see comparisons between NPS2/4 modes to understand the latency tradeoffs better.