In my past articles on GPUs, I didn’t have good measurements for L1 cache bandwidth.
mind sharing the repo for your benchmarks?