Thanks for this article! After looking at the tiles that make up Arrow Lake, I don't think I was the only one who wondered whether the larger of the two filler tiles could be replaced with a much more modern version of Crystal Well. As you wrote, IBM has shown that eDRAM cache can be implemented with both low latency and high throughput. And the larger of Arrow Lake's two current filler tiles is conveniently located right next to the compute tile. Intel needs to get creative (again) and come back with a real alternative to AMD's new 3D cache design for the Ryzen 9800X3D. Plus, it would be a good dress rehearsal for the next Xeons.
I suspect power is still a problem. DRAM's destructive reads and refreshes cost power, and CPU performance is too often power-limited. You would need enough eDRAM to catch a lot of L3 misses, and that's harder today with 32 or 36 MB L3 caches.
128 MB is 16x the size of the standard desktop 8 MB L3 caches of 2015. For the same increase over a 32 MB L3 today, you'd need 512 MB of L4. Perhaps Intel decided larger core structures and more aggressive OoO execution were a more practical answer. After all, latency, not bandwidth, is usually the biggest concern for client applications. OoO execution helps hide latency.
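For anyone who wants to redo that scaling with other cache sizes, here is a quick sketch of the arithmetic. The function name `scaled_l4_mb` is made up for illustration, and it only assumes the L4 should keep the same size ratio to the L3; it says nothing about hit rates or power.

```python
def scaled_l4_mb(old_l3_mb, old_l4_mb, new_l3_mb):
    """Return (ratio, new_l4_mb) keeping the same L4-to-L3 size ratio.

    Hypothetical helper: pure capacity scaling, no hit-rate model.
    """
    ratio = old_l4_mb / old_l3_mb
    return ratio, ratio * new_l3_mb


# 2015 desktop: 8 MB L3 with 128 MB of eDRAM L4; today: 32 MB L3.
ratio, l4 = scaled_l4_mb(8, 128, 32)
print(f"{ratio:.0f}x ratio -> {l4:.0f} MB of L4")
```

Plugging in a 36 MB L3 instead of 32 MB pushes the same ratio to 576 MB.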
Regardless, an L4 setup (with eDRAM no less) is nowhere near as efficient as an extended L3 setup. It can potentially be oodles more dense, but SRAM will always beat DRAM to a pulp on latency.
Especially when AMD's V-Cache L3 sits on the same ring bus as the cores,
while Intel has had its L3 on its own uncore clock domain for a long time now.
The only way for Intel to get close is to attempt its own V-Cache, but Intel being Intel does things because they can, not because they should...