Very interesting, I had completely forgotten about these chips.
Just one question: is there possibly a typo in the table with the SRAM physical sizes for the density-optimised Power8 SRAM? The 6T density-optimised cell is listed as more than 7 times as large as the 8T not-density-optimised SRAM on the same chip. That sounds very weird.
Oops, yeah, it should be 0.144 µm². I fixed it at https://old.chipsandcheese.com/2024/11/01/broadwells-edram-vcache-before-vcache-was-cool/
Bit harder to fix on Substack because they don't have table support
Really? No table support? That's bizarre.
Great article though! I have badly wanted an Iris Pro rig to add to my collection.
Not a big deal - people can see the comment here in any case.
I have fixed the image here on Substack.
This has been my favorite article here in months. A deep dive into a technological obscurity that was actually highly innovative and useful. I hadn't thought of what Intel needed to invest to get this eDRAM L4 to work: not just getting DRAM to yield, but also rolling it efficiently into the architecture so that it actually helps. It's crazy how little Intel ended up using all this work.
With its tile strategy moving forward, Intel could try dipping into Vcache territory, although it's still unclear how this would play with their current trend of increasing L2 cache sizes and with heterogeneous core integration.
That's really interesting! I didn't know eDRAM could get L3 levels of latency. If you chose eDRAM as system memory (like the on-package memory in M1/Lunar Lake), how much faster could it get than regular system memory while still having large capacity? As far as I can tell, IBM's z15 is the only one with large amounts of eDRAM, but it isn't large enough to be system memory.
With AMD offering 3D Cache, I suspect that Intel will have no choice but to deliver some sort of higher cache solution in the future, if only to remain relevant.
Perhaps they could move a large L4 cache with 3D stacking off to its own tile someday. Supposedly there was the cancelled Adamantine L4, so I'm not sure what will happen next.
Thanks for this article! After looking at the tiles that make up Arrow Lake, I don't think I was the only one who wondered if the larger of the two filler tiles couldn't be replaced with a much more modern version of Crystal Well. As you wrote, IBM has shown that eDRAM cache can be implemented with both low latencies and high throughput. And the larger of Arrow Lake's two current filler tiles is conveniently located right next to the compute tile. Intel needs to get creative (again) and come back with a real alternative to AMD's new 3D cache design for the 9800 Ryzen. Plus, it would be a good dress rehearsal for the next Xeons.
I suspect power is still a problem. DRAM destructive reads and refreshes cost power, and CPU performance is too often power limited. You would need enough eDRAM to catch a lot of L3 misses, and that's harder today with 32 or 36 MB L3 caches.
128 MB is 16x the size of the standard 8 MB desktop L3 caches of 2015. For the same ratio over a 32 MB L3 today, you'd need 512 MB of L4. Perhaps Intel decided larger core structures and more aggressive OoO execution were a more practical answer. After all, latency, not bandwidth, is usually the biggest concern for client applications, and OoO execution helps hide latency.
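For anyone who wants to redo that scaling with other cache sizes, here is a quick sketch in Python. The function name is made up for illustration, and it assumes the only thing that matters is keeping the L4-to-L3 capacity ratio constant, which is a simplification.

```python
def l4_size_for_same_ratio(old_l3_mb, old_l4_mb, new_l3_mb):
    """Return (ratio, new_l4_mb) that keeps the old L4:L3 capacity ratio.

    Hypothetical helper: sizes are in MB, and capacity ratio is the only
    factor considered (no latency, power, or hit-rate modelling).
    """
    ratio = old_l4_mb / old_l3_mb
    return ratio, ratio * new_l3_mb


# Broadwell-era numbers from the comment: 8 MB L3 backed by 128 MB eDRAM,
# scaled to a 32 MB L3.
ratio, needed = l4_size_for_same_ratio(8, 128, 32)
print(f"{ratio:.0f}x -> {needed:.0f} MB")  # 16x -> 512 MB
```

Plugging in a 36 MB L3 instead gives 576 MB by the same ratio.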
Regardless, an L4 setup (with eDRAM no less) is nowhere near as efficient as an extended L3 setup. It can potentially be oodles more dense, but SRAM will always beat DRAM to a pulp.
Especially when AMD's L3 Vcache sits on the same ring bus as the rest of the cores.
Intel has had L3 on its own uncore clock domain for a long time now.
The only way for Intel to get close is to attempt Vcache, but Intel being Intel does things because they can, not because they should...
"high capacity caching is a fun part of Intel’s history. It would be fun to see it return."
IIRC MLID or someone else mentioned that they tried it for Meteor/Arrow Lake, but the Adamantine base tile was cancelled in development due to #problems.