Broadwell’s eDRAM: VCache before VCache was…

Chester Lam

Nov 1, 2024

Up to Haswell’s 2013 release, Intel’s “tick-tock” strategy seemed unstoppable.

14 Comments

Catpig

Nov 1, 2024

Very interesting, I had completely forgotten about these chips.

Just one question: is there possibly a typo in the table with the SRAM physical sizes for the density-optimised Power8 SRAM? At 6T density-optimised, it's more than 7 times as large as the 8T not-density-optimised SRAM on the same chip? That sounds very weird.

Chester Lam

Nov 1, 2024

Oops, yeah, it should be 0.144 µm². I fixed it at https://old.chipsandcheese.com/2024/11/01/broadwells-edram-vcache-before-vcache-was-cool/

It's a bit harder to fix on Substack because they don't have table support.

A Guy

Nov 3, 2024

Really? No table support? That's bizarre.

Great article though! I have badly wanted an Iris Pro rig to add to my collection.

Catpig

Nov 1, 2024

Not a big deal - people can see the comment here in any case.

George Cozma

Nov 2, 2024

I have fixed the image here on Substack.

RaisedAir

Jul 18, 2025

Under the scenario of using a dedicated GPU for rendering and an integrated GPU for video output, how does the system allocate eDRAM? Does it prioritize the CPU as L4 cache or the integrated GPU as video memory?

10GHz

Nov 17, 2024

This has been my favorite article here in months. A deep dive into a technological obscurity that was actually highly innovative and useful. I hadn't thought of what Intel needed to invest to get this eDRAM L4 to work: not just getting a DRAM to yield, but also efficiently rolling it into the architecture so that it actually helps. It's crazy how little Intel ended up using all this work.

Ivan

Nov 5, 2024

With its tile strategy moving forward, Intel could try dipping into Vcache territory, although it's still unclear how this would play with their current trend of increasing L2 cache sizes and heterogeneous core integration.

CKing123

Nov 5, 2024 (edited)

That's really interesting! I didn't know eDRAM could get L3 levels of latency. If you choose eDRAM as system memory (M1/Lunar Lake), how much faster can it get than system memory while having large capacity? As far as I have looked, IBM's z15 is the only one with large amounts of eDRAM, but it isn't large enough to be system memory.

CrazyElf1

Nov 4, 2024 (edited)

With AMD offering 3D Cache, I suspect that Intel will have no choice but to deliver some sort of higher cache solution in the future, if only to remain relevant.

Perhaps they could move a large L4 cache with 3D stacking off to its own tile someday. Supposedly there was the cancelled Adamantine L4, so I'm not sure what will happen next.

Peter

Nov 1, 2024

Thanks for this article! After looking at the tiles that make up Arrow Lake, I don't think I was the only one who wondered if the larger of the two filler tiles couldn't be replaced with a much more modern version of Crystal Well. As you wrote, IBM has shown that eDRAM cache can be implemented with both low latencies and high throughput. And the larger of Arrow Lake's two current filler tiles is conveniently located right next to the compute tile. Intel needs to get creative (again) and come back with a real alternative to AMD's new 3D cache design for the 9800 Ryzen. Plus, it would be a good dress rehearsal for the next Xeons.

Chester Lam

Nov 7, 2024

I suspect power is still a problem. DRAM destructive reads and refreshes cost power, and CPU performance is too often power limited. You would need enough eDRAM to catch a lot of L3 misses, and that's harder today with 32 or 36 MB L3 caches.

128 MB is 16x larger than the standard desktop 8 MB L3 caches of 2015. For the same increase over a 32 MB L3 today, you'd need 512 MB of L4. Perhaps Intel decided larger core structures and more aggressive OoO execution were a more practical answer. After all, latency, not bandwidth, is usually the biggest concern for client applications. OoO execution helps hide latency.
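
The scaling arithmetic in the comment above can be checked with a few lines of Python. This is only a sketch: the function name `scaled_l4_mb` is made up for illustration, and it assumes the only goal is to keep the same L4-to-L3 capacity ratio, ignoring hit rates, power, and die area.

```python
def scaled_l4_mb(old_l3_mb: float, old_l4_mb: float, new_l3_mb: float) -> float:
    """Return the L4 size (MB) that keeps the old L4:L3 capacity ratio.

    Hypothetical helper for illustration only; capacity ratio is the
    sole assumption, nothing here models hit rate or power.
    """
    ratio = old_l4_mb / old_l3_mb
    return ratio * new_l3_mb


# 2015 desktop figures from the comment: 8 MB L3, 128 MB eDRAM L4.
print(128 / 8)                    # 16.0, the ratio quoted above
print(scaled_l4_mb(8, 128, 32))   # 512.0 MB for a 32 MB L3
print(scaled_l4_mb(8, 128, 36))   # 576.0 MB for a 36 MB L3
```

Under that assumption, the 36 MB L3 mentioned earlier in the comment would push the equivalent L4 to 576 MB.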

Dave

Nov 2, 2024

Regardless, an L4 setup (with eDRAM no less) is nowhere near as efficient as an extended L3 setup. It can potentially be oodles more dense, but SRAM will always beat DRAM to a pulp.

Especially when AMD's L3 Vcache involves being on the same ring bus as the rest of the cores.

Intel has had L3 on its own uncore clock domain for a long time now.

The only way for Intel to get close is to attempt Vcache, but Intel being Intel does things because they can, not because they should...

RBGf

Nov 1, 2024

"high capacity caching is a fun part of Intel’s history. It would be fun to see it return."

IIRC MLID or someone else mentioned that they tried it for Meteor/Arrow Lake, but the Adamantine base tile was cancelled in development due to #problems.

Chips and Cheese
