Discussion about this post

c3dtops

Jeez... great piece of insightful work.

I guess a lot of time must have gone into running those tests.

I do have a question for Chester:

P-Cores and E-Core use IDI to talk to the uncore, which starts with the ring bus on most Intel client designs.

Does "uncore" here refer to everything from the shared L3 cache slices onward, meaning the ring-like interconnect on Arrow Lake, the I/O, and the integrated memory controllers?

Schrödinger's Cat

Also, I'd be curious to know more about how modern x86 CPUs implement nontemporal stores. Back when I first played with them on a Pentium 4, it seemed to me that they had the effect of restricting which L2 cache sets could be used. So they did cause some cache pollution, but it was limited in scope. Do modern x86 CPUs still do something similar, or do they truly bypass the cache hierarchy entirely?
