Discussion about this post

Schrödinger's Cat

Also, I'd be curious to know more about how modern x86 CPUs implement nontemporal stores. Back when I first played with them on a Pentium 4, it seemed to me that they had the effect of restricting which L2 cache set could be used. So, they did cause some cache pollution, but it was limited in scope. Do modern x86 CPUs still do something similar, or do they truly bypass the cache hierarchy entirely?

Schrödinger's Cat

Don't compilers default to "natural alignment", though? That means a 64-bit data type should have 64-bit alignment, for instance. In that case, you'd pretty much have to go out of your way to create one of these scenarios.
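To make the alignment point concrete, here is a minimal sketch in C. It assumes GCC or Clang, since `__attribute__((packed))` is a compiler extension, and the struct and function names are made up for illustration. Default layout pads the struct so the 64-bit member lands on its ABI alignment boundary; only the explicitly packed variant puts it at an odd offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler inserts padding after `tag` so that
   `value` starts at a multiple of its ABI alignment. */
struct natural {
    uint8_t  tag;
    uint64_t value;
};

/* GCC/Clang extension (an assumption of this sketch): `packed` removes
   the padding, so `value` starts immediately after the 1-byte `tag`. */
struct __attribute__((packed)) packed {
    uint8_t  tag;
    uint64_t value;
};

/* Byte offset of the 64-bit member under default layout. */
size_t natural_offset(void) { return offsetof(struct natural, value); }

/* Byte offset of the 64-bit member when padding is suppressed. */
size_t packed_offset(void) { return offsetof(struct packed, value); }

/* Sanity check: packing moves `value` to offset 1, ahead of where
   the default layout places it. */
void check_layout(void) {
    assert(packed_offset() == 1);
    assert(natural_offset() > packed_offset());
}
```

On a typical x86-64 build, `natural_offset()` returns 8 and `packed_offset()` returns 1. Note that "natural" is really an ABI decision: the 32-bit x86 System V ABI, for example, only aligns 64-bit integers inside structs to 4 bytes.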

BTW, I recently ran across the LOCK prefix, while attempting to validate Robert Hallock's claim that iBOT could lock cachelines (i.e. protect them from eviction). If Intel CPUs have any way to lock a cacheline, I believe it's not documented.
