22 Comments
Fredrik Tolf's avatar

It is outrageous that you need to use the Internet Archive to link to Anandtech articles. I can't believe they pulled down the articles. What a disgrace.

Joel Hruska's avatar

I could not possibly agree more.

Erik Stubblebine's avatar

In the last Cyberpunk benchmark Lion Cove (148 fps) vs. Skymont (141 fps), what was the difference in wattage or energy usage for those seven extra fps?

Schrödinger's Cat's avatar

Chester: "That doesn’t mean all E-Core chips can go mainstream anytime soon, or that Intel’s E-Core line can displace their P-Cores."

Funny you should mention that, because a rumor has emerged that Intel will indeed retire P-cores, with the E-cores finally scaling up enough to take their place.

https://wccftech.com/intel-reportedly-plans-to-bring-a-unified-core-architecture-after-razer-lake-cpu/

Schrödinger's Cat's avatar

I'm thrilled to see this! Actually, I wish you'd do a Lion Cove vs. Skymont efficiency comparison, along the lines of this article: https://chipsandcheese.com/p/alder-lakes-power-efficiency-a-complicated-picture

I do disagree with the conclusion of that article. The graphs showing each core type's power vs. performance give a misleading impression that they'd all be run at the same power level. Instead, it would make much more sense for the frequency governor to run them all at the same efficiency point, as allowed by the number of non-idle cores and the package power limit currently in effect.

If you plot the estimated perf/W on the basis of frequency scaling by iso-efficiency, Alder Lake does much better with its 8P + 8E configuration than it would with a hypothetical 10P setup. In 7zip, 8P + 8E smashes even a 12P configuration at all power levels, and draws about even with it on x264.

Here are PNGs of those plots:

https://drive.proton.me/urls/Y3WPADF5K4#KV5nMjh6OVyE

https://drive.proton.me/urls/X267KVAAJ0#kMYEkm3liD0g

And yes, it was some trouble to extract the raw data from yours, but I'm up for a challenge.
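
The iso-efficiency idea in this comment can be sketched roughly as follows. This is only an illustration, not the method used to produce the plots: the operating points are made-up numbers, the names `pick_point` and `iso_efficiency` are hypothetical, and it assumes perf/W falls as per-core power rises. It lowers a shared perf/W threshold step by step and keeps the last setting that still fits the package power limit.

```python
# Hypothetical (watts, performance) operating points per core, lowest power first.
# These numbers are invented for illustration; they are not measurements.
P_POINTS = [(2, 10), (5, 18), (10, 26), (20, 32)]
E_POINTS = [(1, 6), (2, 9), (4, 13), (6, 15)]

def pick_point(points, min_eff):
    """Return the highest-power point whose perf/W is at least min_eff.

    Falls back to the lowest-power point if no point is that efficient.
    """
    ok = [(w, p) for w, p in points if p / w >= min_eff]
    return max(ok) if ok else min(points)

def iso_efficiency(core_types, budget_w):
    """core_types maps name -> (core_count, points); returns name -> chosen point.

    Returns None if even the most frugal setting exceeds the budget.
    """
    # Candidate efficiency thresholds, most frugal (highest perf/W) first.
    levels = sorted(
        {p / w for _, pts in core_types.values() for w, p in pts},
        reverse=True,
    )
    best = None
    for eff in levels:
        choice = {name: pick_point(pts, eff) for name, (_, pts) in core_types.items()}
        total_w = sum(core_types[name][0] * w for name, (w, _) in choice.items())
        if total_w > budget_w:
            break  # lowering the threshold further only raises power
        best = choice
    return best

config = {"P": (8, P_POINTS), "E": (8, E_POINTS)}
chosen = iso_efficiency(config, budget_w=65)
print(chosen)
```

With these invented curves and a 65 W budget, the sketch settles on 5 W per P-core and 2 W per E-core (56 W total), rather than giving every core the same power.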

MaksDampf's avatar

Wow, especially the last graph in Cyberpunk is eye-opening.

This is without being clockspeed adjusted, right? If so, a hypothetical N2P E-Core has a chance to lap Lion Cove if it achieves the 18% performance increase that TSMC quotes over N3E.
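
As a back-of-the-envelope check of that estimate, here is a small sketch using the fps figures quoted in this thread. It assumes, optimistically, that frame rate scales linearly with the quoted process gain and that the benchmark is CPU-bound; the function name `scaled_fps` is just for illustration.

```python
# Cyberpunk figures quoted in this thread, at stock clocks.
LION_COVE_FPS = 148.0
SKYMONT_FPS = 141.0
N2P_UPLIFT = 0.18  # performance gain over N3E as quoted in the comment

def scaled_fps(fps, uplift):
    """Assumption: fps scales linearly with the process performance gain."""
    return fps * (1.0 + uplift)

projected = scaled_fps(SKYMONT_FPS, N2P_UPLIFT)
# Uplift Skymont would need just to match Lion Cove's result.
breakeven_uplift = LION_COVE_FPS / SKYMONT_FPS - 1.0

print(f"Projected Skymont on N2P: {projected:.1f} fps")
print(f"Uplift needed to match Lion Cove: {breakeven_uplift:.1%}")
```

Under those assumptions, Skymont would only need about a 5% gain to draw level, so the full 18% would put it clearly ahead of today's Lion Cove result.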

There might be an opening for a mainstream desktop or mobile processor with 8 or 12 E-Cores to compete with AMD's 6- and 8-core mainstream APUs, which don't clock as high either.

Since an E-Core takes roughly a third of the area of a P-Core (about 1:3), a hypothetical 0P + 12E CPU would be roughly as small as the planned 4+0 Panther Lake die but offer much better MT performance at just slightly reduced ST perf. Of course floating point workloads would suffer, but are client mobile workloads really that FP heavy?

Also a 1+8 or 1+16 comes to mind, just like the 1305U based on Alder Lake technology with its 1+4 setup.
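
The area comparison can be worked through with a tiny sketch. It assumes the rough 1:3 E-Core to P-Core area ratio from this comment and counts core area only; caches, ring stops and everything else on the die are ignored, and `core_area` is a hypothetical helper.

```python
P_CORE_AREA = 3.0  # relative units, assuming the rough 1:3 E:P ratio above
E_CORE_AREA = 1.0

def core_area(p_cores, e_cores):
    """Total core area in E-Core-sized units (cores only, no cache or uncore)."""
    return p_cores * P_CORE_AREA + e_cores * E_CORE_AREA

for p, e in [(4, 0), (0, 12), (1, 8), (1, 16)]:
    print(f"{p}P + {e}E: {core_area(p, e):.0f} area units")
```

On this simplified accounting, 0P + 12E and 4P + 0E come out equal, and 1+8 is slightly smaller than either.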

Chester Lam's avatar

Yeah it's at stock. Keep in mind a next generation P-Core would also benefit from any process improvements.

And yeah, an all E-Core chip could probably do very well, especially if sold at low price points. Or, a 32x E-Core chip would be funny, though probably worse overall for consumers.

Peter W.'s avatar

Did you have a chance to look at the power draw of the Skymont cores? If they are indeed significantly more efficient than a Lion Cove at comparable performance, a smaller all-Skymont CPU would also be interesting for lower end server duties.

Fredrik Tolf's avatar

Given the difference in clock speed, I imagine the Skymont cores must be running at significantly higher IPC, right? I mean, given the shorter memory latency (in cycle count), that might not be terribly surprising, but given the difference in area, it also kind of is...

Makes me very interested in a SPEC comparison between Lion Cove and Skymont, though.

MaksDampf's avatar

Since a B580 was used, this was more likely mainly GPU limited, so the last graph cannot simply be divided by clockspeed and translated into higher IPC for the lower clocked Skymont.

That being said, Cyberpunk likes core-to-core bandwidth, and given that 4 cores share the same L2, that could be a contributing factor.

Still, I think the most interesting part of a hypothetical all E-Core CPU would be cache latency. The large shared L2 already helps, but since a 4-core cluster shares only one stop on the ring bus, the L3 and DRAM latency would be far lower than on a P-Core CPU with a comparable core count. Lunar Lake L3 latency is 52 cycles according to C&C, while Arrow Lake measured 80+ cycles. That would help in backend memory bound gaming workloads.
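
Cycle counts from two different chips are only directly comparable once converted to time. A minimal sketch of that conversion follows; the 4.0 GHz clock is a placeholder assumption, not a measured value for either chip, and the 80-cycle figure is the lower bound of the "80+" quoted above.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core cycles to nanoseconds at the given clock."""
    return cycles / clock_ghz

ASSUMED_CLOCK_GHZ = 4.0  # placeholder; substitute the clock each test ran at

lunar_lake_ns = cycles_to_ns(52, ASSUMED_CLOCK_GHZ)
arrow_lake_ns = cycles_to_ns(80, ASSUMED_CLOCK_GHZ)
print(f"52 cycles -> {lunar_lake_ns:.1f} ns, 80 cycles -> {arrow_lake_ns:.1f} ns")
```

If the two measurements were taken at different clocks, pass each its own frequency, since the gap in nanoseconds can be smaller or larger than the gap in cycles suggests.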

Schrödinger's Cat's avatar

> I imagine the Skymont cores must be running at significantly higher IPC, right?

I'm not sure that's a very generalizable conclusion. As shown here, games are among the lowest-IPC workloads out there.

https://chipsandcheese.com/p/running-gaming-workloads-through

Refer to the graph in the section: Final Words

Fredrik Tolf's avatar

I didn't mean high IPC in an absolute sense. It's just that since it barely loses to Lion Cove but runs at significantly lower clocks, then assuming it's the same workload, it basically has to execute the same instructions in significantly fewer cycles.
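
That reasoning amounts to dividing the fps ratio by the clock ratio. Here is a rough sketch: the fps values are the ones quoted in this thread, but the clock speeds are assumptions filled in for illustration (they are not given here), and the result only means something if the benchmark is CPU-bound, which is disputed above.

```python
def per_clock_ratio(fps_a, ghz_a, fps_b, ghz_b):
    """Work done per clock by core A relative to core B, on the same workload."""
    return (fps_a / ghz_a) / (fps_b / ghz_b)

# fps from this thread; clocks are ASSUMED values, replace with measured ones.
SKYMONT_FPS, SKYMONT_GHZ = 141.0, 4.6
LION_COVE_FPS, LION_COVE_GHZ = 148.0, 5.7

ratio = per_clock_ratio(SKYMONT_FPS, SKYMONT_GHZ, LION_COVE_FPS, LION_COVE_GHZ)
print(f"Skymont per-clock throughput relative to Lion Cove: {ratio:.2f}x")
```

With those assumed clocks the ratio lands a little under 1.2x, but a GPU bottleneck would inflate it, since both cores would then be waiting on the same limit.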

Schrödinger's Cat's avatar

> This is without being clockspeed adjusted, right?

But, the clockspeed is somewhat intrinsic to the microarchitecture. Skymont isn't *designed* to clock as high as Lion Cove.

Fredrik Tolf's avatar

> Of course floating point workloads would suffer, but are client mobile workloads really that FP heavy?

Workloads that can make use of a hypothetical 12 cores would probably at least be more likely to also be vectorizable.

c3dtops's avatar

Would you consider running those games solely on the Arrow Lake iGPU at lower resolution?

I think a home mini-PC with an all E-Core setup and a well-tuned iGPU may appeal.

Ideally all E-Cores plus some combination of stacked caches (which could partially address backend memory bound workloads).

Chester Lam's avatar

Meteor Lake would probably be more relevant for that than a desktop system, because my 155H's iGPU is more powerful than the one in desktop Arrow Lake.

c3dtops's avatar

I had the wrong impression that it was using Arc graphics as well.

I looked it up on Intel's site.

The Arrow Lake "K" series only has Intel Graphics, not the Intel Arc graphics tile that comes with the Meteor Lake 155H.

Peter W.'s avatar

Agree. I have that same Meteor Lake in my laptop, and its Arc iGPU gives a good account of itself.

Peter W.'s avatar

My much more pedestrian wish is/would have been Intel swapping the Alder Lake Gracemont 4E cluster in the N100 for the Raptor Lake 4E in their N150. That would have doubled the L2 Cache for the four Gracemonts (2 MB -> 4 MB). Instead Intel just increased the speed, which doesn't help efficiency.

That all being said, I agree that a mini-PC or NAS with eight or more Skymont cores could be an attractive offering.

Schrödinger's Cat's avatar

The N100 has 6 MB of L3 cache. So, boosting L2 from 2 to 4 MB might not have been terribly impactful. I do wish Intel would give us 8x Crestmont cores on their Intel 3 node.

Peter W.'s avatar

AFAIK, the L3 cache of the N100 is also plagued by the high latency somewhat typical for Intel's CPUs, so less frequent use of L3 is still advantageous. The 4 MB L2 of the 4E Gracemont cluster in Raptor Lake has slightly more latency than the 2 MB L2 of the 4E Gracemont cluster in Alder Lake (a few nanoseconds more), but according to some deep dives, the doubling of L2 more than made up for it.

Schrödinger's Cat's avatar

> AFAIK, the L3 Cache of the N100 is also plagued by the high latency somewhat typical for Intel's CPUs

Would be interesting to see some data on this. I wonder why. Is it shared with the iGPU? Otherwise, it's only joining two E-core clusters, not distributed around a big ring bus.

Even if they did boost L2 to 4 MB, it's not going to yield a proper generational improvement and not even the magnitude of difference we saw between Apollo Lake and Gemini Lake. Neither of those had L3 cache and Goldmont+ included backend improvements.

The N-series was very good, for what it was. It deserves a proper successor, not a mere tuneup.

Edit: it looks like their plan for a successor is Wildcat Lake, but its P-cores make it decidedly less interesting to me.
