11 Comments

I'll read it in a moment, since I love architectural deep dives. But a question first:

Do you have a Patreon, or at least a Discord? That was the TL;DR. I decided to spare you the rest and will put it in a reply, so that people are less assaulted by my off-topic ramblings.

Yes, there's a Discord (https://discord.gg/fcGB7efq)

Long Version:

I have an issue with my Framework 16, and it's related to the iGPU. I found out that all chips with this iGPU (7840 and 7940, and possibly their newer iterations too) have the same issues with encoding and decoding. Some of it has been fixed, some of its effects can be lessened by choosing the "correct" BIOS settings and Performance mode, some applications have workarounds, etc. (so it doesn't look like it's 100% the AMD drivers' fault). But given that it has now been unsolved for over a year, I'm starting to suspect that there is some architectural issue, which "could" be fixed using microcode, but possibly with a penalty big enough that AMD doesn't consider it worth it?

IDK. And before someone calls me an NVIDIA or Intel fanboy: my last NVIDIA card was on an AGP port, and the last Intel CPU I bought was a Pentium II. And I never had issues with AMD cards or CPUs, even the issues that were widely reported. But sadly all I can do now is talk on Reddit and the AMD forum, and no one there is capable of even suggesting how to benchmark (likely micro-benchmark) it, to maybe find the root cause and hopefully a more permanent workaround.

For now I'm stuck recording my screen with OBS using the CPU (the OpenGL workaround has its issues as well). Otherwise any video I record will at best freeze after approx. 2 minutes of recording.

I know there is an "upgrade to paid" option for this Substack, and honestly, if the price isn't ridiculous, I will get on that train maybe even before the new year and definitely after. But it would be great to have some communication, because then I would know what programs to run to maybe narrow it down a bit.

And yes, some people say that the problem is fixed... However, first of all, the fix requires turning off many features, and second, it only decreases the chances of the problem occurring. Sadly it's been 20 years since I've done any serious architecture analysis myself, and I can't program (which has been my Achilles heel since forever). If I could, I would do a full investigation myself, even if only to better understand the challenges of designing APUs and such.

OK. Now back to reading.

I have a 7840HS in my Xiaomi laptop, and recording with AMD VCN hasn't been an issue, sooo

This is the first time that Atom has actually captured my interest. From its lame first releases, crushed by Bobcat etc., to its mobile failures, to E-cores proving to be mostly a brake compared to the well-executed SMT of "full fat" cores - this now is different. Let's hope Intel does not mess this up in its current state of disorganization.

How does this compare to Sierra Forest cores? Do they have the same wide path but more cache and AVX-whatever?

Sierra Forest uses Crestmont cores. Crestmont is narrower, but that doesn't necessarily mean Skymont would have the same throughput per die area if Sierra Forest were made with Skymont. For the most part, they would have to shrink the process for Skymont to fit into Sierra Forest, and even then, in a power-limited dense-core config, Skymont might struggle to show much improvement.

Ah, I guess what I should have asked is Clearwater Forest. Or is the client E-core a wholly different beast from anything on Xeon?

Not as far as I know. It seems that Crestmont is identical save for double the L2 cache (compared to MTL, which has never left mobile).

Mobile RPL-H is ADL-H; don't make the mistake that a lot of people do. Those still have 2 MB of L2 per Gracemont cluster.

So I don't imagine Clearwater Forest would be dramatically different. L2 cache might stay the same (it's horrendously inefficient to go beyond 4 MB/cluster for an L2).

Great analysis of "Skymont unleashed". What I am curious about: How does the performance of Skymont in Arrow Lake compare to that of the Gracemont E-cores in Raptor Lake? While Crestmont in Meteor Lake had some smaller improvements over Gracemont, the E-core clusters in Meteor Lake regressed vs. those in Raptor Lake in one important aspect: L2 cache per cluster went down from 4 MB in RL to 2 MB in ML. And that must have hurt performance.

Arrow Lake's big problems don't stop with the 9000 series Zen 5 from AMD. It is also a bit underwhelming compared to its predecessor, RL. Which also makes me wonder just how much faster the Skymonts in Arrow Lake are vs. the Gracemonts in Raptor Lake. Any thoughts? Thanks!

Alder Lake has 2 MB/cluster Gracemont E-cores as well. Laptop RPL-H is ADL-H in disguise on top of that.

But yes, Arrow Lake is being hurt by the 9000 series, which is itself already hurt by memory bandwidth limitations.
