13 Comments

AMD really needs a formal whitepaper, with diagrams and such, that goes into detail about this IP! There has to be more whitepaper content published by AMD going forward, or it's not going to be easy to see the advantages of the work that AMD is doing!

Agree! Especially for a "halo" product like this. At least some more details about the architecture and what considerations went into it would be appreciated!

So he said that a single CCD can pull 256 GB/s of bandwidth instead of the 64 GB/s that "normal Zen 5" can pull, and that it has lower latency than desktop Zen 5. Can you please verify this when you do your Strix Halo tests?

Also, I hope we get RDNA4 and 5090 tests (compared to RDNA3 and 4090) soon. Most other sites test from a gaming standpoint; you are the only one doing serious GPU compute tests à la AnandTech, R.I.P.

I don't think you're going to see lower latency in practice because of the memory being used. The latency between the CCDs and the SoC die can very well be significantly lower and you're still going to see higher memory latency just because LPDDR is generally pretty horrible in latency.

I don't understand at all how the shared memory system works with AMD iGPUs. They say that the GPU can maximally access 96 GB, but why is there such a limit at all? The GPU uses virtual memory, doesn't it? Why can't you map GPU pages to any physical address? Why is there a need to "pre-allocate" memory to the GPU at all, especially on the BIOS level? Why wouldn't you just allocate your GPU buffers the exact same way that you allocate any other memory, and just direct the GPU to the right physical addresses? Is it just some sort of driver shortcoming?

Also, I was really surprised that the CCDs aren't shared with desktop! I really wasn't expecting them to tape out a whole new CCD just for this line. Is there any expected reuse for any other product line?

Windows. Even the Xbox has separate pools of memory despite sharing most of its APU design with the PS5 which does share address space between the two.

>Windows

A tempting explanation for sure, but is this something you know, or is it just a guess?

Being able to assign most of 64 or 96 GB RAM for use by the GPU (and NPU) could be interesting for running LLMs; yes, of course a lot slower than a big GPU, but being able to stay in RAM would certainly help.
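
To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. It assumes a dense model whose weights are all read once per generated token, so memory bandwidth caps the token rate. The 70B/4-bit model and the 256 GB/s figure (the bandwidth number quoted earlier in this thread) are illustrative inputs, not measurements, and the function names are made up for this example.

```python
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8.0


def fits_in_budget(weight_gb: float, gpu_budget_gb: float) -> bool:
    """True if the weights alone fit in the memory assigned to the GPU.

    Ignores KV cache and activations, so this is an optimistic check.
    """
    return weight_gb <= gpu_budget_gb


def max_tokens_per_s(weight_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every token reads all weights once."""
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    # Hypothetical example: 70B parameters quantized to 4 bits each.
    size = weights_gb(70, 4)
    print(f"weights: {size:.1f} GB")
    print(f"fits in 96 GB: {fits_in_budget(size, 96)}")
    # 256 GB/s is an assumed bandwidth figure, not a measured one.
    print(f"token rate ceiling: {max_tokens_per_s(size, 256):.1f} tok/s")
```

Real throughput would land below this ceiling, since the estimate ignores compute limits, KV cache traffic, and achievable (as opposed to peak) bandwidth.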

While I really liked the interview, I would have liked to also hear your guest's thoughts and comments on the similarities and differences between Strix Halo and the large M SoCs (APUs) from Apple.

As a request for the hopefully upcoming review: please also test how much of a bottleneck memory bandwidth is, especially for GPU performance, on both Strix Halo and - if possible and feasible - one or two other x86 mobile SoCs. Thanks!

Apple's design is monolithic, with accelerators integrated for specific tasks.

I know that😀; however, it would have been interesting to hear, for example, how AMD sees that approach (monolith), and why they believe theirs is better. The key challenge with using chiplets (tiles) is that they tend to be less power efficient than monoliths and also introduce additional latency. The key challenge with monoliths is that, even with the best fabrication, yield becomes more and more of an issue as the transistor count (and with it the die area) grows. Some of that was touched on, but I would have liked to hear a bit more.
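
The yield side of that argument can be illustrated with the simple Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. This is only a sketch: the defect density and the two die areas below are hypothetical values chosen for illustration, not figures for any AMD or Apple product, and real foundry yield models are more elaborate.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    d0 = 0.1  # hypothetical defect density, defects per cm^2
    for label, area in [("small chiplet", 70.0), ("large monolith", 400.0)]:
        # Areas are made-up round numbers, used only to show the trend.
        print(f"{label:>14}: {area:5.0f} mm^2 -> yield {poisson_yield(area, d0):.1%}")
```

Under these assumed inputs the small die yields noticeably better than the large one, which is the trend the comment describes. Salvaging partially defective dies as cut-down parts would soften the gap in practice.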

Apple can afford the poor yields that come from huge dies because they only pay for the good dies. There’s also much less transparency and very little objective analysis in the Apple-centered reporting world, so they can get away with misses, like the M1 Ultra GPU not really acting as a unified part, with each half not making use of the other’s memory bandwidth, or the M2 being something of a dud despite significant increases in compute resources. AMD has to pay for full wafers, so they get much better returns by making small CCDs and producing the IO die on a process better optimized for its components. When they mess up a design or implementation, it’s all over the news.

It’s not a level playing field.

Still no cheese reviews!

This is depressing! How

about some fish too?

Fish and Chips sounds

more appetizing than

chips and cheese! ;-)
