Chips and Cheese

Agree! Especially for a "halo" product like this. At least some more details about the architecture and what considerations went into it would be appreciated!

Zee

So he said that a single CCD can pull 256 GB/s of bandwidth instead of the 64 GB/s that "normal Zen 5" can pull, and that it has lower latency than desktop Zen 5. Can you please verify this when you do your Strix Halo tests?

Also, I hope we get RDNA4 and 5090 tests (compared to RDNA3 and 4090) soon. Most other sites test from a gaming standpoint; you are the only one doing serious GPU compute tests à la AnandTech (R.I.P.).

Fredrik Tolf

I don't think you're going to see lower latency in practice because of the memory being used. The latency between the CCDs and the SoC die can very well be significantly lower and you're still going to see higher memory latency just because LPDDR is generally pretty horrible in latency.

Fredrik Tolf

Jan 14 (edited)

I don't understand at all how the shared memory system works with AMD iGPUs. They say that the GPU can maximally access 96 GB, but why is there such a limit at all? The GPU uses virtual memory, doesn't it? Why can't you map GPU pages to any physical address? Why is there a need to "pre-allocate" memory to the GPU at all, especially on the BIOS level? Why wouldn't you just allocate your GPU buffers the exact same way that you allocate any other memory, and just direct the GPU to the right physical addresses? Is it just some sort of driver shortcoming?

Also, I was really surprised that the CCDs aren't shared with desktop! I really wasn't expecting them to tape out a whole new CCD just for this line. Is there any expected reuse for any other product line?

Joshua Miller

Jan 18

Windows. Even the Xbox has separate pools of memory, despite sharing most of its APU design with the PS5, which does share a single address space between the CPU and GPU.

Fredrik Tolf

Jan 23

>Windows

A tempting explanation for sure, but is this something you know, or is it just a guess?

Jan 15

Being able to assign most of 64 or 96 GB RAM for use by the GPU (and NPU) could be interesting for running LLMs; yes, of course a lot slower than a big GPU, but being able to stay in RAM would certainly help.

While I really liked the interview, I would have liked to also hear your guest's thoughts and comments on the similarities and differences between Strix Halo and the large M SoCs (APUs) from Apple.

As a request for the hopefully upcoming review: please also test how much of a bottleneck memory bandwidth is, especially for GPU performance, on both Strix Halo and, if possible and feasible, one or two other x86 mobile SoCs. Thanks!

Dante Fr.

Apple's is monolithic, with accelerators integrated into its design for specific tasks.

I know that 😀; however, it would have been interesting to hear, for example, how AMD sees that approach (monolithic), and why they believe theirs is better. The key challenge with chiplets (tiles) is that they tend to be less power efficient than monoliths and also introduce additional latency. The key challenge with monoliths is that, even with the best fabrication, yield becomes more and more of an issue the larger the transistor count gets. Some of that was touched on, but I would have liked to hear a bit more.
