That was an awesome post. Something that really piqued my interest is how much external memory bandwidth AMD GPUs require. I wonder how Adreno and its tile-based architecture would behave in such a scenario, i.e. how the external bandwidth requirements compare between the two. Keep up the good work 🙏
Great article!
Could someone benchmark the Apple M5’s cache setup? I find the reason behind its unbelievable single-core speed very interesting. Thanks!
Thanks Chester, also nice to see that you're using YT now. However, please continue to post your findings and thoughts here and on the website - Thanks!
Two comments, one question about this analysis of Strix Halo's Infinity Cache and Memory system:
1. Due to Strix Halo being an APU, going with, say, GDDR6 or GDDR7 instead of LPDDR5 would probably have exposed the CPU cores to the significantly longer latencies GDDR is known for. In a console like the PS5, using GDDR as RAM for the CPU as well isn't much of a downside; in more general compute situations, latency becomes more of an issue. This is probably why Apple also stuck with LPDDR for their "APUs", including the Max versions of their M SoCs with a large number of GPU cores.
2. Here my question: do you know, or can you estimate, how large the cost in die area/transistors and power draw of Strix Halo's wide memory controller is compared with the more standard dual-channel design? Any information is appreciated!
Especially in the consumer space (laptops and desktops), the standard reply to any request for more than two channels and/or wider buses has for decades been "that would just increase costs because of the larger die area required, increase power draw even when idle, and wouldn't make a difference for office work or gaming anyway".
I hope that this mindset will change now that we have beefier iGPUs that really benefit from higher throughput. Besides, with AI now everywhere, higher data rates are getting more important even in consumer SoCs.
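The bandwidth gap between those two approaches is easy to quantify, since peak theoretical throughput is just bus width times transfer rate. A minimal sketch, assuming the commonly reported 256-bit LPDDR5X-8000 configuration for Strix Halo versus a standard 128-bit dual-channel setup at the same speed:

```python
def peak_bandwidth_gb_s(bus_width_bits, transfer_rate_mts):
    """Peak theoretical DRAM bandwidth in GB/s.

    bytes/s = (bus width in bits / 8) * (transfers per second)
    """
    return bus_width_bits / 8 * transfer_rate_mts * 1e6 / 1e9

# Standard dual-channel (128-bit) vs. Strix Halo's 256-bit bus,
# both at LPDDR5X-8000 speeds:
print(peak_bandwidth_gb_s(128, 8000))  # 128.0
print(peak_bandwidth_gb_s(256, 8000))  # 256.0
```

Doubling the bus width doubles peak bandwidth at a given memory speed, which is exactly what a big iGPU needs and what the dual-channel status quo denies it.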
I'm not sure what the power cost is, but people have taken die shots of Strix Halo (https://misdake.github.io/ChipAnnotationViewer/?map=StrixHalo). The memory controllers are on the left and right sides of the IO die.
Idle power draw seems fine because the memory bus clocks down. From the MemClk event, the UMCs appear to idle at 200-300 MHz and only go up to ~1 GHz under load. RAPL (however accurate that is) puts idle power at ~2 W. So the cost would likely be more on the die area side than idle power. And yeah, it would only help for big iGPUs, not much for CPU workloads.
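For anyone who wants to check a figure like that themselves: RAPL energy counters are exposed on Linux through the powercap sysfs interface. A minimal sketch, assuming the intel-rapl driver (which also covers recent AMD parts) and package zone 0 — the exact path varies by system:

```python
import time

# Package-0 cumulative energy counter in microjoules; zone naming and
# driver availability vary by platform (assumed path, check your system).
RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj(path=RAPL_ENERGY):
    """Read the cumulative RAPL energy counter (microjoules)."""
    with open(path) as f:
        return int(f.read())

def avg_power_watts(e0_uj, e1_uj, dt_s):
    """Average power over an interval, from two energy readings.

    Ignores counter wraparound, which is fine for short intervals.
    """
    return (e1_uj - e0_uj) / 1e6 / dt_s

# Usage (on a machine that exposes RAPL):
#   e0 = read_energy_uj(); time.sleep(1.0); e1 = read_energy_uj()
#   print(f"{avg_power_watts(e0, e1, 1.0):.2f} W")
```

Sampling while the machine sits idle should land in the ballpark of the ~2 W number quoted above, with the usual caveats about RAPL accuracy.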
Thanks for answering, Chester! This dispels the notion that an SoC having more than two memory channels automatically results in higher power draw. I hope that we'll see more consumer-class SoCs/APUs that feature multi-channel (>2) memory controllers.