I really appreciate your detailed analysis. I have a small question: in the Core-to-Core latency analysis, I noticed there are 256 Core IDs, which implies that hardware multithreading is enabled. However, why is the latency between "two threads of the same core" very similar to the latency between "two cores within the same CCD"? I would expect the former to be handled directly through the L1 caches, while the latter would require a transfer through the L3 cache. Why is the latency so consistent?
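For context on how charts like this are usually produced: a core-to-core latency test typically pins two threads to chosen logical CPUs and bounces a single cache line between them. Below is a minimal C sketch of that ping-pong idea. It is not the tool used in the article. It assumes Linux with glibc (`pthread_setaffinity_np`), and `pingpong_ns`, `responder` and `pin_to` are hypothetical names. To test the SMT-sibling case, pass two logical CPU IDs that share a physical core. Linux lists these in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`.

```c
#define _GNU_SOURCE /* for pthread_setaffinity_np and CPU_SET on Linux/glibc */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* The cache line that bounces between the two threads. Aligned to 64 bytes
 * (assumed, typical x86 line size) so no other data shares it. */
static _Alignas(64) atomic_long flag;
static atomic_int ready;

struct responder_args {
    int cpu;    /* logical CPU id to pin to, or -1 to leave unpinned */
    long iters; /* number of round trips */
};

/* Pin the calling thread to one logical CPU (best effort; errors ignored). */
static void pin_to(int cpu) {
    if (cpu < 0) return;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    (void)pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

static void *responder(void *p) {
    const struct responder_args *a = p;
    pin_to(a->cpu);
    atomic_store(&ready, 1);
    for (long i = 0; i < a->iters; i++) {
        /* Wait for the ping (odd value), then answer with the pong (even). */
        while (atomic_load_explicit(&flag, memory_order_acquire) != 2 * i + 1) { }
        atomic_store_explicit(&flag, 2 * i + 2, memory_order_release);
    }
    return NULL;
}

/* Average one-way latency in ns between the calling thread (pinned to cpu_a;
 * the pin persists after return) and a helper thread pinned to cpu_b.
 * Returns -1.0 if the helper thread cannot be created. */
double pingpong_ns(int cpu_a, int cpu_b, long iters) {
    assert(iters > 0);
    atomic_store(&flag, 0);
    atomic_store(&ready, 0);
    struct responder_args args = { cpu_b, iters };
    pthread_t t;
    if (pthread_create(&t, NULL, responder, &args) != 0) return -1.0;
    pin_to(cpu_a);
    while (!atomic_load(&ready)) { } /* keep thread start-up out of the timing */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        atomic_store_explicit(&flag, 2 * i + 1, memory_order_release);
        while (atomic_load_explicit(&flag, memory_order_acquire) != 2 * i + 2) { }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_join(t, NULL);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (2.0 * (double)iters); /* each round trip is two one-way hops */
}
```

Calling `pingpong_ns(a, b, 1000000)` for a pair (a, b) gives one cell of such a matrix. Each call leaves the calling thread pinned to `a`. Build with something like `cc -O2 -pthread`.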
How about some more “traditional” benchmarking to see what the actual performance is like with the extra GMI links, and what clock frequencies you can sustain at different levels of utilization?
Zen5c is Zen5 with less cache. Period. They don’t run them at higher frequencies because it’s not as efficient and the smaller cache hurts more as clock speeds go up because you’ll chew through more data in any given time period.
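To make the clock-speed argument concrete: DRAM latency is roughly fixed in nanoseconds, so a cache miss costs more core cycles as frequency rises. This assumes an illustrative 100 ns miss, which is not a measured Zen 5c figure:

```latex
\text{cycles per miss} = t_{\text{miss}} \times f, \qquad
100\,\text{ns} \times 3.8\,\text{GHz} = 380\ \text{cycles}, \qquad
100\,\text{ns} \times 4.5\,\text{GHz} = 450\ \text{cycles}
```

A faster core also issues more requests per second. So it both misses more often in absolute terms and pays more cycles per miss, which is where a smaller L3 hurts.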
It has the same ISA, with optional AVX-512. It is a density-optimized Zen 5 with less L3. When you design chips, there is a Power, Performance, Area (PPA) trade-off. Zen 5c takes less area, uses less power and achieves a bit less performance. But it is the same design as Zen 5; it's just that the "classic" cores are optimized for performance and use more power and more area.
The c chips are not like E-cores, which are usually 4 times smaller than a P-core (Zen 5c is 25% smaller than Zen 5, not 75%), lack hyperthreading (I know Arrow Lake doesn't have it either) and have a DIFFERENT instruction set.
Can they perform somewhat similar roles in their respective companies sometimes? Yes. But most of the time they will have different roles.
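Taking the figures in the comment above at face value (roughly 4× for E-cores, 25% for Zen 5c), the size comparison works out as:

```latex
A_E = \frac{A_P}{4} \;\Rightarrow\; 1 - \frac{1}{4} = 0.75 \ \text{(75\% smaller)}, \qquad
A_{5c} = (1 - 0.25)\,A_5 = 0.75\,A_5 \;\Rightarrow\; \frac{A_5}{A_{5c}} = \frac{1}{0.75} \approx 1.33
```

By this measure, Zen 5c is about 1.33× smaller than Zen 5, not 4×.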
What about AVX-512 performance? (Yeah, also for LLM inference!)
Second this!
Could you benchmark llama.cpp/ollama?
Is Zen5c's maximum clock 3.8 GHz?
If AMD can keep it above 4 GHz, it will compete better against Intel's E-core strategy.
I'm 100% sure it's not just Zen5 with less L3.