Thanks Chester! I may have overlooked it, but I would be interested to see how Arrow Lake performs in games when running only (mostly) on the Skymont E-Cores (apologies if you've already written about it and I missed it). I believe that one has to leave at least one P-core enabled in Arrow Lake, but it would still be interesting to see how good, bad or ugly games like CP2077 or CoD would run mostly on the Skymonts.
If I remember correctly, the 13900KS runs its uncore at 5.0GHz, so going back down to 3.8GHz for ARL is a major performance regression. It would be interesting to know whether it can approach RPL speeds when overclocked; lowering L3 latency might help a lot.
Great piece of investigative work, thanks for posting it up.
Questions:
Hardware -> DDR5-6000 28-36-36-96 (which brand of memory stick were you using?)
I'm working on setting up some RDMA networking stuff with a smart NIC and trying to work out which memory kits I should go with. Online reviews suggest more "expensive" kits like CUDIMM to close the gap with X3D.
CPU related
BIOS -> Intel 200S boost enabled?
Frontend Bandwidth -> "8 renamer-slots on Lion-Cove". So for simple ALU/AGU instructions (4-8 byte encoding length), can it handle the allocation of physical registers at ~8 micro-ops per cycle past the instruction decoders?
Arbitration queue (ARB)
"The ARB runs at the CPU tile’s uncore clock, or 3.8 GHz"
"uncore" means the base clock of the P-Cores? Can that be overclocked?
G.Skill F5-6000J2836G16G, actually supplied by AMD, but I'm using it in the ARL system because it loads the EXPO profile just fine and it's the fastest memory I have around. I wouldn't pay too much attention to the memory specifics.
I updated the BIOS, but didn't change anything beyond enabling EXPO. I'll check later to see if the 200S boost thing is enabled.
Allocation rate is just 8 micro-ops, not related to instruction length (unless you have a lot of long instructions, have a low op cache hit rate, and bottleneck the decoders by hitting L1i bandwidth limits).
The uncore clock is the clock of the ring bus, which connects the cores to L3. It's been decoupled from core clocks since Haswell. It can be overclocked (iirc it's a multiplier off a 100 MHz base clock just like with the cores), but I'm looking at the general picture rather than overclocking.
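To make the multiplier arithmetic concrete, here is a rough Python sketch. It assumes the 100 MHz base clock recalled above, so 3.8 GHz corresponds to a x38 multiplier and the 5.0 GHz figure quoted for the 13900KS (also from memory in this thread) to x50. The cycle count is a made-up placeholder, not a measurement, and it only covers the portion of L3 latency spent in the uncore clock domain.

```python
# Sketch only: uncore clock = multiplier x base clock, and what a fixed
# number of uncore cycles costs in nanoseconds at each clock.

BCLK_MHZ = 100.0  # base clock, as recalled in the comment above (assumption)


def uncore_ghz(multiplier: int, bclk_mhz: float = BCLK_MHZ) -> float:
    """Return the uncore (ring bus) clock in GHz for a given multiplier."""
    return multiplier * bclk_mhz / 1000.0


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count at clock_ghz into nanoseconds."""
    return cycles / clock_ghz


# Hypothetical number of uncore cycles in one L3 access; NOT a measured value.
HYPOTHETICAL_UNCORE_CYCLES = 40

if __name__ == "__main__":
    for label, mult in (("ARL, x38", 38), ("13900KS as quoted, x50", 50)):
        ghz = uncore_ghz(mult)
        ns = cycles_to_ns(HYPOTHETICAL_UNCORE_CYCLES, ghz)
        print(f"{label}: {ghz:.1f} GHz, {ns:.2f} ns for the uncore portion")
```

The point is only that the same number of uncore cycles takes proportionally longer at a lower uncore clock; time spent in the core clock domain is unaffected.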
hey Chester - ran the following on just the E-core (core 12).
Specs:
Core Ultra 265K
OS - Fedora Linux 42
Kernel - Linux 6.15.5-200.fc42.x86_64
Bios Version - 19.01 latest "as of July 16 2025" (enabled boost 200s)
Asus Motherboard Z890
DDR kit - G.SKILL DDR5-6000 MHz, 32GBx2, CL30-40-40-96 (model F5-6000J3040G32Gx2-FX5)
https://github.com/ChipsandCheese/Microbenchmarks
Picked MemoryLatency since that was the biggest negative about Arrow Lake.
taskset -c 12 ./MemoryLatency
Usage: [-test <c/asm/tlb/mlp>] [-maxsizemb <max test size in MB>] [-iter <base iterations, default 100000000]
Region,Latency (ns)
2,0.874067
4,0.871156
8,0.869487
12,0.871046
16,0.870000
24,0.869852
32,0.872878
48,4.024554
64,4.098391
96,4.144344
128,4.167482
192,4.202611
256,4.720000
384,5.223539
512,5.475110
600,5.582734
768,5.701232
1024,5.832217
1536,5.959842
2048,6.162089
3072,6.484454
4096,8.368000
5120,10.734433
6144,12.412544
8192,14.061186
10240,14.958428
12288,15.550732
16384,16.721663
24567,25.251892
32768,42.031368
65536,82.608002
98304,94.112236
131072,100.787682
262144,107.434982
393216,109.781296
524288,111.751778
1048576,116.255997
lmk if you sense anything about ARL that's worth uncovering/deep-diving into.
Happy to run tests on my setup.
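For anyone who wants to eyeball where the latency steps up between cache levels, here is a small Python sketch that parses output in the format pasted above and flags large jumps between adjacent test sizes. The function names and the 1.25x threshold are my own choices for illustration, not part of the Microbenchmarks repo, and the region sizes are assumed to be in KB. The sample is just a five-row excerpt of the run above.

```python
# Sketch only: parse "Region,Latency (ns)" rows and report adjacent test
# sizes where latency rises by at least a chosen ratio.

SAMPLE = """Usage: [-test <c/asm/tlb/mlp>] [-maxsizemb <max test size in MB>] [-iter <base iterations, default 100000000]
Region,Latency (ns)
16,0.870000
24,0.869852
32,0.872878
48,4.024554
64,4.098391
"""


def parse_latency(text):
    """Return [(region, latency_ns), ...], skipping non-numeric lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split(",")
        if len(parts) != 2:
            continue
        try:
            rows.append((int(parts[0]), float(parts[1])))
        except ValueError:
            continue  # usage line, header, or anything else non-numeric
    return rows


def find_jumps(rows, min_ratio=1.25):
    """Return (prev_region, next_region, ratio) wherever latency jumps."""
    jumps = []
    for (r0, l0), (r1, l1) in zip(rows, rows[1:]):
        ratio = l1 / l0
        if ratio >= min_ratio:
            jumps.append((r0, r1, ratio))
    return jumps


if __name__ == "__main__":
    for r0, r1, ratio in find_jumps(parse_latency(SAMPLE)):
        print(f"latency rises {ratio:.2f}x between region {r0} and {r1}")
```

On this excerpt the only flagged step is between the 32 and 48 test sizes. A fixed ratio threshold is crude, so the gradual climb at larger sizes may need a lower threshold or a look at the raw numbers.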
There is no memory kit that will magically close a gap that sometimes exceeds 50%.