Thanks George! One thing I haven't heard or read any statement from Intel about is the memory controller and improvements in main-memory access latency. Intel's CPUs have an unfortunate tendency toward significantly worse (longer) DRAM access latencies than, for example, AMD's Zen CPUs.
I forgot: the same (long-ish latencies, especially compared to Zen) also applies to the L3 (last-level) cache on Intel CPUs. So, if you or Chester have the opportunity to circle back to Intel about Panther Lake, please ask them about any improvements in speeds and latencies for both DRAM and L3 compared to both Lunar Lake and Arrow Lake - Thanks!
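For anyone curious how those latency numbers are usually measured: below is a minimal pointer-chasing sketch in C. It is my own illustration, not C&C's actual harness - the buffer size, iteration count, and crude rand() shuffle are all assumptions. With a 512MiB buffer the walk is DRAM-resident; shrink the buffer to fit under the L3 capacity and the same loop reports L3 latency instead. Compile with -O2 on Linux.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const size_t stride = 64 / sizeof(size_t);  /* one pointer per 64B cache line */
    const size_t lines  = (512u << 20) / 64;    /* 512MiB buffer: DRAM-resident */
    size_t *buf = malloc(lines * 64);
    size_t *idx = malloc(lines * sizeof *idx);
    if (!buf || !idx) return 1;

    /* Build a random cyclic permutation so the hardware prefetcher can't follow the walk. */
    for (size_t i = 0; i < lines; i++) idx[i] = i;
    srand(1);
    for (size_t i = lines - 1; i > 0; i--) {    /* Fisher-Yates (rand() is crude, but fine for a sketch) */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
    for (size_t i = 0; i < lines; i++)
        buf[idx[i] * stride] = idx[(i + 1) % lines] * stride;

    /* Each load depends on the previous one, so elapsed/loads ~= load-to-use latency. */
    const size_t loads = 50 * 1000 * 1000;
    size_t p = idx[0] * stride;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < loads; i++) p = buf[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg load-to-use latency: %.1f ns (sink: %zu)\n", ns / (double)loads, p);
    free(idx); free(buf);
    return 0;
}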
Panther Lake moves the IMC back onto the compute tile, so it might cut DRAM latency compared to Arrow Lake, where memory access is handled by the SoC tile.
So there is hope for proper performance even if supported memory speeds remain the same.
Still, I believe Intel can't quickly turn around a decade-long decay in its memory subsystem, and their first proper move will be Nova Lake.
But I'm looking forward to my favorite source of tech, C&C, getting their paws on Panther Lake and putting the IMC to a proper test from both the CPU and iGPU sides!
Intel uses a ring-style interconnect for client, whilst on Xeon server parts they tend to favor a mesh (at least that's my understanding).
Using a ring, with 18MB shared between 4P+8E, would imply a slightly shorter ring (fewer hops), hence:
1- shorter latency overall vs Arrow Lake
2- longer latency vs Lunar Lake (fewer ring hops there, since Lunar Lake has one less E-core cluster)
A quick sanity check on the hop arithmetic is sketched below.
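The stop counts in this sketch are hypothetical placeholders (the real agent counts on Lunar/Arrow/Panther Lake's rings aren't public in that detail); the point is just that the mean hop count on a bidirectional ring grows roughly as n/4, so every extra stop adds average latency.

#include <stdio.h>

/* Mean shortest-path hop count from one stop to a uniformly random
   other stop on a bidirectional ring with n stops. */
static double avg_hops(int n) {
    long total = 0;
    for (int d = 1; d < n; d++)
        total += d < n - d ? d : n - d;   /* shorter direction around the ring */
    return (double)total / (double)(n - 1);
}

int main(void) {
    /* Hypothetical stop counts purely for illustration (cores/clusters
       plus cache slices and other agents), NOT confirmed ring layouts. */
    for (int n = 8; n <= 14; n += 2)
        printf("%2d stops: %.2f avg hops (~n/4 = %.2f)\n", n, avg_hops(n), n / 4.0);
    return 0;
}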
Question 1::
4 P-cores and 8 E-cores share an 18MB L3 cache, alongside an extra 4 LP E-cores, and all of them can access the 8MB memory-side cache on board the compute tile.
So the 4 P-cores, the 8 E-cores, and the 4 LP E-cores can all access the 8MB memory-side cache ("MSC"). The 4 Darkmont LP E-cores (which form one cluster) are already expected to use 4MB of the MSC (upper-bound scenario); would that theoretically mean the other 4MB is left for the P-cores and E-cores to fight over?
Question 2::
An SRAM-related question: Meteor Lake (using the Core Ultra 7 155H as a discussion point) has 24MB of L3 cache:
NAME   ONE-SIZE   ALL-SIZE  WAYS  TYPE         LEVEL   SETS  PHY-LINE  COHERENCY-SIZE
L1d       49152     557056    12  Data             1     64         1              64
L1i       65536     917504    16  Instruction     1     64         1              64
L2      2097152   18874368    16  Unified          2   2048         1              64
L3     25165824   25165824    12  Unified          3  32768         1              64
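(That table has the column layout of lscpu -C output.) The geometry is self-consistent: ONE-SIZE = WAYS x SETS x COHERENCY-SIZE for every level, which is an easy way to double-check cache figures like these. A small C check over the rows above:

#include <stdio.h>

/* Rows copied from the lscpu table above. */
struct cache { const char *name; long ways, sets, line, one_size; };

int main(void) {
    struct cache c[] = {
        { "L1d", 12,    64, 64,    49152 },
        { "L1i", 16,    64, 64,    65536 },
        { "L2",  16,  2048, 64,  2097152 },
        { "L3",  12, 32768, 64, 25165824 },
    };
    for (int i = 0; i < 4; i++) {
        long calc = c[i].ways * c[i].sets * c[i].line;  /* ways x sets x line size */
        printf("%-3s: %5ld sets x %2ld ways x %ldB = %8ld bytes (table says %8ld) %s\n",
               c[i].name, c[i].sets, c[i].ways, c[i].line,
               calc, c[i].one_size, calc == c[i].one_size ? "OK" : "MISMATCH");
    }
    return 0;
}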
The 155H's 24MB L3 vs. Panther Lake (4+8+4 SKU), which has 18MB L3 + 8MB memory-side cache.
26MB / 24MB implies total SRAM capacity (which is harder to scale at the foundry/process-node level) increased by only ~8.3% from Intel 4 to Intel 18A (quick check below).
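Spelling out that arithmetic (note it compares total capacity at the L3/MSC level only - it ignores L2 arrays, and capacity by itself says nothing direct about SRAM cell density on Intel 4 vs. 18A):

#include <stdio.h>

int main(void) {
    double mtl_l3 = 24.0;        /* Meteor Lake 155H: 24MB L3       */
    double ptl    = 18.0 + 8.0;  /* Panther Lake: 18MB L3 + 8MB MSC */
    printf("26MB / 24MB = %.4f -> +%.1f%% capacity\n",
           ptl / mtl_l3, (ptl / mtl_l3 - 1.0) * 100.0);
    return 0;
}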
Historically, what has the typical SRAM density improvement been for a TSMC process-node jump?
Is that how you would read it? (Question for George, but Chester, feel free to answer.)
Question 3::
What would be your estimates for the L1D$, L1I$, and L2$ sizes for the LP E-cores (per core)?