First, thanks Chester! This is another interesting player I wasn't really aware of. Who are their current or projected customers?
And I agree, it's notable that Japan still has at least a handful of companies developing interesting processors and accelerators, the largest of which is AFAIK still Fujitsu with their upcoming Monaka CPU.
It's also noteworthy, however, that Rapidus's targeted 2 nm node wasn't mentioned. That foundry needs to fab a functional chip as a proof of concept for its process.
Probably the Japanese government. So far PEZY seems to be targeting TSMC nodes, and not even the latest ones. It feels like they're trying for efficiency with a narrowly targeted and optimized architecture, rather than leaning on process nodes. Or maybe they don't want to pay for the newest nodes.
A 12-cycle 4KB L1D at 1.5GHz seems like a mistake given that 4-cycle 64KB L1Ds are standard at 3+GHz... Such a tiny cache should be 2 cycles at worst. Relax that to 4 cycles to save power and it's still effectively a 0-cycle load latency, since the other threads hide it.
Keep in mind it's meant to be used much like a datacenter GPU, with a lot of thread-level parallelism and less focus on per-thread performance compared to a CPU.
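For a rough sense of the numbers in this exchange, here's a back-of-the-envelope sketch in Python. It converts the quoted L1D latencies to nanoseconds and estimates how many threads it takes to cover a load latency. The round-robin issue model and the `work_per_turn` parameter are my own simplifying assumptions, not anything PEZY has documented, and the function names are just for illustration.

```python
import math

def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Wall-clock latency of a cache hit, in nanoseconds."""
    return cycles / clock_ghz

def threads_to_hide(latency_cycles: int, work_per_turn: int = 1) -> int:
    """Threads needed so a load's result is ready by the thread's next turn.

    Toy model (an assumption): threads issue round-robin, each issuing
    `work_per_turn` independent instructions per turn, one per cycle.
    A thread's next turn then starts threads * work_per_turn cycles later,
    so the latency is hidden once that product reaches latency_cycles.
    """
    return math.ceil(latency_cycles / work_per_turn)

if __name__ == "__main__":
    # Figures quoted in the comments above.
    print(f"12 cycles @ 1.5 GHz: {latency_ns(12, 1.5):.2f} ns")
    print(f" 4 cycles @ 3.0 GHz: {latency_ns(4, 3.0):.2f} ns")
    print("threads to hide 12 cycles:", threads_to_hide(12))
    print("threads to hide  4 cycles:", threads_to_hide(4))
```

Under this model, a longer hit latency doesn't cost throughput as long as enough threads are ready to issue, but it does raise the thread count needed to stay busy, which is the trade-off both comments are circling.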