> P550 executes more instructions to finish the same work, and I’m not sure why.
If you're using sha256sum without compiling it yourself, chances are the RISC-V binaries are compiled without bitmanip instructions, so it's likely having to use less efficient sequences for stuff like bit rotates.
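To make the rotate point concrete, here is a minimal sketch in C. The function names are mine, not taken from coreutils or any particular sha256sum implementation. SHA-256 rotates 32-bit words constantly. When built for plain RV64GC, a compiler generally has to lower each rotate to two shifts and an OR, whereas a build with Zbb enabled (for example GCC's -march=rv64gc_zbb) can typically use a single rotate instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (0 <= n < 32).
   Without Zbb this is typically shift + shift + or;
   with Zbb a compiler can usually emit one rotate instruction. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    assert(n < 32);
    /* The "& 31" keeps the left shift well-defined when n == 0. */
    return (x >> n) | (x << ((32 - n) & 31));
}

/* SHA-256 "big sigma 0": three rotates XORed together. */
static inline uint32_t sha256_big_sigma0(uint32_t x)
{
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}
```

Each SHA-256 round uses several of these, so the extra instructions per rotate add up quickly in the dynamic instruction count.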
Which I think is a major issue with RISC-V in general - all the fragmentation means that Linux distributions target the lowest common denominator. RISC-V's attempt at profiles is nice, but distros still compile for x86-64v1 despite v2 being standard on x86 CPUs for the past ~15 years.
I don't think it's a big problem.
First of all, any ISA that is going to last for 25 or 50 or 100 [1] years is going to have to deal with change, adding new extensions for sure, and quite possibly over time standard software requiring a newer baseline.
But secondly, by the time mobile / desktop / server competitive RISC-V hardware arrives in 2027 or 2028, and potentially starts shipping in the millions, it's all going to be RVA23 with bitmanip and vector and vector crypto and hypervisor and so forth. There might be 10,000 or 20,000 or so legacy RV64GC SBCs in the hands of early adopters but they are going to be SO SLOW compared to the likes of the P870 (SiFive's first RVA23 core) and things from Ventana and Rivos and who knows who else that all those early adopters will be eagerly upgrading because that's what early adopters do.
It's not at all like x86_64 where a Core2Duo or even Athlon64 is still somewhat usable if you can stuff enough RAM and a decent video card in it AND they were sold in the hundreds of millions (maybe "just" 100m for the Athlon64) and are in use by people's grandparents etc.
[1] and why not? x86 is over 45 years and it was designed as a quick and dirty stopgap until iAPX432 arrived. Arm is 40 years and S/360 is already 60 years old. There is no reason that community-owned technology-neutral things such as Linux and RISC-V can't be used for centuries.
You honestly believe there will be competitive consumer products on the market shipping in the millions by 2027 or 2028? I could see it possibly catching on in small amounts in servers from people that need the compute but can’t afford x86. Look how weak Qualcomm’s performance has been in the consumer laptop market with one of the largest marketing pushes in PC history. What are the chances there will be RISC-V consumer parts in the market with a performance advantage over ARM?
There will be consumer products in that time frame, yes.
"Millions" is really not hard to do when there are over a billion smartphones shipped a year.
You don't have to have a performance advantage over flagship Arm phones. There are still large numbers of phones and tablets selling today -- probably the majority of the market by volume -- with old tech such as the A53.
And RISC-V can’t financially compete with the A53 while companies are still trying to amortize new designs just to get to A53 performance. These out-of-order cores are only about 5-10% faster than an in-order A53 anyway. These new designs might not even be compatible with ancient dirt-cheap nodes like the A53 is, either. Right now, the RISC-V org’s main priority should be getting software support to a level where it can even be a consumer general-purpose CPU. Right now, the only situation where RISC-V has any legit advantages over ARM is trying to minimize royalties on cheap microcontrollers. “Open source” means nothing if you can’t get the software you need.
SiFive has had A55-level performance -- better than A53 -- in the U74 since October 2018. That's been the workhorse core in the RISC-V ecosystem since mid 2021 in the HiFive Unmatched, VisionFive 1 and 2, Milk-V Mars, Pine64 Star64 and PineTab-V, DC-Roma laptop, and the upcoming FrameWork main board.
I'd say it's pretty well amortised by now.
Your arguments in the last couple of comments have basically come down to "nothing new can ever succeed in the face of established players", which is obviously false as everything was new at some point, including Aarch64 just 10 years earlier than RISC-V, if we take ARMv8.0-A to be equivalent to RVA22 -- or Amd64 10 years before Arm64.
It sounds like you're very optimistic with everything working out that way. I don't share anywhere close to your level of optimism, but I can't predict the future, so who knows - maybe it happens.
The probability of getting significant traction [1] this decade in things such as Android Phones and Chromebooks and servers is not 100%, of course, but in the eight years between when I bought my first RISC-V board in December 2016 and now I'd say it's gone from 5% to maybe 90%.
[1] double digit market share percentage?
I would be genuinely astonished if RISC-V manages to achieve even a 1% market share in any segment, whether mobile, servers, or PCs.
1%? What can I say except "Prepare to be astonished"?
Arm64 has gone from 0% of AWS when the first Graviton was introduced in November 2018 to something around 20% today.
A RISC-V server chip on a similar level to Graviton 1 has been shipping in volume for the last 12 months, just five years behind. That's the SG2042 with 64 cores (vs 16 for the first Graviton), C910 cores which are worse than A72 but not by that much, support for 128 GB RAM, and 32 lanes of PCIe. You could say it's worse than Graviton, but it's way ahead of the first ThunderX in 2016.
The C910 has turned out to have a couple of problems, but a revised C920v2 core and SG2044 chip with the problems fixed and RVV 1.0 instead of 0.7 is on the way. Not to mention lots of other chips from lots of other companies.
No one is saying RISC-V will quickly overtake Arm in market share, let alone x86, but not even getting 1% in this decade, let alone ever? Preposterous. We're way past that being a serious possibility.
ARM has been around for a long, long time and still encounters not insignificant compatibility difficulties compared to x86. I'm not optimistic about RISC-V getting this kind of support either, unless a big corporation pushes it for some reason.
I thought the Chinese might push RISC-V into the smartwatch or TV market, but neither happened.
For the T-Head C906 or C910, I recommend using -mcpu=thead-c906, which will enable all supported extensions (for example, xtheadba, xtheadbb, xtheadcondmov, etc). If -mcpu is set, then there is no need for -march and -mtune (both will be set to optimum values).
Enabling xtheadvector won't affect code generation because GCC does not support autovectorization for xtheadvector.
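One way to check what a given -mcpu or -march setting actually enabled is to look at the compiler's predefined macros. A minimal sketch, assuming the __riscv_zba and __riscv_zbb test macros from the RISC-V C API conventions; the function name is hypothetical, and the vendor xthead* extensions are deliberately not covered here.

```c
#include <assert.h>

/* Hypothetical helper: counts how many of the standard Zba/Zbb
   extensions the compiler was told it may use for this build.
   Returns 0 on non-RISC-V targets or on a plain RV64GC build. */
static inline int bitmanip_macros_defined(void)
{
    int n = 0;
#ifdef __riscv_zba
    n++; /* address-generation bitmanip enabled */
#endif
#ifdef __riscv_zbb
    n++; /* basic bitmanip (rotates, clz, etc.) enabled */
#endif
    return n;
}
```

Building this once with the distro default flags and once with something like -march=rv64gc_zba_zbb should show the difference, which is a quick sanity check before comparing benchmark numbers.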
Equivalent performance to an A55 from the latest and greatest out-of-order architectures on RISC-V is a bit disappointing, to say the least.
The in-order U74, with RTL available to their customers in October 2018, was already equivalent to A55, which had similar RTL availability 17 months earlier in May 2017.
The U84, released in October 2019, was SiFive's first OoO core. The P550 is mostly just a renaming of that with some incremental tweaks e.g. adding Zba, Zbb extensions (ratified December 2021).
SiFive has two major generations of OoO cores since then, with SPECINT2k6/GHz scores progressing something like U74: 4, P550: 8.5, P670: 12, P870: 18.
SoC design and release schedules and performance of the rest of the SoC is of course up to the customer, just as it is with Arm cores. Intel's "Horse Creek" with the same core was supposed to be shipping almost two years ago with higher MHz and probably better IP for DDR etc but ... Intel problems.
Do SPEC CPU results refer to single-thread performance, or do they account for the number of cores in each CPU?
I have two points here:
1) It's a great pity an in-order RISC-V machine similar to A55 was not tested -- the obvious choice being SiFive's own U74 core which in the JH7110 is probably the most common RISC-V SBC and laptop SoC of the last two years, in the VisionFive 2, the Pine64 Star64, the Milk-V Mars, the DC-Roma laptop and so on.
As they say, on many tasks a well-implemented in-order CPU can be very competitive with a small OoO.
I have both the VisionFive 2 (1.5 GHz) and the C910 Lichee Pi 4A (1.85 GHz). While the C910 wins most micro-benchmarks (memcpy, primes, Dhrystone, Coremark etc), on the real-world things I use a computer for the VisionFive 2 is *always* faster. Even something simple like launching emacs. It's significantly faster at building a Linux kernel (67m35s vs 88m4s), compiling GNU binutils & GCC, or running the CoreCLR unit test suite.
2) the P550 is not limited to the 1.4 GHz it's being run at here. SiFive is clearly being very conservative. Eswin say the chip runs at 1.8 GHz and so do Milk-V with their "Megrez" SBC which (with 16 GB RAM) is half the price of the HiFive Premier P550 at $199 vs $399. I have one currently 4 days into transit from Arace in China to New Zealand. Perhaps I'll have it next week. A number of other people reported on Reddit that theirs have also shipped.
The Eswin SoC is a 2nd choice fallback plan after Intel apparently shut down their "Horse Creek" project using the P550. Intel said Horse Creek would be "2+ GHz" and they demonstrated a test chip running at 2.2 GHz at Intel Innovation 2022 Developer Conference in October 2022, almost 2 1/2 years ago. That had been expected to ship in summer 2023.
https://web.archive.org/web/20221101114447/https://fuse.wikichip.org/news/7277/intel-sifive-demo-high-performance-risc-v-horse-creek-dev-platform-on-intel-4-process/
Any idea how the P550 or C910 compare to the A53 or A55 area- or power-wise? Since they seem roughly comparable in performance, it would be interesting to know how they compare in such metrics.
Thanks for this follow-up on RISC-V! I wonder if some of the losses vs. the in-order A55 are due to the inability of the P550 in particular to deal with unaligned access. You mentioned in your preceding article that unaligned access "dependent or not, confuses P550 for hundreds of cycles." That's really bad for performance, and might well nullify the advantage of being an out-of-order design vs the A55's in-order one. Was the T-Head similarly affected by this?
T-HEAD is not affected. They have pretty good handling for unaligned accesses.
As do SpacemiT. SiFive's U74 and now P550 are the only common RISC-V Linux-capable cores that don't do unaligned access in hardware. They don't crash, they just run a bit slowly.
The market will decide whether to buy machines with slow misaligned access or not.
I checked a few months ago and explicit code to do arbitrary unaligned loads takes 6 cycles on a U74 or SpacemiT X60, fewer on a wider core. That drops if you statically know the offset from alignment, e.g. an unaligned field access from an aligned pointer, where you only need 5 instructions with 3 instructions of latency. It's even less (4 instructions) if you're accessing a sequence of contiguous data items that are all misaligned by the same amount.
https://www.reddit.com/r/RISCV/comments/1ezbyr4/comment/ljkbx95/
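For anyone wondering what "explicit code" for an unaligned load looks like, here is a rough sketch in C: two aligned loads, two shifts, and an OR. This is not the exact sequence from the linked comment. It assumes a little-endian machine, that the data is handed over as an array of aligned 64-bit words, and that the word after the one containing the first byte exists. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the 64-bit little-endian value starting at
   byte offset byte_off inside an array of aligned 64-bit words,
   using only aligned loads plus shifts. */
static inline uint64_t load_u64_unaligned(const uint64_t *words,
                                          size_t nwords, size_t byte_off)
{
    size_t idx = byte_off / 8;                  /* aligned word holding the first byte */
    unsigned sh = (unsigned)(byte_off % 8) * 8; /* misalignment in bits */

    assert(idx + 1 < nwords);                   /* the following word must exist */

    uint64_t lo = words[idx] >> sh;
    /* Split the left shift in two so the total never reaches 64 bits in
       one step, which would be undefined in C when sh == 0. */
    uint64_t hi = (words[idx + 1] << 1) << (63 - sh);
    return lo | hi;
}
```

When the misalignment is known at compile time the shift amounts become constants, which is where the shorter sequences mentioned above come from.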
One good thing about the C910 is that it's a partially open-source design: https://github.com/XUANTIE-RV/openc910, so we can know exactly what's happening in the uarch.
Sort of. Part of the difficulty is that some parts of the source code do not seem to match either the official documentation or microbenchmarking results. Perhaps they open-sourced a different iteration of the code than what made it into actual designs.