On Wed, Sep 13, 2023 at 5:36 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
>
> Hi Evan,
>
> On Fri, Aug 18, 2023 at 9:44 PM Evan Green <evan@xxxxxxxxxxxx> wrote:
> > Rather than deferring unaligned access speed determinations to a vendor
> > function, let's probe them and find out how fast they are. If we
> > determine that an unaligned word access is faster than N byte accesses,
> > mark the hardware's unaligned access as "fast". Otherwise, we mark
> > accesses as slow.
> >
> > The algorithm itself runs for a fixed amount of jiffies. Within each
> > iteration it attempts to time a single loop, and then keeps only the best
> > (fastest) loop it saw. This algorithm was found to have lower variance from
> > run to run than my first attempt, which counted the total number of
> > iterations that could be done in that fixed amount of jiffies. By taking
> > only the best iteration in the loop, assuming at least one loop wasn't
> > perturbed by an interrupt, we eliminate the effects of interrupts and
> > other "warm up" factors like branch prediction. The only downside is it
> > depends on having an rdtime granular and accurate enough to measure a
> > single copy. If we ever manage to complete a loop in 0 rdtime ticks, we
> > leave the unaligned setting at UNKNOWN.
> >
> > There is a slight change in user-visible behavior here. Previously, all
> > boards except the THead C906 reported misaligned access speed of
> > UNKNOWN. C906 reported FAST. With this change, since we're now measuring
> > misaligned access speed on each hart, all RISC-V systems will have this
> > key set as either FAST or SLOW.
> >
> > Currently, we don't have a way to confidently measure the difference between
> > SLOW and EMULATED, so we label anything not fast as SLOW. This will
> > mislabel some systems that are actually EMULATED as SLOW.
> > When we get support for delegating misaligned access traps to the kernel
> > (as opposed to the firmware quietly handling it), we can explicitly test
> > in Linux to see if unaligned accesses trap. Those systems will start to
> > report EMULATED, though older (today's) systems without that new SBI
> > mechanism will continue to report SLOW.
> >
> > I've updated the documentation for those hwprobe values to reflect
> > this, specifically: SLOW may or may not be emulated by software, and FAST
> > represents means being faster than equivalent byte accesses. The change
> > in documentation is accurate with respect to both the former and current
> > behavior.
> >
> > Signed-off-by: Evan Green <evan@xxxxxxxxxxxx>
> > Acked-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
>
> Thanks for your patch, which is now commit 584ea6564bcaead2 ("RISC-V:
> Probe for unaligned access speed") in v6.6-rc1.
>
> On the boards I have, I get:
>
> rzfive:
>     cpu0: Ratio of byte access time to unaligned word access is 1.05, unaligned accesses are fast

Hrm, I'm a little surprised to be seeing this number come out so close
to 1. If you reboot a few times, what kind of variance do you get on
this?
>
> icicle:
>
>     cpu1: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
>     cpu2: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
>     cpu3: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
>
>     cpu0: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
>
> k210:
>
>     cpu1: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
>     cpu0: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
>
> starlight:
>
>     cpu1: Ratio of byte access time to unaligned word access is 0.01, unaligned accesses are slow
>     cpu0: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
>
> vexriscv/orangecrab:
>
>     cpu0: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
>
> I am a bit surprised by the near-zero values. Are these expected?
> Thanks!

This could be expected, if firmware is trapping the unaligned accesses
and coming out >100x slower than a native access.
If you're interested in getting a little more resolution, you could try
to print a few more decimal places with something like (sorry gmail
mangles the whitespace on this):

diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 1cfbba65d11a..2c094037658a 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -632,11 +632,11 @@ void check_unaligned_access(int cpu)
        if (word_cycles < byte_cycles)
                speed = RISCV_HWPROBE_MISALIGNED_FAST;
 
-       ratio = div_u64((byte_cycles * 100), word_cycles);
-       pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
+       ratio = div_u64((byte_cycles * 100000), word_cycles);
+       pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%05d, unaligned accesses are %s\n",
                cpu,
-               ratio / 100,
-               ratio % 100,
+               ratio / 100000,
+               ratio % 100000,
                (speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");
 
        per_cpu(misaligned_access_speed, cpu) = speed;

If you did, I'd be interested to see the results.
-Evan
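
For anyone following along, here is a rough userspace sketch of the fixed-point formatting that the diff above changes. It is only an illustration of the arithmetic, not kernel code: format_ratio() is a hypothetical helper, and the cycle counts mentioned below are made-up inputs, not measurements from any board. It shows why a ratio below 0.01 prints as 0.00 with a scale of 100, and how a scale of 100000 recovers the extra digits.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the kernel): format
 * byte_cycles / word_cycles as a decimal with `digits` fractional
 * digits, using integer arithmetic only, in the same style as the
 * pr_info() call in the diff.
 *
 * Returns the snprintf() result, or -1 if word_cycles is 0.
 * Note: byte_cycles * scale can overflow for very large inputs;
 * this sketch does not guard against that.
 */
int format_ratio(char *buf, size_t len, uint64_t byte_cycles,
                 uint64_t word_cycles, unsigned int digits)
{
        uint64_t scale = 1;
        uint64_t ratio;
        unsigned int i;

        if (word_cycles == 0)
                return -1;

        /* scale = 10^digits, e.g. 100 for two decimal places */
        for (i = 0; i < digits; i++)
                scale *= 10;

        /* Integer division truncates: anything below 1/scale becomes 0 */
        ratio = byte_cycles * scale / word_cycles;

        return snprintf(buf, len, "%" PRIu64 ".%0*" PRIu64,
                        ratio / scale, (int)digits, ratio % scale);
}
```

With the made-up inputs byte_cycles = 120 and word_cycles = 40000, two digits gives "0.00" while five digits gives "0.00300"; with 105 and 100, two digits gives "1.05".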