Hi Evan,

On Fri, Aug 18, 2023 at 9:44 PM Evan Green <evan@xxxxxxxxxxxx> wrote:
> Rather than deferring unaligned access speed determinations to a vendor
> function, let's probe them and find out how fast they are. If we
> determine that an unaligned word access is faster than N byte accesses,
> mark the hardware's unaligned access as "fast". Otherwise, we mark
> accesses as slow.
>
> The algorithm itself runs for a fixed amount of jiffies. Within each
> iteration it attempts to time a single loop, and then keeps only the best
> (fastest) loop it saw. This algorithm was found to have lower variance from
> run to run than my first attempt, which counted the total number of
> iterations that could be done in that fixed amount of jiffies. By taking
> only the best iteration in the loop, assuming at least one loop wasn't
> perturbed by an interrupt, we eliminate the effects of interrupts and
> other "warm up" factors like branch prediction. The only downside is it
> depends on having an rdtime granular and accurate enough to measure a
> single copy. If we ever manage to complete a loop in 0 rdtime ticks, we
> leave the unaligned setting at UNKNOWN.
>
> There is a slight change in user-visible behavior here. Previously, all
> boards except the THead C906 reported misaligned access speed of
> UNKNOWN. C906 reported FAST. With this change, since we're now measuring
> misaligned access speed on each hart, all RISC-V systems will have this
> key set as either FAST or SLOW.
>
> Currently, we don't have a way to confidently measure the difference between
> SLOW and EMULATED, so we label anything not fast as SLOW. This will
> mislabel some systems that are actually EMULATED as SLOW. When we get
> support for delegating misaligned access traps to the kernel (as opposed
> to the firmware quietly handling it), we can explicitly test in Linux to
> see if unaligned accesses trap. Those systems will start to report
> EMULATED, though older (today's) systems without that new SBI mechanism
> will continue to report SLOW.
>
> I've updated the documentation for those hwprobe values to reflect
> this, specifically: SLOW may or may not be emulated by software, and FAST
> represents means being faster than equivalent byte accesses. The change
> in documentation is accurate with respect to both the former and current
> behavior.
>
> Signed-off-by: Evan Green <evan@xxxxxxxxxxxx>
> Acked-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>

Thanks for your patch, which is now commit 584ea6564bcaead2 ("RISC-V:
Probe for unaligned access speed") in v6.6-rc1.

On the boards I have, I get:

rzfive:

    cpu0: Ratio of byte access time to unaligned word access is 1.05, unaligned accesses are fast

icicle:

    cpu1: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
    cpu2: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
    cpu3: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
    cpu0: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow

k210:

    cpu1: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
    cpu0: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow

starlight:

    cpu1: Ratio of byte access time to unaligned word access is 0.01, unaligned accesses are slow
    cpu0: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow

vexriscv/orangecrab:

    cpu0: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow

I am a bit surprised by the near-zero values. Are these expected?

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
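
A note on reading the ratios in the log output. The following is a minimal sketch, not the kernel's code. It assumes that each measured time is the minimum over all timed iterations (as the quoted commit message describes), and that the printed value is the best byte-copy time divided by the best unaligned word-copy time, scaled by 100 and truncated to two decimals. The function names and the tick counts are hypothetical, chosen only for illustration. Under those assumptions, a printed 0.00 would mean the unaligned word copy took more than 100 times as long as the byte copy, which is the kind of gap one might see if the accesses are trapped and emulated rather than handled in hardware.

```python
def best_ticks(samples):
    """Keep only the fastest timed iteration; slower ones are assumed
    to have been perturbed by interrupts or warm-up effects."""
    return min(samples)


def classify(word_ticks, byte_ticks):
    """Label unaligned access speed from the two best times.

    A loop that completes in 0 ticks cannot be measured, so the
    result stays "unknown" in that case.
    """
    if word_ticks == 0 or byte_ticks == 0:
        return "unknown"
    return "fast" if word_ticks < byte_ticks else "slow"


def format_ratio(word_ticks, byte_ticks):
    """Format byte time / word time with two truncated decimals.

    Assumes word_ticks is non-zero (check classify() first).
    Integer arithmetic is used on purpose, so anything below
    0.01 prints as 0.00.
    """
    ratio = (byte_ticks * 100) // word_ticks
    return f"{ratio // 100}.{ratio % 100:02d}"


if __name__ == "__main__":
    # Hypothetical tick counts for a board where unaligned words are very slow.
    word = best_ticks([2300, 2000, 2150])
    byte = best_ticks([55, 40, 47])
    print(format_ratio(word, byte), classify(word, byte))  # 0.02 slow

    # Hypothetical tick counts where the word copy is slightly faster.
    print(format_ratio(20, 21), classify(20, 21))  # 1.05 fast
```

With truncating integer division, any word copy that is more than 100 times slower than the byte copy collapses to 0.00, so the printed ratio loses all resolution exactly where the gap is largest.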