Re: [PATCH v4 1/2] RISC-V: Probe for unaligned access speed

Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> · Thu, 14 Sep 2023 09:32:37 +0200

Hi Evan,

On Wed, Sep 13, 2023 at 7:46 PM Evan Green <evan@xxxxxxxxxxxx> wrote:
> On Wed, Sep 13, 2023 at 5:36 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > On Fri, Aug 18, 2023 at 9:44 PM Evan Green <evan@xxxxxxxxxxxx> wrote:
> > > Rather than deferring unaligned access speed determinations to a vendor
> > > function, let's probe them and find out how fast they are. If we
> > > determine that an unaligned word access is faster than N byte accesses,
> > > mark the hardware's unaligned access as "fast". Otherwise, we mark
> > > accesses as slow.
> > >
> > > The algorithm itself runs for a fixed amount of jiffies. Within each
> > > iteration it attempts to time a single loop, and then keeps only the best
> > > (fastest) loop it saw. This algorithm was found to have lower variance from
> > > run to run than my first attempt, which counted the total number of
> > > iterations that could be done in that fixed amount of jiffies. By taking
> > > only the best iteration in the loop, assuming at least one loop wasn't
> > > perturbed by an interrupt, we eliminate the effects of interrupts and
> > > other "warm up" factors like branch prediction. The only downside is it
> > > depends on having an rdtime granular and accurate enough to measure a
> > > single copy. If we ever manage to complete a loop in 0 rdtime ticks, we
> > > leave the unaligned setting at UNKNOWN.
> > >
> > > There is a slight change in user-visible behavior here. Previously, all
> > > boards except the THead C906 reported misaligned access speed of
> > > UNKNOWN. C906 reported FAST. With this change, since we're now measuring
> > > misaligned access speed on each hart, all RISC-V systems will have this
> > > key set as either FAST or SLOW.
> > >
> > > Currently, we don't have a way to confidently measure the difference between
> > > SLOW and EMULATED, so we label anything not fast as SLOW. This will
> > > mislabel some systems that are actually EMULATED as SLOW. When we get
> > > support for delegating misaligned access traps to the kernel (as opposed
> > > to the firmware quietly handling it), we can explicitly test in Linux to
> > > see if unaligned accesses trap. Those systems will start to report
> > > EMULATED, though older (today's) systems without that new SBI mechanism
> > > will continue to report SLOW.
> > >
> > > I've updated the documentation for those hwprobe values to reflect
> > > this, specifically: SLOW may or may not be emulated by software, and FAST
> > > represents means being faster than equivalent byte accesses. The change
> > > in documentation is accurate with respect to both the former and current
> > > behavior.
> > >
> > > Signed-off-by: Evan Green <evan@xxxxxxxxxxxx>
> > > Acked-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
> >
> > Thanks for your patch, which is now commit 584ea6564bcaead2 ("RISC-V:
> > Probe for unaligned access speed") in v6.6-rc1.
> >
> > On the boards I have, I get:
> >
> >     rzfive:
> >         cpu0: Ratio of byte access time to unaligned word access is
> > 1.05, unaligned accesses are fast
>
> Hrm, I'm a little surprised to be seeing this number come out so close
> to 1. If you reboot a few times, what kind of variance do you get on
> this?

Rock-solid at 1.05 (even with increased resolution: 1.05853 on 3 tries)

> >     icicle:
> >
> >         cpu1: Ratio of byte access time to unaligned word access is
> > 0.00, unaligned accesses are slow
> >         cpu2: Ratio of byte access time to unaligned word access is
> > 0.00, unaligned accesses are slow
> >         cpu3: Ratio of byte access time to unaligned word access is
> > 0.00, unaligned accesses are slow
> >
> >         cpu0: Ratio of byte access time to unaligned word access is
> > 0.00, unaligned accesses are slow

cpu1: Ratio of byte access time to unaligned word access is 0.00344,
unaligned accesses are slow
cpu2: Ratio of byte access time to unaligned word access is 0.00343,
unaligned accesses are slow
cpu3: Ratio of byte access time to unaligned word access is 0.00343,
unaligned accesses are slow
cpu0: Ratio of byte access time to unaligned word access is 0.00340,
unaligned accesses are slow

> >     k210:
> >
> >         cpu1: Ratio of byte access time to unaligned word access is
> > 0.02, unaligned accesses are slow
> >         cpu0: Ratio of byte access time to unaligned word access is
> > 0.02, unaligned accesses are slow

cpu1: Ratio of byte access time to unaligned word access is 0.02392,
unaligned accesses are slow
cpu0: Ratio of byte access time to unaligned word access is 0.02084,
unaligned accesses are slow

> >     starlight:
> >
> >         cpu1: Ratio of byte access time to unaligned word access is
> > 0.01, unaligned accesses are slow
> >         cpu0: Ratio of byte access time to unaligned word access is
> > 0.02, unaligned accesses are slow

cpu1: Ratio of byte access time to unaligned word access is 0.01872,
unaligned accesses are slow
cpu0: Ratio of byte access time to unaligned word access is 0.01930,
unaligned accesses are slow

> >     vexriscv/orangecrab:
> >
> >         cpu0: Ratio of byte access time to unaligned word access is
> > 0.00, unaligned accesses are slow

cpu0: Ratio of byte access time to unaligned word access is 0.00417,
unaligned accesses are slow

> > I am a bit surprised by the near-zero values.  Are these expected?
>
> This could be expected, if firmware is trapping the unaligned accesses
> and coming out >100x slower than a native access. If you're interested
> in getting a little more resolution, you could try to print a few more
> decimal places with something like (sorry gmail mangles the whitespace
> on this):

Looks like you need to add one digit to get anything useful on half of the
systems.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds