RE: [PATCH 1/2] RISC-V: Probe for unaligned access speed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Evan Green
> Sent: 27 June 2023 20:12
> 
> On Mon, Jun 26, 2023 at 2:42 PM Jessica Clarke <jrtc27@xxxxxxxxxx> wrote:
> >
> > On 23 Jun 2023, at 23:20, Evan Green <evan@xxxxxxxxxxxx> wrote:
> > >
> > > Rather than deferring misaligned access speed determinations to a vendor
> > > function, let's probe them and find out how fast they are. If we
> > > determine that a misaligned word access is faster than N byte accesses,
> > > mark the hardware's misaligned access as "fast".
> >
> > How sure are you that your measurements can be extrapolated and aren’t
> > an artefact of the testing process? For example, off the top of my head:
> >
> > * The first run will potentially be penalised by data cache misses,
> >   untrained prefetchers, TLB misses, branch predictors, etc. compared
> >   with later runs. You have one warmup, but who knows how many
> >   iterations it will take to converge?
> 
> I'd expect the cache penalties to be reasonably covered by a single
> warmup. You're right about branch prediction, which is why I tried to
> use a large-ish buffer size, minimize the ratio of conditionals to
> loads/stores, and do the test for a decent number of iterations (on my
> THead, about 1800 and 400 for words and bytes).
> 
> When I ran the test a handful of times, I did see variation on the
> order of ~5%. But the comparison of the two numbers doesn't seem to be
> anywhere near that margin (THead C906 was ~4x faster doing misaligned
> word accesses, others with slow misaligned accesses also reporting
> numbers not anywhere close to each other).

Isn't the EMULATED case so much slower than anything else that
it is even pretty obvious from a single access?
(Possibly the 2nd access to avoid 'cold cache'.)

One of the things that can perturb measurements is hardware
interrupts. That can be mitigated by counting clocks for a few
(10 is plenty) iterations of a short request and taking the
fastest value.
For short hot-cache code sequences you can actually compare the
actual clock counts with theoretical minimum values.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux