From: Evan Green
> Sent: 14 September 2023 17:37
>
> On Thu, Sep 14, 2023 at 8:55 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
> >
> > From: Evan Green
> > > Sent: 14 September 2023 16:01
> > >
> > > On Thu, Sep 14, 2023 at 1:47 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
> > > >
> > > > From: Geert Uytterhoeven
> > > > > Sent: 14 September 2023 08:33
> > > > ...
> > > > > > > rzfive:
> > > > > > > cpu0: Ratio of byte access time to unaligned word access is
> > > > > > > 1.05, unaligned accesses are fast
> > > > > >
> > > > > > Hrm, I'm a little surprised to be seeing this number come out so close
> > > > > > to 1. If you reboot a few times, what kind of variance do you get on
> > > > > > this?
> > > > >
> > > > > Rock-solid at 1.05 (even with increased resolution: 1.05853 on 3 tries)
> > > >
> > > > Would that match zero overhead unless the access crosses a
> > > > cache line boundary?
> > > > (I can't remember whether the test is using increasing addresses.)
> > >
> > > Yes, the test does use increasing addresses, it copies across 4 pages.
> > > We start with a warmup, so caching effects beyond L1 are largely not
> > > taken into account.
> >
> > That seems entirely excessive.
> > If you want to avoid data cache issues (which you probably do)
> > then just repeating a single access would almost certainly
> > suffice.
> > Repeatedly using a short buffer (say 256 bytes) won't add
> > much loop overhead.
> > Although you may want to do a test that avoids transfers
> > that cross cache line and especially page boundaries.
> > Either of those could easily be much slower than a read
> > that is entirely within a cache line.
>
> We won't be faulting on any of these pages, and they should remain in
> the TLB, so I don't expect many page boundary specific effects. If
> there is a steep penalty for misaligned loads across a cache line,
> such that it's worse than doing byte accesses, I want the test results
> to be dinged for that.

That is an entirely different issue.
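For illustration, here is a minimal user-space C sketch of the kind of measurement suggested in the quoted text: repeatedly sweep a short 256-byte buffer (small enough to stay resident in L1 after a warm-up pass, so cache fills should not dominate) and skip any word load that would straddle a cache line. This is not the kernel's actual probe code. The 64-byte line size, the buffer size, and the helper names crosses_line(), load_bytes(), load_word() and time_loads() are all hypothetical. load_word() uses memcpy() to stay within defined C, so the compiler, not this code, decides whether it becomes a single misaligned load; a real measurement would pin the instructions down in assembly.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define LINE_SIZE 64  /* assumed cache-line size in bytes (hypothetical) */
#define BUF_SIZE 256  /* short buffer, reused on every pass */
#define WORD 8        /* bytes per word access */

/* Line-aligned so that buffer offsets map directly onto cache lines. */
static _Alignas(LINE_SIZE) unsigned char buf[BUF_SIZE + WORD];

/* Nonzero if a WORD-byte access at offset 'off' spans two cache lines. */
static int crosses_line(size_t off)
{
	return off / LINE_SIZE != (off + WORD - 1) / LINE_SIZE;
}

/* Read one word as eight separate byte loads; volatile keeps the
 * compiler from merging them into a single wide load. */
static uint64_t load_bytes(const unsigned char *p)
{
	const volatile unsigned char *vp = p;
	unsigned char b[WORD];
	uint64_t v;

	for (int i = 0; i < WORD; i++)
		b[i] = vp[i];
	memcpy(&v, b, sizeof(v));
	return v;
}

/* Read one possibly misaligned word. memcpy() keeps this defined C;
 * the compiler chooses the actual instruction sequence. */
static uint64_t load_word(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Sweep the short buffer 'passes' times starting at misalignment 'mis',
 * skipping loads that would cross a cache line. Loaded values are
 * summed into *sink so the loads are not optimized away. Returns the
 * elapsed time in nanoseconds. */
static long long time_loads(uint64_t (*load)(const unsigned char *),
			    size_t mis, int passes, uint64_t *sink)
{
	struct timespec t0, t1;
	uint64_t acc = 0;

	assert(mis < WORD); /* misalignment within one word */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int n = 0; n < passes; n++)
		for (size_t off = mis; off + WORD <= BUF_SIZE; off += WORD)
			if (!crosses_line(off))
				acc += load(buf + off);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*sink += acc;
	return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}
```

After one untimed warm-up call, comparing time_loads(load_bytes, 1, passes, &sink) with time_loads(load_word, 1, passes, &sink) gives a byte-to-word ratio analogous to the one printed above, but for a buffer that never leaves L1. Rerunning with the crosses_line() check removed would show separately whether line-crossing loads carry their own penalty, which is the case Evan wants the result to reflect.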
Are you absolutely certain that the reason eight single-byte loads take as long as one 64-bit misaligned load isn't that the entire test is limited by L1 cache fills?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)