On Thu, Mar 30, 2017 at 11:48 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> This is not going into the tree - it's just a "let's check your
> theory about might_fault() overhead being the source of slowdown
> you are seeing" quick-and-dirty patch.

Note that for cached hdparm reads, I suspect a *much* bigger effect
than the fairly cheap might_fault() tests is just the random layout
of the data in the page cache. Memory is just more expensive than
CPU is.

The precise physical addresses that get allocated for the page cache
entries end up mattering, and they are obviously fairly "sticky"
within one reboot (unless you have a huge working set that flushes
them, or you use something like

    echo 3 > /proc/sys/vm/drop_caches

to flush the filesystem caches manually).

The reason things like page allocation matter for performance testing
is simply that the CPU caches are physically indexed (the L1 might not
be, but the outer levels definitely are), so page allocation ends up
affecting caching unless you have very high associativity.

And even if your workload doesn't fit in your CPU caches (I'd hope
that the "cached" hdparm test is still covering a fairly big area),
you'll still see memory performance depend on physical addresses.

Doing kernel performance testing without rebooting several times is
generally very hard.

                Linus
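
For reference, here is a minimal user-space sketch of the kind of
cached-read measurement being discussed (roughly what "hdparm -T"
approximates, not its actual implementation). It re-reads the same file
region repeatedly so that, after the first pass, everything is served
from the page cache; run-to-run variation after a drop_caches or a
reboot then largely reflects which physical pages the cache entries
happened to land in. The file path, chunk size, and total size are
arbitrary placeholders.

/*
 * Sketch of a cached-read micro-benchmark.  Run it a few times, then
 * do "echo 3 > /proc/sys/vm/drop_caches" (or reboot) as root and run
 * it again; compare the reported throughput across cycles.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define CHUNK   (1 << 20)          /* 1 MiB per read() call */
#define TOTAL   (256UL << 20)      /* 256 MiB read per pass */
#define PASSES  5

static double now_sec(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
        const char *path = argc > 1 ? argv[1] : "/tmp/testfile";
        char *buf = malloc(CHUNK);
        int fd = open(path, O_RDONLY);
        int pass;

        if (fd < 0 || !buf) {
                perror("setup");
                return 1;
        }

        for (pass = 0; pass < PASSES; pass++) {
                unsigned long done = 0;
                double t0 = now_sec();

                lseek(fd, 0, SEEK_SET);
                while (done < TOTAL) {
                        ssize_t n = read(fd, buf, CHUNK);

                        if (n < 0) {
                                perror("read");
                                return 1;
                        }
                        if (n == 0) {           /* EOF: wrap around */
                                lseek(fd, 0, SEEK_SET);
                                continue;
                        }
                        done += n;
                }
                printf("pass %d: %.1f MB/s\n", pass,
                       (TOTAL >> 20) / (now_sec() - t0));
        }
        free(buf);
        close(fd);
        return 0;
}

The first pass pulls the file into the page cache (so it is partly a
disk measurement); later passes are page-cache-bound, which is where
the physical-placement effects described above show up.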