Le 03/09/2020 à 09:11, Christoph Hellwig a écrit :
On Wed, Sep 02, 2020 at 11:02:22AM -0700, Linus Torvalds wrote:
I don't see why this change would make any difference.
Me neither, but while looking at a different project I did spot places
that actually do an access_ok with len 0, that's why I wanted him to
try.
That being said: Christophe are these number stables? Do you get
similar numbers with multiple runs?
Yes the numbers are similar with multiple runs and multiple reboots.
And btw, why do the 32-bit and 64-bit checks even differ? It's not
like the extra (single) instruction should even matter. I think the
main reason is that the simpler 64-bit case could stay as a macro
(because it only uses "addr" and "size" once), but honestly, that
"simplification" doesn't help when you then need to have that #ifdef
for the 32-bit case and an inline function anyway.
I'll have to leave that to the powerpc folks. The intent was to not
change the behavior (and I even fucked that up for the the size == 0
case).
However, I suspect a bigger reason for the actual performance
degradation would be the patch that makes things use "write_iter()"
for writing, even when a simpler "write()" exists.
Except that we do not actually have such a patch. For normal user
writes we only use ->write_iter if ->write is not present. But what
shows up in the profile is that /dev/zero only has a read_iter op and
not a normal read. I've added a patch below that implements a normal
read which might help a tad with this workload, but should not be part
of a regression.
Also Christophe: can you bisect which patch starts this? Is it really
this last patch in the series?
5.9-rc2: 91.5MB/s
Patch 1: 74.9MB/s
Patch 2: 97.9MB/s
Patch 3: 97.7MB/s
Patch 4 to 9: 97.9MB/s
Patch 10: 85.3MB/s
Patch 11: 75.4MB/s
See my other mail, when removing CONFIG_STACKPROTECTOR, I get a stable
99.8MB/s throughput.
Christophe