Re: [PATCH 04/18] csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Thu, 23 Jul 2020 16:21:01 +0100

On Thu, Jul 23, 2020 at 03:53:42PM +0100, Al Viro wrote:

> Said that, what you've printed for 1-byte segments (and that's going to be
> seriously affected by the setup costs in csum-copy.S, sensitive to calling
> convention changes) is time to run the 16-iteration loop divided by 1 * 16 / 8;
> IOW, your difference for 16 iterations here is 37*2 = 74 cycles.  With
> per-iteration diff being a bit under 5 cycles.  Which is not implausible,
> but
> 	1) extrapolating to other compiler versions, flags, etc. is not obvious
> 	2) the effects of calling convention changes need to be taken into account
> 	3) for copying to/from userland the effects of calling convention changes
> are even larger, and kernel is certainly not going to issue kvec iters of _that_
> sort, TYVM.

To clarify it a bit: the effects of the calling convention change are mostly due
to not passing (and saving) those error pointers, and that could equally be had with
the "pass the initial sum in" variant - just start these iov_iter.c loops with
sum = ~0U and we get the same guarantee of not getting 0 in the absence of faults.
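
For illustration only, here is a minimal userland sketch of why ~0U as the
initial sum gives that guarantee. It is not the kernel code: csum_add32() and
csum_words() are hypothetical names, and the add is a plain 32-bit
ones-complement add with end-around carry. Once the running sum is non-zero,
such an add can't bring it back to 0, so a 0 result stays free to mean "fault".

```c
#include <stddef.h>
#include <stdint.h>

/*
 * 32-bit ones-complement add with end-around carry.
 * Hypothetical standalone helper, not the kernel's implementation.
 * If 'sum' is non-zero, the result is non-zero as well: without a
 * carry the result is >= sum, with a carry it is at least 1.
 */
static uint32_t csum_add32(uint32_t sum, uint32_t addend)
{
	uint64_t s = (uint64_t)sum + addend;

	return (uint32_t)s + (uint32_t)(s >> 32);
}

/* Hypothetical helper: fold an array of 32-bit words into 'init'. */
static uint32_t csum_words(const uint32_t *w, size_t n, uint32_t init)
{
	uint32_t sum = init;

	for (size_t i = 0; i < n; i++)
		sum = csum_add32(sum, w[i]);
	return sum;
}
```

With init == 0, all-zero data sums to 0 and can't be told apart from a
"0 means fault" return; with init == ~0U the same data sums to 0xffffffff,
which is the same value in ones-complement terms.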

The point is, your "~4.5 cycles per vector" is pretty much noise and the
difference between the 3-argument and 4-argument variants could easily be
in the same range.  It might be a valid microoptimization, it might not be.
The 3-argument variant is simpler and IMO, in the absence of strong data,
we ought to go with that.