RE: [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum()

David Laight <David.Laight@xxxxxxxxxx> · Fri, 24 Apr 2020 09:41:30 +0000



From: Robin Murphy
> Sent: 22 April 2020 12:02
..
> Sure - I have a nagging feeling that it could still do better WRT
> pipelining the loads anyway, so I'm happy to come back and reconsider
> the local codegen later. It certainly doesn't deserve to stand in the
> way of cross-arch rework.

How fast does that loop actually run?
To my mind it seems to do a lot of operations on each 64-bit value.
I'd have thought that a loop based on:
	sum64 += *ptr;
	sum64_high += *ptr++ >> 32;
and then fixing up the result would be faster.
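
To spell out what "fixing up the result" could look like, here is a
minimal sketch, assuming both statements accumulate (+=). The names
csum_split() and fold64() are made up for illustration and are not
kernel functions. It works on the values of whole, aligned 64-bit words
and ignores trailing bytes and byte order. It shows the arithmetic only
and makes no claim about speed on any particular core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold a 64-bit total into a 16-bit ones' complement sum. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 33 bits */
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 32 bits */
	sum = (sum & 0xffff) + (sum >> 16);		/* at most 17 bits */
	sum = (sum & 0xffff) + (sum >> 16);		/* at most 16 bits */
	return (uint16_t)sum;
}

/* Hypothetical sketch of the two-accumulator loop plus the fix-up. */
static uint16_t csum_split(const uint64_t *ptr, size_t nwords)
{
	uint64_t sum64 = 0;		/* wraps modulo 2^64, carries are lost */
	uint64_t sum64_high = 0;	/* exact sum of the high 32-bit halves */
	uint64_t low;

	/* Assumption: short enough that low + sum64_high cannot overflow. */
	assert((uint64_t)nwords < (1ULL << 31));

	while (nwords--) {
		sum64 += *ptr;
		sum64_high += *ptr++ >> 32;
	}

	/*
	 * Fix-up: sum64 == (low + (sum64_high << 32)) mod 2^64, so the
	 * exact sum of the low halves falls out of a wrapping subtract.
	 */
	low = sum64 - (sum64_high << 32);

	/* 2^32 is congruent to 1 modulo 0xffff, so the halves just add. */
	return fold64(low + sum64_high);
}
```

For a single word 0x0001000200030004 this gives 1 + 2 + 3 + 4 = 10. The
point of the split is that the loop body never has to look at a carry
flag; the lost carries are recovered once, after the loop.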

The x86-64 code is also bad!
On Intel CPUs prior to Haswell a simple:
	sum_64 += *ptr32++;
is faster than the current code.
(Although you can do a lot better even on Ivy Bridge.)
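
For reference, a minimal sketch of that simple loop with the final fold
added. The names csum_add32() and fold_sum() are made up for
illustration and are not the kernel's x86-64 code. It assumes whole,
aligned 32-bit words and ignores trailing bytes and byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold a 64-bit total into a 16-bit ones' complement sum. */
static uint16_t fold_sum(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 33 bits */
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 32 bits */
	sum = (sum & 0xffff) + (sum >> 16);		/* at most 17 bits */
	sum = (sum & 0xffff) + (sum >> 16);		/* at most 16 bits */
	return (uint16_t)sum;
}

/* Hypothetical sketch: add zero-extended 32-bit words into a 64-bit total. */
static uint16_t csum_add32(const uint32_t *ptr32, size_t nwords)
{
	uint64_t sum_64 = 0;

	/* Assumption: few enough words that the 64-bit total cannot overflow. */
	assert((uint64_t)nwords <= UINT32_MAX);

	while (nwords--)
		sum_64 += *ptr32++;	/* no carry handling inside the loop */

	return fold_sum(sum_64);
}
```

Each add is an ordinary 64-bit add of a zero-extended load, so there is
no dependency on the carry flag between iterations.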

	David
