On 2/17/24 04:00, Charlie Jenkins wrote:
On Fri, Feb 16, 2024 at 01:38:51PM +0100, Helge Deller wrote:
* Guenter Roeck <linux@xxxxxxxxxxxx>:
hppa 64-bit systems calculate the IPv6 checksum using 64-bit add
operations. The last add folds protocol and length fields into the 64-bit
result. While unlikely, this operation can overflow. The overflow can be
triggered with a code sequence such as the following.
	/* try to trigger massive overflows */
	memset(tmp_buf, 0xff, sizeof(struct in6_addr));
	csum_result = csum_ipv6_magic((struct in6_addr *)tmp_buf,
				      (struct in6_addr *)tmp_buf,
				      0xffff, 0xff, 0xffffffff);
Fix the problem by adding any overflows from the final add operation into
the calculated checksum. Fortunately, we can do this without additional
cost by replacing the add operation used to fold the checksum into 32 bits
with "add,dc" to add in the missing carry.
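
For readers without hppa assembly at hand, the effect of the lost carry can be sketched in portable C. This is only an illustration under assumptions, not the kernel code: the helper names add64_carry() and fold64() are hypothetical, and the operand values are chosen simply to force a 64-bit overflow, much as the 0xff-filled addresses do in the test above.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit add that feeds the carry-out back into
 * the sum (end-around carry), which is what ones' complement
 * checksumming requires. */
static uint64_t add64_carry(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < a);	/* s < a exactly when the add wrapped */
}

/* Hypothetical helper: fold a 64-bit ones' complement sum to 16 bits. */
static uint16_t fold64(uint64_t s)
{
	s = (s & 0xffffffffu) + (s >> 32);	/* 64 -> at most 33 bits */
	s = (s & 0xffffffffu) + (s >> 32);	/* absorb that carry */
	s = (s & 0xffff) + (s >> 16);		/* 32 -> at most 17 bits */
	s = (s & 0xffff) + (s >> 16);		/* absorb that carry */
	return (uint16_t)s;
}
```

With an accumulator of 0xffffffffffffffff and an addend of 0x100000000, a plain 64-bit add wraps and fold64() returns 0xffff, whereas add64_carry() keeps the carry and fold64() returns 0x0001.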
Guenter,
Thanks for your patch for 32- and 64-bit systems.
But I think it's time to sunset the parisc inline assembly for ipv4/ipv6
checksumming.
The patch below converts the code to use standard C code (taken from the
s390 header file) and it survives the v8 lib/checksum patch.
Opinions?
[...]
We can do better than this! By inspection this looks like a performance
regression.
[...]
Similar story again here where the add with carry is not well translated
into C, resulting in significantly worse assembly. Using __u64 seems to
be a big contributing factor for why the 32-bit assembly is worse.
I am not sure of the best way to represent this cleanly in C.
Add with carry is not well understood by GCC 12.3, it seems. These
functions are generally heavily optimized on every architecture, so I
think it is worth it to default to assembly if you aren't able to
achieve comparable performance in C.
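
To make the trade-off concrete, here are two common ways to spell a 32-bit add with end-around carry in C. This is a sketch with hypothetical function names, not code from either patch. Variant A widens to 64 bits, the style the analysis above suspects of hurting 32-bit targets. Variant B uses the GCC/Clang builtin __builtin_add_overflow(). Whether either compiles down to a single add-with-carry instruction depends on the target and compiler version, so the generated assembly has to be inspected.

```c
#include <stdint.h>

/* Variant A (hypothetical name): widen to 64 bits, then add the
 * carry bit (bit 32) back into the low word. */
static uint32_t csum_add_wide(uint32_t a, uint32_t b)
{
	uint64_t s = (uint64_t)a + b;

	return (uint32_t)(s + (s >> 32));
}

/* Variant B (hypothetical name): stay in 32 bits and let the
 * compiler builtin report the carry-out. */
static uint32_t csum_add_builtin(uint32_t a, uint32_t b)
{
	uint32_t s;
	unsigned int carry = __builtin_add_overflow(a, b, &s);

	return s + carry;
}
```

Both variants return the same value for every input; they differ only in how much help the compiler needs to recognise the carry.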
Thanks a lot for your analysis!!!
I've now reverted my change to switch to generic code and applied
the 3 suggested patches from Guenter which fix the hppa assembly.
Let's see how they behave in the for-next git tree during the next few days.
Helge