The following commit has been merged into the x86/core branch of tip:

Commit-ID:     2a144bcd661c4f0a503e03f9280e88854ac0bb37
Gitweb:        https://git.kernel.org/tip/2a144bcd661c4f0a503e03f9280e88854ac0bb37
Author:        Eric Dumazet <edumazet@xxxxxxxxxx>
AuthorDate:    Thu, 25 Nov 2021 06:18:17 -08:00
Committer:     Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
CommitterDate: Tue, 30 Nov 2021 16:26:03 -08:00

x86/csum: Fix initial seed for odd buffers

When I folded do_csum() into csum_partial(), I missed that we had to
swap odd/even bytes from @sum argument.

This is because this swap will happen again at the end of the function.

[A, B, C, D] -> [B, A, D, C]

As far as Internet checksums (rfc 1071) are concerned, we can instead
rotate the whole 32bit value by 8 (or 24)

-> [D, A, B, C]

Note that I played with the idea of replacing this final swapping:

	result = from32to16(result);
	result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);

With:

	result = ror32(result, 8);

But while the generated code was definitely better for the odd case,
run time cost for the more likely even case was not better for gcc.

gcc is replacing a well predicted conditional branch with a cmov
instruction after a ror instruction, which adds a cost canceling the
cmov gain.

Many thanks to Noah Goldstein for reporting this issue.
[ dhansen: * spelling: swaping => swapping
           * updated Fixes commit ]

Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Fixes: d31c3c683ee6 ("x86/csum: Rewrite/optimize csum_partial()")
Reported-by: Noah Goldstein <goldstein.w.n@xxxxxxxxx>
Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Link: https://lkml.kernel.org/r/20211125141817.3541501-1-eric.dumazet@xxxxxxxxx
---
 arch/x86/lib/csum-partial_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 1eb8f2d..40b527b 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -41,6 +41,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 	if (unlikely(odd)) {
 		if (unlikely(len == 0))
 			return sum;
+		temp64 = ror32((__force u32)sum, 8);
 		temp64 += (*(unsigned char *)buff << 8);
 		len--;
 		buff++;