We need a way for csum_and_copy_{from,to}_user() to report faults.

The approach taken back in 2020 (avoid 0 as a return value by starting the sum from ~0U and using 0 to report faults) was broken; it does yield the right value modulo 2^16-1, but the case when the data is entirely zero-filled is not handled right. It almost works, since on most codepaths a non-zero value gets added in afterwards, and at that point 0 is no different from anything divisible by 0xffff. However, there are cases (ICMPv4 replies, for example) where we are not guaranteed that; see the fold() demonstration after the diffstat below. In other words, we really do need those primitives to return 0 on filled-with-zeroes input.

So let's make them return a 64bit value instead; we can do that cheaply (64bit architectures return it in a single register, 32bit ones in a pair of registers) and we can use that to report faults without disturbing the 32bit csum.

New type: __wsum_fault. 64bit, returned by csum_and_copy_..._user().

Primitives:
  * CSUM_FAULT - the value representing a fault
  * to_wsum_fault() - fold a __wsum value into a __wsum_fault
  * from_wsum_fault() - extract the __wsum value
  * wsum_is_fault() - check whether a value represents a fault

Representation depends upon the target:
  CSUM_FAULT:         ~0ULL
  to_wsum_fault(v32): (u64)v32 for 64bit and 32bit little-endian,
                      (u64)v32 << 32 for 32bit big-endian

Rationale: the relationship between the calling conventions for returning 64bit values and those for returning 32bit ones. On 64bit architectures the same register is used for both. On 32bit little-endian the lower half of the value goes in the register used for returning 32bit values and the upper half goes into an additional register. On 32bit big-endian the opposite happens - the upper 32 bits go into the register used for returning 32bit values and the lower 32 bits get stuffed into an additional register. With this choice of representation we need minimal changes on the asm side (zero an extra register in the 32bit case, nothing in the 64bit case), and from_wsum_fault() is as cheap as it gets. Sum calculation is back to "start from 0".
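In C terms the primitives come out to something like the following; this is a sketch of the idea (using kernel types), not the exact code in the branch, and the CONFIG_64BIT/__LITTLE_ENDIAN tests stand in for however a given target actually spells them:

typedef u64 __wsum_fault;	/* returned by csum_and_copy_..._user() */

/* all bits set; to_wsum_fault() always leaves one half zero, so no collision */
#define CSUM_FAULT ((__wsum_fault)~0ULL)

#if defined(CONFIG_64BIT) || defined(__LITTLE_ENDIAN)
/* 64bit and 32bit little-endian: csum lives in the lower half */
static inline __wsum_fault to_wsum_fault(__wsum v)
{
        return (__wsum_fault)(__force u32)v;
}

static inline __wsum from_wsum_fault(__wsum_fault v)
{
        return (__force __wsum)(u32)v;
}
#else
/* 32bit big-endian: csum lives in the upper half */
static inline __wsum_fault to_wsum_fault(__wsum v)
{
        return (__wsum_fault)(__force u32)v << 32;
}

static inline __wsum from_wsum_fault(__wsum_fault v)
{
        return (__force __wsum)(u32)(v >> 32);
}
#endif

static inline bool wsum_is_fault(__wsum_fault v)
{
        return v == CSUM_FAULT;
}

A caller then ends up with something like this (variable names made up):

        __wsum_fault v = csum_and_copy_from_user(src, dst, len);
        if (wsum_is_fault(v))
                goto fault;
        sum = csum_block_add(sum, from_wsum_fault(v), off);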
The rest of the series consists of cleaning up assorted asm/checksum.h.

Branch lives in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.csum
Individual patches in followups. Help with review and testing would be very welcome.

Al Viro (18):
  make net/checksum.h self-contained
  get rid of asm/checksum.h includes outside of include/net/checksum.h and arch
  make net/checksum.h the sole user of asm/checksum.h
  Fix the csum_and_copy_..._user() idiocy
  bits missing from csum_and_copy_{from,to}_user() unexporting.
  consolidate csum_tcpudp_magic(), take default variant into net/checksum.h
  consolidate default ip_compute_csum()
  alpha: pull asm-generic/checksum.h
  mips: pull include of asm-generic/checksum.h out of #if
  nios2: pull asm-generic/checksum.h
  x86: merge csum_fold() for 32bit and 64bit
  x86: merge ip_fast_csum() for 32bit and 64bit
  x86: merge csum_tcpudp_nofold() for 32bit and 64bit
  amd64: saner handling of odd address in csum_partial()
  x86: optimized csum_add() is the same for 32bit and 64bit
  x86: lift the extern for csum_partial() into checksum.h
  x86_64: move csum_ipv6_magic() from csum-wrappers_64.c to csum-partial_64.c
  uml/x86: use normal x86 checksum.h

 arch/alpha/include/asm/asm-prototypes.h    |   2 +-
 arch/alpha/include/asm/checksum.h          |  68 ++----
 arch/alpha/lib/csum_partial_copy.c         |  74 ++++++-------
 arch/arm/include/asm/checksum.h            |  27 +----
 arch/arm/kernel/armksyms.c                 |   3 +-
 arch/arm/lib/csumpartialcopygeneric.S      |   3 +-
 arch/arm/lib/csumpartialcopyuser.S         |   8 +-
 arch/hexagon/include/asm/checksum.h        |   4 +-
 arch/hexagon/kernel/hexagon_ksyms.c        |   1 -
 arch/hexagon/lib/checksum.c                |   1 +
 arch/m68k/include/asm/checksum.h           |  24 +---
 arch/m68k/lib/checksum.c                   |   8 +-
 arch/microblaze/kernel/microblaze_ksyms.c  |   2 +-
 arch/mips/include/asm/asm-prototypes.h     |   2 +-
 arch/mips/include/asm/checksum.h           |  32 ++----
 arch/mips/lib/csum_partial.S               |  12 +-
 arch/nios2/include/asm/checksum.h          |  13 +--
 arch/openrisc/kernel/or32_ksyms.c          |   2 +-
 arch/parisc/include/asm/checksum.h         |  21 ----
 arch/powerpc/include/asm/asm-prototypes.h  |   2 +-
 arch/powerpc/include/asm/checksum.h        |  27 +----
 arch/powerpc/lib/checksum_32.S             |   6 +-
 arch/powerpc/lib/checksum_64.S             |   4 +-
 arch/powerpc/lib/checksum_wrappers.c       |  14 +--
 arch/s390/include/asm/checksum.h           |  18 ---
 arch/s390/kernel/ipl.c                     |   2 +-
 arch/s390/kernel/os_info.c                 |   2 +-
 arch/sh/include/asm/checksum_32.h          |  32 +-----
 arch/sh/kernel/sh_ksyms_32.c               |   2 +-
 arch/sh/lib/checksum.S                     |   6 +-
 arch/sparc/include/asm/asm-prototypes.h    |   2 +-
 arch/sparc/include/asm/checksum_32.h       |  63 ++++++-----
 arch/sparc/include/asm/checksum_64.h       |  21 +---
 arch/sparc/lib/checksum_32.S               |   2 +-
 arch/sparc/lib/csum_copy.S                 |   4 +-
 arch/sparc/lib/csum_copy_from_user.S       |   2 +-
 arch/sparc/lib/csum_copy_to_user.S         |   2 +-
 arch/x86/include/asm/asm-prototypes.h      |   2 +-
 arch/x86/include/asm/checksum.h            | 177 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/checksum_32.h         | 141 ++----------------------
 arch/x86/include/asm/checksum_64.h         | 172 +----------------------------
 arch/x86/lib/checksum_32.S                 |  20 +++-
 arch/x86/lib/csum-copy_64.S                |   6 +-
 arch/x86/lib/csum-partial_64.c             |  41 ++++---
 arch/x86/lib/csum-wrappers_64.c            |  43 ++------
 arch/x86/um/asm/checksum.h                 | 119 --------------------
 arch/x86/um/asm/checksum_32.h              |  38 -------
 arch/x86/um/asm/checksum_64.h              |  19 ----
 arch/xtensa/include/asm/asm-prototypes.h   |   2 +-
 arch/xtensa/include/asm/checksum.h         |  33 +-----
 arch/xtensa/lib/checksum.S                 |   6 +-
 drivers/net/ethernet/brocade/bna/bnad.h    |   2 -
 drivers/net/ethernet/lantiq_etop.c         |   2 -
 drivers/net/vmxnet3/vmxnet3_int.h          |   1 -
 drivers/s390/char/zcore.c                  |   2 +-
 include/asm-generic/checksum.h             |  15 +--
 include/net/checksum.h                     |  81 ++++++++++++--
 include/net/ip6_checksum.h                 |   1 -
 lib/checksum_kunit.c                       |   2 +-
 net/core/datagram.c                        |   8 +-
 net/core/skbuff.c                          |   8 +-
 net/ipv6/ip6_checksum.c                    |   1 -
 62 files changed, 501 insertions(+), 959 deletions(-)
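Speaking of "anything divisible by 0xffff" - a quick userland demonstration of the congruence point above (fold() follows the generic csum_fold(); demo code, not part of the series):

#include <stdint.h>
#include <stdio.h>

/* fold a 32bit partial sum down to the final 16bit checksum */
static uint16_t fold(uint32_t sum)
{
        sum = (sum & 0xffff) + (sum >> 16);
        sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)~sum;
}

int main(void)
{
        /* partial sums that differ by a multiple of 0xffff normally
         * fold to the same checksum... */
        printf("%04x %04x\n", fold(0x1234), fold(0x1234 + 0xffff));
        /* ...the exception is the zero residue: 0 and 0xffff are both
         * "zero" modulo 0xffff, yet fold to different bit patterns -
         * that is what bites the filled-with-zeroes case */
        printf("%04x %04x\n", fold(0), fold(0xffff));
        return 0;
}

The first line prints the same value twice (edcb edcb); the second prints ffff 0000.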
Part 1: sorting out the includes.

We have asm/checksum.h and net/checksum.h; the latter pulls the former. A lot of things would become easier if we could move things from asm/checksum.h to net/checksum.h; for that we need to make net/checksum.h the only file that pulls asm/checksum.h.

1/18) make net/checksum.h self-contained
Right now it has an implicit dependency upon linux/bitops.h (for the sake of ror32()).

2/18) get rid of asm/checksum.h includes outside of include/net/checksum.h and arch
In almost all cases the include is redundant; zcore.c and checksum_kunit.c are the sole exceptions, and those got switched to net/checksum.h.

3/18) make net/checksum.h the sole user of asm/checksum.h
All other users (all in arch/* by now) can pull net/checksum.h instead.

Part 2: fixing the fault reporting.

4/18) Fix the csum_and_copy_..._user() idiocy
Fix the breakage introduced back in 2020 - see above for details.

Part 3: trimming related crap

5/18) bits missing from csum_and_copy_{from,to}_user() unexporting.

6/18) consolidate csum_tcpudp_magic(), take default variant into net/checksum.h

7/18) consolidate default ip_compute_csum()
... and take it into include/net/checksum.h.

8/18) alpha: pull asm-generic/checksum.h

9/18) mips: pull include of asm-generic/checksum.h out of #if

10/18) nios2: pull asm-generic/checksum.h

Part 4: trimming x86 crap

11/18) x86: merge csum_fold() for 32bit and 64bit
Identical...

12/18) x86: merge ip_fast_csum() for 32bit and 64bit
Identical, except that the 32bit version uses asm volatile where the 64bit one uses plain asm. The volatile had become pointless once the memory clobber got added to both versions...

13/18) x86: merge csum_tcpudp_nofold() for 32bit and 64bit
Identical...

14/18) amd64: saner handling of odd address in csum_partial()
All we want there is a return value congruent to result * 256 modulo 0xffff; there is no need to convert from 32bit to 16bit (i.e. take it modulo 0xffff) first - a cyclic shift of the 32bit value by 8 bits (in either direction) will work. Kills the from32to16() helper and yields better code (why the shift suffices is sketched at the end of this mail).

15/18) x86: optimized csum_add() is the same for 32bit and 64bit

16/18) x86: lift the extern for csum_partial() into checksum.h

17/18) x86_64: move csum_ipv6_magic() from csum-wrappers_64.c to csum-partial_64.c
... and make uml/amd64 use it.

18/18) uml/x86: use normal x86 checksum.h
The only difference left is that UML really does *NOT* want the csum-and-uaccess combinations; leave those in arch/x86/include/asm/checksum_{32,64}.h, move the rest into arch/x86/include/asm/checksum.h (under ifdefs) and that's pretty much it.
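Regarding 14/18, here's why the cyclic shift suffices - a userland sketch (rol32() written out to match the helper in linux/bitops.h; demo code, not from the patch):

#include <stdint.h>
#include <assert.h>

static uint32_t rol32(uint32_t x, unsigned int k)
{
        return (x << k) | (x >> (32 - k));
}

int main(void)
{
        uint32_t sum = 0xdeadbeef;	/* arbitrary 32bit partial sum */

        /* A 32bit rotate by k multiplies the value by 2^k modulo
         * 2^32 - 1, and 0xffff divides 2^32 - 1 (= 0xffff * 0x10001),
         * so the congruence carries over to modulo 0xffff.  Moreover,
         * 2^16 is congruent to 1 modulo 0xffff, which makes 2^8 and
         * 2^24 both congruent to 256 - rotating by 8 bits in either
         * direction works, with no fold to 16bit needed first. */
        assert(rol32(sum, 8) % 0xffff == (uint64_t)sum * 256 % 0xffff);
        assert(rol32(sum, 24) % 0xffff == (uint64_t)sum * 256 % 0xffff);
        return 0;
}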