Puranjay Mohan <puranjay@xxxxxxxxxx> writes: > 1. Optimization > ------------ > > The current implementation copies the 'from' and 'to' buffers to a > scratchpad and it takes the bitwise NOT of 'from' buffer while copying. > In the next step csum_partial() is called with this scratchpad. > > so, mathematically, the current implementation is doing: > > result = csum(to - from) > > Here, 'to' and '~ from' are copied in to the scratchpad buffer, we need > it in the scratchpad buffer because csum_partial() takes a single > contiguous buffer and not two disjoint buffers like 'to' and 'from'. > > We can re write this equation to: > > result = csum(to) - csum(from) > > using the distributive property of csum(). > > this allows 'to' and 'from' to be at different locations and therefore > this scratchpad and copying is not needed. > > This in C code will look like: > > result = csum_sub(csum_partial(to, to_size, seed), > csum_partial(from, from_size, 0)); > > 2. Homogenization > -------------- > > The bpf_csum_diff() helper calls csum_partial() which is implemented by > some architectures like arm and x86 but other architectures rely on the > generic implementation in lib/checksum.c > > The generic implementation in lib/checksum.c returns a 16 bit value but > the arch specific implementations can return more than 16 bits, this > works out in most places because before the result is used, it is passed > through csum_fold() that turns it into a 16-bit value. > > bpf_csum_diff() directly returns the value from csum_partial() and > therefore the returned values could be different on different > architectures. see discussion in [1]: > > for the int value 28 the calculated checksums are: > > x86 : -29 : 0xffffffe3 > generic (arm64, riscv) : 65507 : 0x0000ffe3 > arm : 131042 : 0x0001ffe2 > > Pass the result of bpf_csum_diff() through from32to16() before returning > to homogenize this result for all architectures. > > NOTE: from32to16() is used instead of csum_fold() because csum_fold() > does from32to16() + bitwise NOT of the result, which is not what we want > to do here. > > [1] https://lore.kernel.org/bpf/CAJ+HfNiQbOcqCLxFUP2FMm5QrLXUUaj852Fxe3hn_2JNiucn6g@xxxxxxxxxxxxxx/ > > Signed-off-by: Puranjay Mohan <puranjay@xxxxxxxxxx> Pretty neat simplification :) Reviewed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>