From: Willem de Bruijn
> Sent: 23 September 2023 07:59
>
> On Fri, Sep 22, 2023 at 2:01 PM David Howells <dhowells@xxxxxxxxxx> wrote:
> >
> > David Laight <David.Laight@xxxxxxxxxx> wrote:
> >
> > > > (8) Move the copy-and-csum code to net/ where it can be in proximity with
> > > >     the code that uses it.  This eliminates the code if CONFIG_NET=n and
> > > >     allows for the slim possibility of it being inlined.
> > > >
> > > > (9) Fold memcpy_and_csum() in to its two users.
> > > >
> > > > (10) Move csum_and_copy_from_iter_full() out of line and merge in
> > > >      csum_and_copy_from_iter() since the former is the only caller of the
> > > >      latter.
> > >
> > > I thought that the real idea behind these was to do the checksum
> > > at the same time as the copy to avoid loading the data into the L1
> > > data-cache twice - especially for long buffers.
> > > I wonder how often there are multiple iov[] that actually make
> > > it better than just check summing the linear buffer?
> >
> > It also reduces the overhead for finding the data to checksum in the case the
> > packet gets split since we're doing the checksumming as we copy - but with a
> > linear buffer, that's negligible.
> >
> > > I had a feeling that check summing of udp data was done during
> > > copy_to/from_user, but the code can't be the copy-and-csum here
> > > for that because it is missing support for odd-length buffers.
> >
> > Is there a bug there?

No, I misread the code - I shouldn't scan patches when I've got a
viral head cold...

...
> > You may be right.  That's more a question for the networking folks than for
> > me.  It's entirely possible that the checksumming code is just not used on
> > modern systems these days.
> >
> > Maybe Willem can comment since he's the UDP maintainer?
>
> Perhaps these days it is more relevant to embedded systems than high
> end servers.

The checksum and copy are done together.
I probably missed it because the function isn't passed the old checksum
(which it can pretty much process for free).
Instead the caller is adding it afterwards - which involves an extra
explicit csum_add().

The x86-64 ip checksum loops are all horrid though.
The unrolling in them is so 1990's.
With the out-of-order pipeline the memory accesses tend to take care
of themselves.
Not to mention that a whole raft of (now oldish) cpus take two clocks
to execute 'adc'.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
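
For illustration only, the remark above about the function not being passed
the old checksum can be sketched in userspace C. This is not the kernel code:
every sketch_* name is hypothetical, and it assumes a plain 16-bit ones'
complement sum over big-endian words rather than the kernel's __wsum handling.
It shows that seeding the copy-and-checksum loop with the old sum gives the
same result as summing from zero and combining afterwards with a separate add
(the role csum_add() plays in the thread).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 32-bit accumulator to 16 bits with end-around carry. */
static uint16_t sketch_fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement add of two folded sums (hypothetical stand-in for csum_add()). */
static uint16_t sketch_csum_add(uint16_t a, uint16_t b)
{
	return sketch_fold16((uint32_t)a + b);
}

/*
 * Copy len bytes from src to dst and checksum them in the same pass,
 * starting from 'seed'.  A trailing odd byte is treated as the high
 * byte of a final zero-padded word.
 */
static uint16_t sketch_copy_and_csum(void *dst, const void *src,
				     size_t len, uint16_t seed)
{
	const uint8_t *s = src;
	uint8_t *d = dst;
	uint32_t sum = seed;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		d[i] = s[i];
		d[i + 1] = s[i + 1];
		sum += ((uint32_t)s[i] << 8) | s[i + 1];
		sum = (sum & 0xffff) + (sum >> 16);	/* keep the accumulator small */
	}
	if (i < len) {
		d[i] = s[i];
		sum += (uint32_t)s[i] << 8;
	}
	return sketch_fold16(sum);
}

/* Seeded call versus "sum from zero, then add the old checksum". */
static void sketch_demo(void)
{
	const uint8_t src[5] = { 0x45, 0x00, 0x00, 0x1c, 0x01 };
	uint8_t a[5], b[5];
	uint16_t old = 0x1234;
	uint16_t seeded, added;

	seeded = sketch_copy_and_csum(a, src, sizeof(src), old);
	added = sketch_csum_add(old, sketch_copy_and_csum(b, src, sizeof(src), 0));

	assert(seeded == added);
	assert(memcmp(a, src, sizeof(src)) == 0);
}
```

In this sketch the seeded variant costs nothing extra inside the loop, since
the accumulator simply starts at the old value, whereas the second variant
needs one more add-and-fold in the caller.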