From: Eric Dumazet
> Sent: 04 December 2024 14:36
...
> I would suggest the opposite : copy the headers (typically less than
> 128 bytes) on a piece of coherent memory.

A long time ago a colleague measured the cutoff between copying into a
fixed (pre-mapped) buffer and doing DMA directly to/from the kernel memory
buffer on a SPARC MBus/SBus system (which has an IOMMU).
Although that system is entirely different in every respect, the cutoff
was just over 1k.

The ethernet drivers I wrote copied the data to/from a pre-mapped area for
both transmit and receive. I suspect the simplicity of that also helped.

These days you'd definitely want to map TSO buffers, but the 'copybreak'
size for receive could be quite high.

On x86 just make sure the destination address for 'rep movsb' is 64-byte
aligned; that doubles the copy speed. The source alignment doesn't matter
at all. (AMD chips might be different, but an aligned copy of a whole
number of 'words' can always be done.)

I've also wondered whether the ethernet driver could 'hold' the IOMMU page
table entries after (e.g.) a receive frame is processed and then drop the
PA of the replacement buffer into the same slot.
That is likely to speed up IOMMU setup.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
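The receive-side 'copybreak' trade-off discussed in this mail can be sketched as a small user-space C model. It is hypothetical, not code from any driver: `struct rx_slot`, `rx_deliver()` and `RX_COPYBREAK` are invented names, `malloc()` stands in for allocating (and, in a real driver, DMA-mapping) a ring buffer, and the 1024-byte cutoff is only an assumption loosely based on the "just over 1k" SPARC measurement, which came from a very different system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cutoff: frames up to this many bytes are copied. */
#define RX_COPYBREAK 1024

/* Hypothetical receive ring slot holding a buffer that stays DMA-mapped. */
struct rx_slot {
    unsigned char *dma_buf; /* buffer the NIC writes into */
    size_t buf_size;        /* size of each ring buffer */
};

/*
 * Hand a received frame of 'len' bytes up the stack.
 *
 * len <= RX_COPYBREAK: copy the data into a new, unmapped buffer and
 *   leave slot->dma_buf in the ring, so no unmap/remap is needed.
 * len >  RX_COPYBREAK: pass the mapped buffer itself up and put a fresh
 *   buffer in the slot (a real driver would also map it for DMA).
 *
 * Returns the buffer for the stack, or NULL if allocation failed (the
 * frame would then be dropped and the slot is left unchanged).
 * *copied reports which path was taken.
 */
static unsigned char *rx_deliver(struct rx_slot *slot, size_t len, int *copied)
{
    unsigned char *out;

    assert(slot != NULL && copied != NULL && len <= slot->buf_size);

    if (len <= RX_COPYBREAK) {
        out = malloc(len ? len : 1);
        if (out == NULL)
            return NULL;
        memcpy(out, slot->dma_buf, len);
        *copied = 1;
        return out;
    }

    out = slot->dma_buf;
    slot->dma_buf = malloc(slot->buf_size);
    if (slot->dma_buf == NULL) {
        slot->dma_buf = out; /* keep the old buffer; drop the frame */
        return NULL;
    }
    *copied = 0;
    return out;
}
```

With a 2048-byte ring buffer, a 64-byte frame is copied and the slot keeps its original (still mapped) buffer, while a 1500-byte frame hands the original buffer up and the slot gets a replacement. Raising `RX_COPYBREAK` trades more CPU copying for fewer map/unmap operations, which is the balance the mail suggests could sit quite high.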
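The x86 alignment advice above (align the destination of 'rep movsb' to 64 bytes, ignore the source alignment) can be illustrated with a minimal sketch. It assumes GCC/Clang inline assembly and falls back to `memcpy()` on non-x86 targets; `copy_bytes()`, `copy_to_aligned()`, `is_dst_aligned()` and `DST_ALIGN` are hypothetical names. The sketch only arranges the alignment; it does not measure anything, so the "doubles the copy speed" figure remains the author's observation. Note that a libc or kernel `memcpy()` may already use 'rep movsb' internally, depending on the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Destination alignment suggested in the mail for 'rep movsb' on x86. */
#define DST_ALIGN 64

/* Byte copy using 'rep movsb' on x86; plain memcpy() elsewhere. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    /* (E/R)DI = dst, (E/R)SI = src, (E/R)CX = count; the ABI keeps DF clear. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* True if p starts on a DST_ALIGN-byte boundary. */
static int is_dst_aligned(const void *p)
{
    return ((uintptr_t)p & (DST_ALIGN - 1)) == 0;
}

/*
 * Copy 'len' bytes from 'src' (any alignment) into a newly allocated
 * buffer whose start is DST_ALIGN-byte aligned. Caller frees the result.
 */
static void *copy_to_aligned(const void *src, size_t len)
{
    /* aligned_alloc() wants the size to be a multiple of the alignment. */
    size_t alloc = (len + DST_ALIGN - 1) & ~(size_t)(DST_ALIGN - 1);
    void *dst;

    assert(src != NULL || len == 0);
    if (alloc == 0)
        alloc = DST_ALIGN;
    dst = aligned_alloc(DST_ALIGN, alloc);
    if (dst == NULL)
        return NULL;
    copy_bytes(dst, src, len);
    return dst;
}
```

The source pointer can be at any offset (for example, a frame payload after a 14-byte ethernet header); only the destination is forced onto a 64-byte boundary, matching the point that source alignment doesn't matter on x86.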