On Sun, Oct 2, 2022 at 3:09 PM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> Non-coherent DMA for networking is going to be fun, though.

I agree that networking is likely the main performance issue, but I
suspect 99% of the cases would come from __alloc_skb().

You might want to have help from the network drivers for the "allocate
for RX vs TX", since it ends up having very different DMA coherence
issues, as you point out.

The code actually already has a SKB_ALLOC_RX flag, but despite the
name it doesn't really mean what you'd think it means.

Similarly, that code already has magic stuff to try to be
cacheline-aligned for accesses, but it's not really for DMA coherency
reasons, just purely for performance reasons (trying to make sure that
the header accesses stay in one cacheline etc).

And to be honest, it's been years and years since I did any networking, so...

             Linus