On Sun, Oct 02, 2022 at 03:24:57PM -0700, Linus Torvalds wrote:
> On Sun, Oct 2, 2022 at 3:09 PM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > Non-coherent DMA for networking is going to be fun, though.
>
> I agree that networking is likely the main performance issue, but I
> suspect 99% of the cases would come from __alloc_skb().

The problem is not the allocation but rather having a generic enough
dma_needs_bounce() check. It won't be able to tell whether some 1500
byte range is for network or for crypto code that uses a small
ARCH_KMALLOC_MINALIGN. Getting the actual object size (e.g. with
ksize()) doesn't tell the full story on how safe the DMA is.

> Similarly, that code already has magic stuff to try to be
> cacheline-aligned for accesses, but it's not really for DMA coherency
> reasons, just purely for performance reasons (trying to make sure that
> the header accesses stay in one cacheline etc).

Yeah, __alloc_skb() ends up using SMP_CACHE_BYTES for data alignment
(via SKB_DATA_ALIGN). I have a suspicion this may break on SoCs with a
128-byte cache line but I haven't seen any report yet (there aren't many
such systems).

-- 
Catalin
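
To illustrate the 128-byte concern, here is a minimal userspace sketch, not
kernel code. It assumes SMP_CACHE_BYTES is 64 and that the SoC has a 128-byte
cache line; both values, and every sketch_* name, are hypothetical and chosen
only for the illustration. It models rounding a length the way SKB_DATA_ALIGN
does and then asks whether the resulting range owns whole cache lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative assumptions, not values taken from any particular kernel config. */
#define SKETCH_SMP_CACHE_BYTES 64u   /* assumed alignment applied to skb data */
#define SKETCH_SOC_CACHE_LINE  128u  /* assumed cache line size of the SoC */

/* Round x up to a multiple of a; a must be a power of two. */
static inline size_t sketch_align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical check: returns true when [addr, addr + size) starts or ends
 * in the middle of a cache line of the given size, i.e. when the range may
 * share a line with unrelated data. line must be a power of two.
 */
static inline bool sketch_shares_cache_line(uintptr_t addr, size_t size,
					    size_t line)
{
	return (addr & (line - 1)) != 0 || (size & (line - 1)) != 0;
}
```

With these assumptions, a 1600-byte length is already a multiple of 64, so
sketch_shares_cache_line() reports nothing for a 64-byte line, but it does
report a shared line at 128 bytes because 1600 is not a multiple of 128. A
1500-byte length rounds up to 1536, which happens to be a multiple of 128, so
the problem only shows up for some sizes. Note that this only looks at
address and size; it says nothing about who owns the neighbouring bytes,
which is the part a generic check cannot see.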