On 8/23/21 8:04 AM, Eric Dumazet wrote:
>>
>>
>> It seems PAGE_ALLOC_COSTLY_ORDER is mostly related to pcp page, OOM, memory
>> compact and memory isolation, as the test system has a lot of memory installed
>> (about 500G, only 3-4G is used), so I used the below patch to test the max
>> possible performance improvement when making TCP frags twice bigger, and
>> the performance improvement went from about 30Gbit to 32Gbit for one thread
>> iperf tcp flow in IOMMU strict mode,
>
> This is encouraging, and means we can do much better.
>
> Even with SKB_FRAG_PAGE_ORDER set to 4, typical skbs will need 3 mappings
>
> 1) One for the headers (in skb->head)
> 2) Two page frags, because one TSO packet payload is not a nice power-of-two.

interesting observation. I have noticed 17 with the ZC API. That might
explain the less than expected performance bump with iommu strict mode.

>
> The first issue can be addressed using a piece of coherent memory (128
> or 256 bytes per entry in TX ring).
> Copying the headers can avoid one IOMMU mapping, and improve IOTLB
> hits, because all
> slots of the TX ring buffer will use one single IOTLB slot.
>
> The second issue can be solved by tweaking a bit
> skb_page_frag_refill() to accept an additional parameter
> so that the whole skb payload fits in a single order-4 page.
>
>
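
To make the "two page frags" point in the quoted text concrete, here is a rough userspace model of carving skb payloads out of an order-4 page. It is only a sketch: struct frag_sim, sim_skb_frags() and the fit_whole flag are made-up names for illustration and are not kernel code. The 65160-byte payload used in the note below (45 segments of 1448 bytes) and the 4K base page size are assumptions as well.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: none of these names exist in the kernel. */
struct frag_sim {
	size_t page_size;	/* bytes per frag page, e.g. 4096 << 4 for order-4 */
	size_t offset;		/* next free byte in the current page */
};

/*
 * Carve 'payload' bytes for one skb and return how many page frags
 * (so how many payload DMA mappings) it ends up using.
 *
 * If 'fit_whole' is nonzero, a fresh page is started whenever the
 * current one cannot hold the whole payload. This mimics the idea of
 * passing an extra size hint to the refill helper.
 */
static unsigned int sim_skb_frags(struct frag_sim *s, size_t payload,
				  int fit_whole)
{
	unsigned int frags = 0;

	assert(payload > 0 && payload <= s->page_size);

	if (fit_whole && s->page_size - s->offset < payload)
		s->offset = 0;	/* give up the tail, take a new page */

	while (payload > 0) {
		size_t room, chunk;

		if (s->offset == s->page_size)
			s->offset = 0;	/* page exhausted, refill */

		room = s->page_size - s->offset;
		chunk = payload < room ? payload : room;
		s->offset += chunk;
		payload -= chunk;
		frags++;
	}
	return frags;
}
```

Starting from an empty 64K page (page_size = 4096 << 4, offset = 0) with fit_whole = 0, the first 65160-byte skb uses one frag and the next ones straddle a page boundary and use two. With fit_whole = 1 every skb uses one frag, at the price of leaving the tail of each page unused.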