On 8/25/21 9:32 AM, Eric Dumazet wrote:
> On Wed, Aug 25, 2021 at 9:29 AM David Ahern <dsahern@xxxxxxxxx> wrote:
>>
>> On 8/23/21 8:04 AM, Eric Dumazet wrote:
>>>>
>>>>
>>>> It seems PAGE_ALLOC_COSTLY_ORDER is mostly related to pcp page, OOM, memory
>>>> compact and memory isolation, as the test system has a lot of memory installed
>>>> (about 500G, only 3-4G is used), so I used the below patch to test the max
>>>> possible performance improvement when making TCP frags twice bigger, and
>>>> the performance improvement went from about 30Gbit to 32Gbit for one thread
>>>> iperf tcp flow in IOMMU strict mode,
>>>
>>> This is encouraging, and means we can do much better.
>>>
>>> Even with SKB_FRAG_PAGE_ORDER set to 4, typical skbs will need 3 mappings
>>>
>>> 1) One for the headers (in skb->head)
>>> 2) Two page frags, because one TSO packet payload is not a nice power-of-two.
>>
>> interesting observation. I have noticed 17 with the ZC API. That might
>> explain the less than expected performance bump with iommu strict mode.
>
> Note that if application is using huge pages, things get better after
>
> commit 394fcd8a813456b3306c423ec4227ed874dfc08b
> Author: Eric Dumazet <edumazet@xxxxxxxxxx>
> Date:   Thu Aug 20 08:43:59 2020 -0700
>
>     net: zerocopy: combine pages in zerocopy_sg_from_iter()
>
>     Currently, tcp sendmsg(MSG_ZEROCOPY) is building skbs with order-0
>     fragments.
>     Compared to standard sendmsg(), these skbs usually contain up to
>     16 fragments on arches with 4KB page sizes, instead of two.
>
>     This adds considerable costs on various ndo_start_xmit() handlers,
>     especially when IOMMU is in the picture.
>
>     As high performance applications are often using huge pages,
>     we can try to combine adjacent pages belonging to same
>     compound page.
>
>     Tested on AMD Rome platform, with IOMMU, nominal single TCP flow speed
>     is roughly doubled (~55Gbit -> ~100Gbit), when user application
>     is using hugepages.
>
>     For reference, nominal single TCP flow speed on this platform
>     without MSG_ZEROCOPY is ~65Gbit.
>
>     Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
>     Cc: Willem de Bruijn <willemb@xxxxxxxxxx>
>     Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
>
> Ideally the gup stuff should really directly deal with hugepages, so
> that we avoid all these crazy refcounting games on the per-huge-page
> central refcount.
>

thanks for the pointer. I need to revisit my past attempt to get iperf3
working with hugepages.
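
For anyone following the fragment arithmetic in the quoted commit message, here is a rough userspace model of it. This is only a sketch, not the kernel code: count_frags(), pages_per_compound and offset are made-up names for illustration, and it assumes 4KB base pages and that adjacent pages are merged into one fragment only when they sit in the same compound page, as the commit message describes.

```python
PAGE_SIZE = 4096  # assumption: 4KB base pages, as in the quoted commit message


def count_frags(payload_len, pages_per_compound, offset=0):
    """Count fragments for payload_len bytes starting at byte `offset`.

    Hypothetical model: pages are numbered from 0, and page N belongs to
    compound page N // pages_per_compound. Adjacent pages are merged into
    one fragment only when they belong to the same compound page.
    """
    first_page = offset // PAGE_SIZE
    last_page = (offset + payload_len - 1) // PAGE_SIZE

    frags = 0
    prev_compound = None
    for page in range(first_page, last_page + 1):
        compound = page // pages_per_compound
        if compound != prev_compound:
            # Crossing into a different compound page starts a new fragment.
            frags += 1
            prev_compound = compound
    return frags
```

Under these assumptions, count_frags(65536, 1) gives 16 for a page-aligned 64KB payload on order-0 pages, count_frags(65536, 1, offset=100) gives 17 when the buffer does not start on a page boundary, and count_frags(65536, 512) gives 1 when the whole payload sits inside one 2MB huge page. Each fragment is a separate mapping for the driver, which is where the IOMMU cost discussed above comes from.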