Re: AF_XDP umem and jumbo frames?

Thanks Jesper for the detailed and thoughtful answer!

I definitely understand that XDP-based applications significantly
reduce per-packet overhead and thus don't need jumbo frames
themselves.  I was thinking more about the performance of non-XDP
applications that still benefit from jumbo frames.  A lot of the (at
least to me) interesting applications of XDP are infrastructure for
other applications: XDP-based NATs, firewalls, load-balancers, etc.
that pass traffic generated by non-XDP applications such as web
servers and streaming video.  But if those other applications need
jumbo frames to work efficiently, then putting an XDP-based
application in their path means they can't use jumbo frames either.
Does that make sense?

I admit that part of my thinking (an implicit assumption) is that I've
never seen IP fragmentation work efficiently (on either the
fragmentation or the defragmentation side), but I guess it's in theory
possible that an application could source a jumbo-framed packet,
something else could fragment it, and then an XDP app could handle the
fragments.  Is this more along the lines of what you were thinking?

Thanks again for the detailed reply!

- Rob
On Thu, Oct 4, 2018 at 12:44 PM Jesper Dangaard Brouer
<brouer@xxxxxxxxxx> wrote:
>
> On Thu, 4 Oct 2018 08:47:45 -0700
> Rob Sherwood <rob.sherwood@xxxxxxxxx> wrote:
>
> > [not speaking for my current employer, but just from past experience ]
> >
> > Certainly a lot of the 'hard' requirements (hard meaning "without
> > this it won't work") I've seen could be served with a ~3k non-full
> > jumbo frame.
>
> Glad to hear that _most_ use-cases can be solved with a ~3k non-full
> jumbo-frame.
>
> > But at least what I've seen in the past was that because many of
> > the host-side operations are per-packet limited (e.g., because of
> > CPU or RAM, which ultimately turns into a max PPS per host), a
> > trivial way to increase application performance and reduce CPU
> > spent on networking was to run with as large a frame size as
> > possible.  For example, if your application/host is really PPS
> > limited, then increasing the frame size from 3k to 9k means either
> > 3x more bandwidth for the same CPU usage (if the application is
> > bandwidth limited) or 1/3 the CPU usage for the same bandwidth (if
> > it is not).  Either way, IMHO it's a pretty big win.
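> >
> > (To put rough numbers on that 3x: at a fixed budget of 1 Mpps,
> > 3000B frames carry about 24 Gbit/s, while 9000B frames carry about
> > 72 Gbit/s.)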
>
> With XDP we have basically solved the issue of being PPS (packets per
> second) limited, and we can avoid these workarounds of using jumbo
> frames.  That is why it is a bit provocative to ask for jumbo frames ;-)
>
>
> People on this list might not realize that there is a significant
> overhead in supporting frames larger than 4K for XDP, that is, larger
> than one memory page. So let me explain...
>
> It is actually trivially easy for XDP to support jumbo frames, if the
> NIC hardware supports storing RX frames into higher-order pages (aka
> compound pages: multiple 4K pages physically after each other), which
> most HW does. (Page order-0 = 4KB, order-1 = 8KB, order-2 = 16KB,
> order-3 = 32KB.)  XDP will then work out of the box, as the real
> requirement is that the packet payload is laid out in physically
> contiguous memory.
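>
> For illustration, a higher-order allocation from the page allocator
> looks roughly like this (a minimal sketch; the GFP flags are just an
> example, and order-2 is the smallest order that fits a full 9000B
> frame):
>
>   #include <linux/gfp.h>  /* alloc_pages(), __free_pages() */
>
>   /* order-2 compound page: four physically contiguous 4K pages (16KB),
>    * enough to hold a 9000B jumbo frame in one contiguous stretch */
>   struct page *pg = alloc_pages(GFP_ATOMIC | __GFP_COMP, 2);
>   if (pg)
>           __free_pages(pg, 2);  /* must be freed with the same order */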
>
> The kernel page allocator can give us higher-order pages, sure, but it
> costs more; see slide 12 of [1].  The large jump at order-1 is because
> order-0 has a per-CPU pages (PCP) cache.  From order-1 and above, the
> page allocator goes through a central (per-NUMA) lock, which makes
> things even worse, as it does not scale to multiple CPUs.  And there
> is also the point of wasting memory when processing 64-byte packets.
> Still, that cost is not the full picture: we could support jumbo
> frames for XDP, mostly because we can work around the cost by having
> recycle caches for these pages, which we even do for order-0 pages.
> Hint: I actually left this door open, as you can specify the page
> order when setting up the page_pool API in the driver...
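>
> As a rough sketch (the field values here are illustrative, and the
> exact struct layout varies across kernel versions), a driver sets
> that up along these lines:
>
>   #include <net/page_pool.h>
>
>   struct page_pool_params pp_params = {
>           .order     = 0,             /* >0 would allow frames larger than 4K */
>           .pool_size = 1024,          /* illustrative recycle-cache size */
>           .nid       = NUMA_NO_NODE,  /* no NUMA placement preference */
>   };
>   struct page_pool *pool = page_pool_create(&pp_params);
>
>   /* RX refill path: pages come from the recycle cache when possible */
>   struct page *page = page_pool_alloc_pages(pool, GFP_ATOMIC);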
>
> [1] http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>
>
>
>
> > On Thu, Oct 4, 2018 at 12:52 AM Jesper Dangaard Brouer
> > <brouer@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 4 Oct 2018 08:44:27 +0200
> > > Björn Töpel <bjorn.topel@xxxxxxxxx> wrote:
> > >
> > > > On Thu, Sep 27, 2018 at 02:56, Rob Sherwood <rob.sherwood@xxxxxxxxx> wrote:
> > > > >
> > > > > Thanks for the reference and the page-per-packet point makes sense.
> > > > > At the same time, not supporting jumbo frames seems like a non-trivial
> > > > > limitation.  Is there a subset of drivers that does support jumbo
> > > > > frames (or LRO or the other features that require multiple pages per
> > > > > packet)?
> > > > >
> > > >
> > > > No, not at the moment. XDP has a strict "one frame cannot exceed a
> > > > page" constraint. Everything that applies to XDP in terms of
> > > > constraints applies to AF_XDP as well.
> > > >
> > > > Just to clarify, XDP supports jumbo frames -- i.e. larger than 1500B
> > > > payload, just not the maximum 9000B size. My personal observation is
> > > > that many deployments that "require jumbo frames" are usually OK with
> > > > an MTU of ~3000B. Jumbo frames, yes. Full jumbo frames, no. :-)
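> > > >
> > > > (Roughly where that limit comes from: a 4096B page, minus the
> > > > 256B XDP headroom, minus the skb_shared_info tailroom, leaves a
> > > > bit over 3000B for the frame itself, which is why driver limits
> > > > land around ~3050B rather than at 4096B.)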
> > >
> > > Thank you for clarifying that, Björn.
> > >
> > > Can Alex or Rob explain:
> > >
> > > (1) What is your use-case for wanting jumbo-frames?
> > >
> > > And (2) will an MTU of ~3000Bytes be sufficient? (which XDP does support)
> > >
> > >
> > > > > On Tue, Sep 25, 2018 at 9:44 AM Alex Forster <aforster@xxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > On my test box running 4.18, if XDP is in use the MTU cannot be
> > > > > > > set higher than 3050.
> > > > > >
> > > > > > Ah, that answers a few questions for me. Thanks!
> > > > > >
> > > > > > Alex Forster
> > >
> > > --
> > > Best regards,
> > >   Jesper Dangaard Brouer
> > >   MSc.CS, Principal Kernel Engineer at Red Hat
> > >   LinkedIn: http://www.linkedin.com/in/brouer
>
>



