Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

On 2/29/20 12:53 AM, Willem de Bruijn wrote:
On Fri, Feb 28, 2020 at 2:01 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
On Fri, 28 Feb 2020 02:54:35 -0800 Luigi Rizzo wrote:
Add a netdevice flag to control skb linearization in generic xdp mode.

The attribute can be modified through
       /sys/class/net/<DEVICE>/xdpgeneric_linearize
The default is 1 (on).

Motivation: xdp expects linear skbs with some minimum headroom, and
generic xdp calls skb_linearize() if needed. The linearization is
expensive, and may be unnecessary e.g. when the xdp program does
not need access to the whole payload.
This sysfs entry allows users to opt out of linearization on a
per-device basis (linearization is still performed on cloned skbs).
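
As a rough illustration only (not the patch itself: the dev->xdpgeneric_linearize
field name and the helper below are assumptions), the opt-out would take effect
around the existing linearization fixup in netif_receive_generic_xdp() roughly
like this:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical sketch: field and helper names are illustrative only. */
static int generic_xdp_maybe_linearize(struct net_device *dev,
                                       struct sk_buff *skb)
{
        /* Cloned skbs are still linearized so XDP may safely write to them. */
        if (skb_cloned(skb))
                return skb_linearize(skb);

        /* Opt-out: leave non-linear skbs alone; the program then only sees
         * the bytes in the linear head between xdp->data and xdp->data_end.
         */
        if (!dev->xdpgeneric_linearize)
                return 0;

        return skb_is_nonlinear(skb) ? skb_linearize(skb) : 0;
}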

On a kernel instrumented to grab timestamps around the linearization
code in netif_receive_generic_xdp, under heavy netperf traffic with a
1500-byte MTU, I see the following times (nanoseconds/pkt).

The receiver generally sees larger packets, so the difference is more
significant there.

ns/pkt                   RECEIVER                   SENDER

                      p50     p90     p99      p50    p90    p99

LINEARIZATION:      600ns  1090ns  4900ns    149ns  249ns  460ns
NO LINEARIZATION:    40ns    59ns    90ns     40ns   50ns  100ns

v1 --> v2 : added Documentation
v2 --> v3 : adjusted for skb_cloned
v3 --> v4 : renamed to xdpgeneric_linearize, documentation

Signed-off-by: Luigi Rizzo <lrizzo@xxxxxxxxxx>

Just load your program in cls_bpf. No extensions or knobs needed.

Making xdpgeneric-only extensions without touching native XDP makes
no sense to me. Is this part of some greater vision?

Yes, native xdp has the same issue when handling packets that exceed a
page (4K+ MTU) or otherwise consist of multiple segments. The issue is
just more acute in generic xdp. But agreed that both need to be solved
together.

Many programs only need access to the header. There is currently no
way to express this, or for xdp to convey that the buffer covers only
part of the packet.
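
For example, a header-only program of the kind described here (an
illustrative sketch, not taken from this thread) never dereferences
anything beyond the Ethernet and IPv4 headers, so it never needs the
full payload to be linear:

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int drop_non_tcp(struct xdp_md *ctx)
{
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        struct iphdr *iph;

        /* Only the headers are bounds-checked and read. */
        if ((void *)(eth + 1) > data_end)
                return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
                return XDP_PASS;

        iph = (void *)(eth + 1);
        if ((void *)(iph + 1) > data_end)
                return XDP_PASS;

        return iph->protocol == IPPROTO_TCP ? XDP_PASS : XDP_DROP;
}

char _license[] SEC("license") = "GPL";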

Right, the only question I had earlier was: when users ship their
application with /sys/class/net/<DEVICE>/xdpgeneric_linearize turned off,
how would they know how much of the data is actually pulled in? Afaik,
some drivers might only have a linear section that covers the eth header
and that is it. What should the BPF prog do in such a case? Drop the skb
since it does not have the rest of the data to e.g. make an XDP_PASS
decision, or fall back to tc/BPF altogether? As I hinted earlier, one way
to make this more graceful is to add a skb pointer inside e.g. struct
xdp_rxq_info and then enable a bpf_skb_pull_data()-like helper, e.g. as:

BPF_CALL_2(bpf_xdp_pull_data, struct xdp_buff *, xdp, u32, len)
{
        struct sk_buff *skb = xdp->rxq->skb;

        /* Pull len bytes into the writable linear area (the whole current
         * head if len is 0); native XDP has no backing skb, so report
         * -ENOTSUPP there.
         */
        return skb ? bpf_try_make_writable(skb, len ? :
                                           skb_headlen(skb)) : -ENOTSUPP;
}

Thus, when the data/data_end test fails in generic XDP, the user can
call e.g. bpf_xdp_pull_data(xdp, 64) to make sure we pull in as much as
is needed without full linearization, and once that is done the
data/data_end check can be repeated to proceed. Native XDP will leave
xdp->rxq->skb as NULL, but later we could perhaps reuse the same
bpf_xdp_pull_data() helper for native XDP with skb-less backing. Thoughts?
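
From the program side, the pull-then-retry pattern would then look
roughly like the sketch below. The bpf_xdp_pull_data() declaration is
only the proposal above, not an existing kernel helper, and the helper
ID used is a placeholder:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Proposed helper, not an existing kernel API; ID 200 is a placeholder
 * used purely for illustration.
 */
static long (*bpf_xdp_pull_data)(struct xdp_md *ctx, __u32 len) = (void *)200;

SEC("xdp")
int parse_with_pull(struct xdp_md *ctx)
{
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        if (data + 64 > data_end) {
                /* Linear area too short: ask the kernel to pull in 64
                 * bytes, then re-read the pointers and retry the check.
                 */
                if (bpf_xdp_pull_data(ctx, 64))
                        return XDP_PASS;

                data     = (void *)(long)ctx->data;
                data_end = (void *)(long)ctx->data_end;
                if (data + 64 > data_end)
                        return XDP_PASS;
        }

        /* ... parse the first 64 bytes of the packet here ... */
        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";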

Thanks,
Daniel


