Re: AF_XDP metadata/hints

On Wed, 26 May 2021 09:33:11 -0700
John Fastabend <john.fastabend@xxxxxxxxx> wrote:

> Alexander Lobakin wrote:
> > From: John Fastabend <john.fastabend@xxxxxxxxx>
> > Date: Wed, 26 May 2021 08:35:49 -0700
> > 
[...]
> > >> >
> > >> > I assume we need to stay compatible and respect the existing config
> > >> > interfaces, right?  
> > 
> > Again, XDP Hints won't change any netdev features and such, it only
> > exposes the hardware-provided fields that are currently inaccessible
> > to the XDP prog and, say, the cpumap code, but that are highly needed
> > (cpumap builds skbs without csums -> the GRO layer burns CPU time
> > calculating them manually; without the RSS hash -> the Flow Dissector
> > burns CPU time calculating it manually + possible NAPI bucket misses,
> > etc.).
> 
> That's a cpumap-specific problem, correct?

No, it is not a cpumap-specific problem.  It is actually a general
XDP_REDIRECT problem.  The veth container use-case is also hit by this
slowdown due to the missing HW-csum and RSS-hash, as described by
Alexander.

It also exists for redirect into Virtual Machines, which is actually
David Ahern's use-case.

> In general checksums work as expected?

Nope, the checksums are non-existent for XDP_REDIRECT'ed packets.
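
To make the idea concrete: an XDP prog can already reserve space in
front of the packet with bpf_xdp_adjust_meta() and stash values there,
which is the area XDP-hints would populate from the RX descriptor.
A minimal sketch (the struct layout and field names below are made up
for illustration, not a proposed ABI):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* NOT a kernel ABI: just an illustration of hints a driver or XDP
 * prog could place in the metadata area in front of the packet. */
struct meta_hints {
        __u32 rx_hash;   /* HW RSS hash from the RX descriptor */
        __u32 csum_ok;   /* HW checksum verdict */
};

SEC("xdp")
int xdp_store_hints(struct xdp_md *ctx)
{
        struct meta_hints *meta;

        /* Reserve room between data_meta and data for the hints */
        if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
                return XDP_ABORTED;

        meta = (void *)(long)ctx->data_meta;
        if ((void *)(meta + 1) > (void *)(long)ctx->data)
                return XDP_ABORTED;

        /* Placeholders; the point of XDP-hints is that these would be
         * filled from the HW descriptor, not hardcoded here. */
        meta->rx_hash = 0;
        meta->csum_ok = 1;

        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

The consumer side (cpumap SKB build, veth, or a VM interface) would
then read data_meta instead of recomputing csum/hash in software.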
 
> [...] 
> I'm not convinced hashes and csum are so interesting but show me some
> data. 

For checksum overhead measurements in the veth container use-case, see [1].
 [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_frame01_checksum.org
 

> Also my admittedly rough understanding of cpumap is that it helps
> the case where hardware RSS is not sufficient. 

I feel the need to explain what I'm using cpumap for, so bear with me.

In xdp-cpumap-tc[2] the XDP cpumap redirect solves the TC Qdisc locking
problem.  This runs in production at an ISP that uses MQ+HTB shaping.
It makes sure a customer's assigned IP-addresses (there can be multiple)
are redirected to the same CPU, so the customer-specific HTB shaper
works correctly.  (Multiple HTB qdiscs are attached under MQ [6].)
A rough sketch of the redirect follows the links below.

 [2] https://github.com/xdp-project/xdp-cpumap-tc
 [6] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/bin/tc_mq_htb_setup_example.sh
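
Something like this (a simplified sketch only, not the actual
xdp-cpumap-tc code; the map sizes and the userspace-populated
IP-to-CPU table are placeholders):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* dest IPv4 (network byte order) -> CPU index, filled from userspace
 * per customer */
struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 65536);
        __type(key, __u32);
        __type(value, __u32);
} ip_to_cpu SEC(".maps");

/* cpumap entries (queue sizes) are configured from userspace */
struct {
        __uint(type, BPF_MAP_TYPE_CPUMAP);
        __uint(max_entries, 64);
        __type(key, __u32);
        __type(value, struct bpf_cpumap_val);
} cpu_map SEC(".maps");

SEC("xdp")
int xdp_redirect_per_customer(struct xdp_md *ctx)
{
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
        struct ethhdr *eth = data;
        struct iphdr *iph;
        __u32 dst_ip, *cpu;

        if ((void *)(eth + 1) > data_end)
                return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
                return XDP_PASS;

        iph = (void *)(eth + 1);
        if ((void *)(iph + 1) > data_end)
                return XDP_PASS;

        /* Steer all of a customer's traffic to one CPU, so the HTB
         * qdisc for that customer is only touched from that CPU. */
        dst_ip = iph->daddr;
        cpu = bpf_map_lookup_elem(&ip_to_cpu, &dst_ip);
        if (!cpu)
                return XDP_PASS;

        return bpf_redirect_map(&cpu_map, *cpu, 0);
}

char _license[] SEC("license") = "GPL";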

In traffic-pacing-edt[3] the cpumap code[4] maps VLAN-tagged traffic to
the same CPU (simplified sketch below).  This allows the TC-BPF code[5]
to be "concurrency correct" as it updates the VLAN-based EDT-rate-limit
BPF-map without any atomic operations.  This runs in production at
another ISP, which needs to shape (traffic pace) 1 Gbit/s customers on
a 10 Gbit/s link due to crappy 1G GPON switches closer to the
end-customer.  It would be useful to get the offloaded VLAN info in
XDP-metadata.

 [3] https://github.com/xdp-project/bpf-examples/tree/master/traffic-pacing-edt
 [4] https://github.com/xdp-project/bpf-examples/blob/master/traffic-pacing-edt/xdp_cpumap_qinq.c
 [5] https://github.com/xdp-project/bpf-examples/blob/master/traffic-pacing-edt/edt_pacer_vlan.c
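
The CPU selection in [4] boils down to roughly this (again a simplified
sketch; the real code parses the full QinQ outer+inner tags and handles
more corner cases):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Placeholder; in practice this matches the configured cpu_map size */
#define NR_CPUS 4

struct vlan_hdr {
        __be16 h_vlan_TCI;
        __be16 h_vlan_encapsulated_proto;
};

struct {
        __uint(type, BPF_MAP_TYPE_CPUMAP);
        __uint(max_entries, NR_CPUS);
        __type(key, __u32);
        __type(value, struct bpf_cpumap_val);
} cpu_map SEC(".maps");

SEC("xdp")
int xdp_cpu_per_vlan(struct xdp_md *ctx)
{
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
        struct ethhdr *eth = data;
        struct vlan_hdr *vlh;
        __u16 vlan_id;

        if ((void *)(eth + 1) > data_end)
                return XDP_PASS;

        /* Only works with HW VLAN stripping disabled; with XDP-hints
         * the offloaded tag could be read from metadata instead. */
        if (eth->h_proto != bpf_htons(ETH_P_8021Q) &&
            eth->h_proto != bpf_htons(ETH_P_8021AD))
                return XDP_PASS;

        vlh = (void *)(eth + 1);
        if ((void *)(vlh + 1) > data_end)
                return XDP_PASS;

        vlan_id = bpf_ntohs(vlh->h_vlan_TCI) & 0x0fff;

        /* All packets of one outer VLAN land on the same CPU, so the
         * TC-BPF EDT pacer can update per-VLAN state without atomics. */
        return bpf_redirect_map(&cpu_map, vlan_id % NR_CPUS, 0);
}

char _license[] SEC("license") = "GPL";

Note the sketch only works because HW VLAN stripping is off on that
NIC; if the offloaded VLAN tag were available as XDP-metadata, the
parsing step could go away.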

> Seeing your coming from the Intel hardware side why not fix the RSS
> root problem instead of using cpumap at all? I think your hardware is
> flexible enough.

Yes, please fix the i40e hardware/firmware to support Q-in-Q packets.
We are actually hitting this at a customer site.  But my cpumap
use-cases above were not due to bad RSS-hashing.

 
> I would really prefer to see example use cases that are more generic
> than the cpumap case.


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

 



