On 2019-08-23 15:07, William Tu wrote:
Hi Marek,

Answering some of your questions below; I leave the rest for others.

On Fri, Aug 23, 2019 at 3:38 AM Marek Závodský <marek.zavodsky@xxxxxxxxxxxxx> wrote:
> Hi Jesper,
>
> Thanks for your reply. I apologize, I'm new to kernel dev, so I may be
> missing some background. Let's bring some more light into this.
>
> We are using kernel 5.0.0 and samples/bpf/xdpsock as an example.

Do you want to consider using the AF_XDP API from libbpf? The
samples/bpf/xdpsock_user.c in 5.0.0 still does not use libbpf:
https://elixir.bootlin.com/linux/v5.0/source/samples/bpf/xdpsock_user.c
The kernel 5.1 xdpsock uses libbpf:
https://elixir.bootlin.com/linux/v5.1/source/samples/bpf/xdpsock_user.c

> I checked master, and the example has evolved (e.g. by adding cleanup
> mechanisms), but in terms of what I need from it, it looks equivalent
> (and even more complicated, because attaching XDP to the interface is
> now interleaved with XSK allocation). I built the latest kernel, but it
> refused to boot, so I haven't had a chance yet to try the latest.

There have been some fixes recently; I would suggest using the latest one.

> I took the _user part and split it into two:
>
> "loader" - Executed once to set up the environment and once to clean
> up. Loads _kern.o, attaches it to the interface and pins the maps under
> /sys/fs/bpf.
>
> "worker" - Executed as many times as required. Every instance loads the
> maps from /sys/fs/bpf, creates one AF_XDP socket, updates the xsks map
> record and starts to listen to/process packets from AF_XDP (in the test
> scenario we are using l2fwd because of the write-back). I had to add the
> missing cleanups there (close(fd), munmap()). This should be VPP in the
> final solution.
>
> So far so good. I'm unable to start more than one worker due to the
> previously mentioned error. The first instance works properly, every
> other one fails on bind (lineno may not match due to local changes):
>
>   xdpsock_user.c:xsk_configure:595: Assertion failed:
>   bind(sfd, (struct sockaddr *)&sxdp, sizeof(sxdp)) == 0:
>   errno: 16/"Device or resource busy"

I don't think you can have multiple threads binding one XSK, see
xsk_bind() in the kernel source. For AF_XDP in OVS, we create multiple
XSKs with non-shared umem, and each has its own thread.

> I modified it to allocate multiple sockets within one process, and I
> was successful with shared umem:
>
>   num_socks = 0;
>   xsks[num_socks++] = xsk_configure(NULL);
>   for (; num_socks < opt_alloc; num_socks++)
>           xsks[num_socks] = xsk_configure(xsks[0]->umem);
>
> but got the same behavior (first ok, second failed on bind) when I
> tried non-shared:
>
>   num_socks = 0;
>   for (; num_socks < opt_alloc; num_socks++)
>           xsks[num_socks] = xsk_configure(NULL);

I never tried shared umem; I would suggest starting with the non-shared
case.

Regards,
William
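(For reference, a minimal sketch of the layout William describes - one
XSK per RX queue, each with its own non-shared umem - using the libbpf
xsk.h helpers that the >= 5.1 xdpsock sample is built on. The struct
and function names, NUM_FRAMES and FRAME_SIZE values are illustrative,
not taken from the sample.)

  #include <stdlib.h>
  #include <unistd.h>
  #include <bpf/xsk.h>

  #define NUM_FRAMES 4096
  #define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE

  struct xsk_info {
          struct xsk_umem *umem;
          struct xsk_socket *xsk;
          struct xsk_ring_prod fq, tx;
          struct xsk_ring_cons cq, rx;
          void *buf;
  };

  /* Create one AF_XDP socket bound to a single RX queue, backed by its
   * own private umem, so no XDP_SHARED_UMEM flag is involved. */
  static int xsk_per_queue(struct xsk_info *i, const char *ifname,
                           __u32 queue_id)
  {
          size_t size = NUM_FRAMES * FRAME_SIZE;
          int ret;

          if (posix_memalign(&i->buf, getpagesize(), size))
                  return -1;

          ret = xsk_umem__create(&i->umem, i->buf, size, &i->fq, &i->cq,
                                 NULL);
          if (ret)
                  return ret;

          /* Each socket binds to a different ifname/queue_id pair;
           * binding a second socket to the same pair is what returns
           * EBUSY. */
          return xsk_socket__create(&i->xsk, ifname, queue_id, i->umem,
                                    &i->rx, &i->tx, NULL);
  }

Each worker (or thread) would call this with its own queue_id and then
poll its own RX/TX rings. Note that, unless told otherwise via the
config argument, xsk_socket__create() will also load a default XDP
program and register the socket in its xskmap.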
William did a much better job than I would have at answering the
questions. Thank you! +1 to all replies.

Cheers,
Björn
> And the TX processing... As a workaround we moved the VLAN pop/push to
> the "worker", and XDP does only xsk-map redirects based on vlan-id, but
> that violates the design. Is there any estimate of when we could expect
> something on the XDP TX front? I can try BPF TC TX in the meantime. I
> guess changing opt_ifindex to xsk->fd in
> bpf_set_link_xdp_fd(opt_ifindex, prog_fd, opt_xdp_flags); won't help :)
>
> One side question. I noticed that bpf_trace_printk creates sparse
> entries in /sys/kernel/debug/tracing/trace. When I run a sample of 100
> packets I may get anywhere from 0 to many entries there. It's a bit
> annoying to run a "load test" just to verify I hit the correct code
> path. Is it doing sampling? Can I tweak it somehow?
>
> Thanks,
> marek
>
> ________________________________
> From: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> Sent: Friday, August 23, 2019 10:22:24 AM
> To: brouer@xxxxxxxxxx; Július Milan; Marek Závodský
> Cc: xdp-newbies@xxxxxxxxxxxxxxx; Karlsson, Magnus; Björn Töpel; Eelco Chaudron; Thomas F Herbert; William Tu
> Subject: AF_XDP integration with FDio VPP? (Was: Questions about XDP)

Bringing these questions to the xdp-newbies list, where they belong.
Answers inlined below.

On Tue, 20 Aug 2019 21:17:57 +0200 Július Milan <Julius.Milan@xxxxxxxxxxxxx> wrote:

> I am writing an AF_XDP driver for FDio VPP. I have 2 questions.

That sounds excellent. I was hoping someone would do this for FDio VPP.
Do notice that DPDK now also has AF_XDP support. IMHO it makes a lot of
sense to implement AF_XDP for FDio and avoid the DPDK dependency.
(AFAIK FDio already has other back-ends than DPDK.)

> 1 - I created a simple driver according to the sample in the kernel. I
> load my XDP program and pin the maps. Then in the user application I
> create a socket, mmap the memory and push it to the xskmap in the
> program. All fine so far. Then I start another instance of the user
> application and do the same: create a socket, mmap the memory and try
> to push it somewhere else into the map. But I get errno: 16 "Device or
> resource busy" when trying to bind. I guess the memory can't be mmaped
> 2 times, but should be shared, is that correct?

I'm cc'ing the AF_XDP experts, as I'm not sure myself. I mostly deal
with the in-kernel XDP path. (AF_XDP is essentially kernel bypass :-O)

> If so, I am wondering how to solve this nicely. Can I store the value
> of the first socket (that created the mmaped memory) in some special
> map in my XDP program, to avoid complicated inter-process
> communication? And what happens if this first socket is closed while
> any other sockets are still alive (using its shared mmaped memory)?
> What would you recommend? Maybe you have some sample.

We just added a sample (by Eelco, Cc'ed) into the XDP-tutorial:
https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP
At least read the README.org file... to get over the common gotchas.

AFAIK the sample doesn't cover your use-case. I guess we/someone should
extend the sample to illustrate how multiple interfaces can share the
same UMEM. The official documentation is:
https://www.kernel.org/doc/html/latest/networking/af_xdp.html
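(For reference, a minimal sketch of the kernel-side piece that both the
sample and Marek's loader/worker split rely on: an XDP program that
redirects frames into an XSKMAP, which the loader pins and the workers
update with their socket fds. It follows the 5.0-era xdpsock_kern.c
pattern; the map name and size are illustrative, and Marek's variant
keys on VLAN id rather than the RX queue index.)

  #include <linux/bpf.h>
  #include "bpf_helpers.h"

  struct bpf_map_def SEC("maps") xsks_map = {
          .type = BPF_MAP_TYPE_XSKMAP,
          .key_size = sizeof(int),
          .value_size = sizeof(int),
          .max_entries = 64,
  };

  SEC("xdp_sock")
  int xdp_sock_prog(struct xdp_md *ctx)
  {
          /* Redirect to the AF_XDP socket registered for this RX queue;
           * if no socket is registered for that index, the redirect
           * fails and the frame is dropped. */
          return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, 0);
  }

  char _license[] SEC("license") = "GPL";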
> Can I also do atomic operations? (I want it just for such rare cases
> as the initialization of the next socket, to check whether there
> already is one that mmaped the memory.)
>
> 2 - We also want to do some decap/encap on the XDP layer, before
> redirecting to the socket.

Decap on the XDP layer is an excellent use-case that demonstrates
cooperation between XDP and the AF_XDP kernel-bypass facility.

> On the RX path it is easy, I do what I want and redirect it to the
> socket, but can I achieve the same also on TX?

(Yes, the RX case is easy.) We don't have an XDP TX hook yet... but so
many people have requested this that we should add it.

> Can I catch the packet during TX in XDP and do something with it
> (encapsulate it) before sending it out?

Usually we recommend people use the TC egress BPF hook to do the encap
on TX. For the AF_XDP use-case, the TC hook isn't there... so that is
not an option. Again an argument for an XDP-TX hook. You could of
course add the encap header in your AF_XDP userspace program, but I do
understand it would make architectural sense for in-kernel XDP to act
as a decap/encap layer.

> If so, what about performance?

The AF_XDP RX-side is really, really fast, even in copy-mode. The
AF_XDP TX-side in copy-mode is rather slow, as it allocates SKBs etc.
We could optimize this further, but we have not. When enabling AF_XDP
zero-copy mode, the TX-side is also super fast.

Another hint for the AF_XDP TX-side: remember to "produce" several
packets before doing the sendmsg system call, thus effectively doing
bulking on the TX-ring.

> By the way, great job with XDP ;)

Thanks!

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
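(A minimal sketch of the TX bulking hint above, assuming the libbpf
xsk.h ring helpers: reserve and fill a batch of TX descriptors, submit
them all, then issue a single syscall kick. The function name, batch
handling and the frame_addr/frame_len arrays are illustrative
placeholders, not part of any API.)

  #include <sys/socket.h>
  #include <bpf/xsk.h>

  static void tx_batch(struct xsk_socket *xsk, struct xsk_ring_prod *tx,
                       const __u64 *frame_addr, const __u32 *frame_len,
                       unsigned int nb)
  {
          __u32 idx;
          unsigned int i;

          if (xsk_ring_prod__reserve(tx, nb, &idx) != nb)
                  return; /* TX ring full; retry after reaping completions */

          for (i = 0; i < nb; i++) {
                  struct xdp_desc *desc = xsk_ring_prod__tx_desc(tx, idx + i);

                  desc->addr = frame_addr[i]; /* offset into the umem */
                  desc->len  = frame_len[i];
          }
          xsk_ring_prod__submit(tx, nb);

          /* One kick for the whole batch instead of one per packet. */
          sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
  }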