Hi Marek,

I'll answer some of your questions below and leave the rest for others.

On Fri, Aug 23, 2019 at 3:38 AM Marek Závodský
<marek.zavodsky@xxxxxxxxxxxxx> wrote:
>
> Hi Jasper,
>
> Thanks for your reply.
> I apologize, I'm new to kernel dev, so I may be missing some background.
>
> Let's bring some more light into this. We are using kernel 5.0.0 and
> samples/bpf/xdpsock as an example.

Do you want to consider using the AF_XDP API from libbpf? The
samples/bpf/xdpsock_user.c in 5.0.0 still does not use libbpf:
https://elixir.bootlin.com/linux/v5.0/source/samples/bpf/xdpsock_user.c

The kernel 5.1 xdpsock uses libbpf:
https://elixir.bootlin.com/linux/v5.1/source/samples/bpf/xdpsock_user.c

> I checked master, and the example has evolved (e.g. by adding cleanup
> mechanisms), but in terms of what I need from it, it looks the same
> (and even more complicated, because now the XDP attach to the interface
> is interleaved with the XSK allocation).
>
> I built the latest kernel, but it refused to boot, so I haven't had a
> chance yet to try the latest.

There have been some fixes recently; I would suggest using the latest one.

> I took the _user part and split it into two:
>
> "loader" - Executed once to set up the environment and once to clean
> up; loads _kern.o, attaches it to the interface and pins the maps under
> /sys/fs/bpf.
>
> and
>
> "worker" - Executed as many times as required. Every instance loads the
> maps from /sys/fs/bpf, creates one AF_XDP sock, updates the xsks record
> and starts listening to / processing packets from AF_XDP (in the test
> scenario we are using l2fwd because of the write-back). I had to add
> the missing cleanups there (close(fd), munmap()). This should be VPP in
> the final solution.
>
> So far so good.
>
> I'm unable to start more than one worker due to the previously
> mentioned error. The first instance works properly, every other one
> fails on bind (lineno may not match due to local changes):
>
> xdpsock_user.c:xsk_configure:595: Assertion failed: bind(sfd, (struct
> sockaddr *)&sxdp, sizeof(sxdp)) == 0: errno: 16/"Device or resource busy"

I don't think you can have multiple threads binding one XSK, see
xsk_bind() in the kernel source. For AF_XDP in OVS, we create multiple
XSKs with non-shared umem, and each has its own thread.

> I modified it to allocate multiple sockets within one process, and I
> was successful with shared umem:
>
>     num_socks = 0;
>     xsks[num_socks++] = xsk_configure(NULL);
>     for (; num_socks < opt_alloc; num_socks++)
>         xsks[num_socks] = xsk_configure(xsks[0]->umem);
>
> but got the same behavior (first ok, second failed on bind) when I
> tried non-shared:
>
>     num_socks = 0;
>     for (; num_socks < opt_alloc; num_socks++)
>         xsks[num_socks] = xsk_configure(NULL);

I never tried shared umem; I would suggest starting with the non-shared
case.

Regards,
William
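For what it's worth, below is a rough, untested sketch of the non-shared
case using the libbpf xsk API (tools/lib/bpf/xsk.h, kernel >= 5.1). The
struct, NUM_FRAMES and open_xsk() are placeholders of mine, not code from
the sample; the point is that every socket registers its own umem and
binds its own (ifname, queue_id) pair. As far as I can tell from
xsk_bind(), binding a second non-shared socket to a queue that already
has one is exactly what returns EBUSY.

/* Rough sketch only: one XSK per queue, each with its own umem, using
 * the libbpf xsk API. Error handling trimmed for brevity. */
#include <stdlib.h>
#include <unistd.h>
#include <linux/types.h>
#include <bpf/xsk.h>

#define NUM_FRAMES 4096                 /* placeholder sizing */
#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE

struct xsk_info {
	struct xsk_umem *umem;
	struct xsk_socket *xsk;
	struct xsk_ring_prod fq, tx;
	struct xsk_ring_cons cq, rx;
	void *buf;
};

/* Create one socket with its own (non-shared) umem, bound to one queue. */
static int open_xsk(struct xsk_info *info, const char *ifname, __u32 queue_id)
{
	struct xsk_socket_config cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.libbpf_flags = 0,
		.xdp_flags = 0,         /* or XDP_FLAGS_DRV_MODE */
		.bind_flags = 0,        /* or XDP_COPY / XDP_ZEROCOPY */
	};
	int err;

	if (posix_memalign(&info->buf, getpagesize(), NUM_FRAMES * FRAME_SIZE))
		return -1;

	/* Non-shared: every socket registers its own umem area. */
	err = xsk_umem__create(&info->umem, info->buf, NUM_FRAMES * FRAME_SIZE,
			       &info->fq, &info->cq, NULL);
	if (err)
		return err;

	/* Each socket must bind a different (ifname, queue_id); a second
	 * non-shared socket on the same queue fails with EBUSY. */
	return xsk_socket__create(&info->xsk, ifname, queue_id, info->umem,
				  &info->rx, &info->tx, &cfg);
}

IIRC there is also an XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD flag for
libbpf_flags if you want to keep loading your own _kern.o and pinned maps
instead of libbpf's built-in program; the worker can then put
xsk_socket__fd() into the pinned xskmap itself with bpf_map_update_elem().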
> And the TX processing... as a workaround we moved the VLAN pop/push to
> the "worker" and XDP only does xsk-map redirects based on vlan-id, but
> it violates the design. Is there any estimate of when we could expect
> something on the XDP TX front? I can try BPF TC TX in the meantime.
>
> I guess changing opt_ifindex to xsk->fd in
> bpf_set_link_xdp_fd(opt_ifindex, prog_fd, opt_xdp_flags);
> won't help :)
>
> One side question. I noticed that bpf_trace_printk creates sparse
> entries in /sys/kernel/debug/tracing/trace. When I run a sample of 100
> packets I may get anywhere from 0 to many entries there. It's a bit
> annoying to run a "load test" just to verify I hit the correct code
> path. Is it doing sampling? Can I tweak it somehow?
>
> Thanks,
> marek
>
> ________________________________
> From: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> Sent: Friday, August 23, 2019 10:22:24 AM
> To: brouer@xxxxxxxxxx; Július Milan; Marek Závodský
> Cc: xdp-newbies@xxxxxxxxxxxxxxx; Karlsson, Magnus; Björn Töpel; Eelco
> Chaudron; Thomas F Herbert; William Tu
> Subject: AF_XDP integration with FDio VPP? (Was: Questions about XDP)
>
> Bringing these questions to the xdp-newbies list, where they belong.
> Answers inlined below.
>
> On Tue, 20 Aug 2019 21:17:57 +0200 Július Milan <Julius.Milan@xxxxxxxxxxxxx>
>
> > I am writing an AF_XDP driver for FDio VPP. I have 2 questions.
>
> That sounds excellent. I was hoping someone would do this for FDio VPP.
> Do notice that DPDK now also has AF_XDP support. IMHO it makes a lot of
> sense to implement AF_XDP for FDio and avoid the DPDK dependency.
> (AFAIK FDio already has other back-ends than DPDK.)
>
> > 1 - I created a simple driver according to the sample in the kernel.
> > I load my XDP program and pin the maps.
> >
> > Then in the user application I create a socket, mmap the memory and
> > push it to the xskmap in the program. All fine so far.
> >
> > Then I start another instance of the user application and do the
> > same: create a socket, mmap the memory and try to push it somewhere
> > else into the map. But I get errno 16 "Device or resource busy" when
> > trying to bind.
> >
> > I guess the memory can't be mmaped 2 times, but should be shared, is
> > that correct?
>
> I'm cc'ing the AF_XDP experts, as I'm not sure myself. I mostly deal
> with the in-kernel XDP path. (AF_XDP is essentially kernel bypass :-O)
>
> > If so, I am wondering how to solve this nicely.
> >
> > Can I store the value of the first socket (that created the mmaped
> > memory) in some special map in my XDP program, to avoid complicated
> > inter-process communication?
> >
> > And what happens if this first socket is closed while other sockets
> > are still alive (using its shared mmaped memory)?
> >
> > What would you recommend? Maybe you have some sample.
>
> We just added a sample (by Eelco, Cc'ed) to XDP-tutorial:
> https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP
>
> At least read the README.org file... to get over the common gotchas.
>
> AFAIK the sample doesn't cover your use-case. I guess we/someone should
> extend the sample to illustrate how multiple interfaces can share the
> same UMEM.
>
> The official documentation is:
> https://www.kernel.org/doc/html/latest/networking/af_xdp.html
>
> > Can I also do atomic operations? (I want it just for such rare cases
> > as the initialization of the next socket, to check whether there
> > already is one that mmaped the memory.)
> >
> > 2 - We also want to do some decap/encap at the XDP layer, before
> > redirecting the packet to the socket.
>
> Decap at the XDP layer is an excellent use-case, one that demonstrates
> cooperation between XDP and the AF_XDP kernel-bypass facility.
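For the archives, a rough, untested sketch of that decap-before-redirect
idea, along the lines Marek described (VLAN pop in XDP, then an xskmap
redirect keyed by VLAN id). The map name, sizes, section name and include
layout are my own placeholders (following samples/bpf / xdp-tutorial),
not taken from his _kern.o; and with flags=0, a VLAN id with no socket
registered ends up dropped rather than passed to the stack:

/* Illustrative sketch: pop the outer 802.1Q tag and redirect to an
 * AF_XDP socket picked by VLAN id. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include "bpf_helpers.h"
#include "bpf_endian.h"

/* Not in UAPI headers, so define it here (as xdp-tutorial does). */
struct vlan_hdr {
	__be16 h_vlan_TCI;
	__be16 h_vlan_encapsulated_proto;
};

struct bpf_map_def SEC("maps") xsks_map = {
	.type = BPF_MAP_TYPE_XSKMAP,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 4096,            /* one slot per possible VLAN id */
};

SEC("xdp_sock")
int xdp_vlan_pop_redirect(struct xdp_md *ctx)
{
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	struct ethhdr *eth = data;
	struct ethhdr eth_copy;
	struct vlan_hdr *vlh;
	int vid;

	if (data + sizeof(*eth) + sizeof(*vlh) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_8021Q))
		return XDP_PASS;

	vlh = data + sizeof(*eth);
	vid = bpf_ntohs(vlh->h_vlan_TCI) & 0x0fff;

	/* Save the Ethernet header, fix up the proto, then chop 4 bytes
	 * off the front and write the saved header back. */
	__builtin_memcpy(&eth_copy, eth, sizeof(eth_copy));
	eth_copy.h_proto = vlh->h_vlan_encapsulated_proto;

	if (bpf_xdp_adjust_head(ctx, (int)sizeof(*vlh)))
		return XDP_DROP;

	data_end = (void *)(long)ctx->data_end;
	data = (void *)(long)ctx->data;
	if (data + sizeof(eth_copy) > data_end)
		return XDP_DROP;
	__builtin_memcpy(data, &eth_copy, sizeof(eth_copy));

	/* Deliver to the AF_XDP socket registered under this VLAN id. */
	return bpf_redirect_map(&xsks_map, vid, 0);
}

char _license[] SEC("license") = "GPL";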
> > On the RX path it is easy: I do what I want and redirect it to the
> > socket. But can I achieve the same also on TX?
>
> (Yes, the RX case is easy.)
>
> We don't have an XDP TX hook yet... but so many people have requested
> this that we should add it.
>
> > Can I catch the packet on TX in XDP and do something with it
> > (encapsulate it) before sending it out?
>
> Usually we recommend people use the TC egress BPF hook to do the encap
> on TX. For the AF_XDP use-case, the TC hook isn't there... so that is
> not an option. Again an argument for an XDP-TX hook. You could, of
> course, add the encap header in your AF_XDP userspace program, but I do
> understand that it would make architectural sense for in-kernel XDP to
> act as a decap/encap layer.
>
> > If so, what about performance?
>
> The AF_XDP RX-side is really, really fast, even in copy-mode.
>
> The AF_XDP TX-side in copy-mode is rather slow, as it allocates SKBs,
> etc. We could optimize this further, but we have not. When enabling
> AF_XDP zero-copy mode, the TX-side is also super fast.
>
> Another hint for the AF_XDP TX-side: remember to "produce" several
> packets before doing the sendmsg system call, thus effectively doing
> bulking on the TX-ring.
>
> > By the way, great job with XDP ;)
>
> Thanks!
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
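To make that last TX hint concrete, here is a minimal, untested sketch
using the libbpf xsk API; tx_burst(), the addr/len arrays and the omitted
completion handling are my own simplifications. The idea is to fill a
batch of TX descriptors, submit them once, and then issue a single
sendto() kick for the whole batch:

/* Rough sketch: queue a batch of descriptors on the TX ring, then kick
 * the kernel once, instead of one syscall per packet. Assumes the
 * frames at addr[] already contain the packet data. */
#include <sys/socket.h>
#include <linux/types.h>
#include <bpf/xsk.h>

static void tx_burst(struct xsk_ring_prod *tx, int xsk_fd,
		     const __u64 *addr, const __u32 *len, unsigned int nb)
{
	__u32 idx, i;

	/* Reserve nb slots on the TX producer ring in one go. */
	if (xsk_ring_prod__reserve(tx, nb, &idx) != nb)
		return;                 /* ring full; caller should retry */

	for (i = 0; i < nb; i++) {
		struct xdp_desc *desc = xsk_ring_prod__tx_desc(tx, idx + i);

		desc->addr = addr[i];   /* umem offset of the frame */
		desc->len = len[i];
	}
	xsk_ring_prod__submit(tx, nb);

	/* One kick for the whole batch (copy-mode needs this syscall). */
	sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0);

	/* Completions should then be reaped from the completion ring with
	 * xsk_ring_cons__peek() / xsk_ring_cons__release(), omitted here. */
}

This amortizes the syscall overhead over the whole batch instead of
paying it per packet.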