Re: [PATCH bpf-next] xsk: support AF_PACKET

On 5/28/21 12:54 PM, Toke Høiland-Jørgensen wrote:
Daniel Borkmann <daniel@xxxxxxxxxxxxx> writes:
On 5/28/21 12:00 PM, Magnus Karlsson wrote:
On Fri, May 28, 2021 at 11:52 AM Jesper Dangaard Brouer
<brouer@xxxxxxxxxx> wrote:
On Fri, 28 May 2021 17:02:01 +0800
Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
On Fri, 28 May 2021 10:55:58 +0200, Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> writes:

In xsk mode, users cannot use AF_PACKET (i.e. tcpdump) to observe the current
RX/TX packets. This capability is very important in many cases, so
this patch allows AF_PACKET to capture xsk packets.

You can use xdpdump to dump the packets from the XDP program before they
get redirected into the XSK:
https://github.com/xdp-project/xdp-tools/tree/master/xdp-dump

Wow, this is a good idea.

Yes, it is rather cool (credit to Eelco). Notice the extra info you
can capture from 'exit', such as XDP return codes, if_index and rx_queue.

The tool uses the perf ring-buffer to send/copy data to userspace.
This is actually surprisingly fast, but I still think AF_XDP will be
faster (though it usually 'steals' the packet).

Another (crazy?) idea for extending this (and xdpdump) is to leverage
Hangbin's recent XDP_REDIRECT extension e624d4ed4aa8 ("xdp: Extend
xdp_redirect_map with broadcast support"). We now have an
xdp_redirect_map flag, BPF_F_BROADCAST; what if we created a
BPF_F_CLONE_PASS flag?

The semantics of a BPF_F_CLONE_PASS flag would be to copy/clone the
packet to the specified map target index (e.g. an AF_XDP map), and
afterwards do as veth/cpumap do: create an SKB from the xdp_frame
(see __xdp_build_skb_from_frame()) and send it to the netstack.
(Feel free to kick me if this doesn't make any sense.)
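To make the proposed semantics concrete, here is a small userspace C model. It is only a sketch under stated assumptions: BPF_F_CLONE_PASS does not exist, so the flag is called TOY_F_CLONE_PASS here, and the queue types, names and sizes are hypothetical stand-ins for an XSKMAP target and the netstack. Without the flag the frame goes only to the AF_XDP target; with it, a second copy is also handed up the stack.

```c
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL flag modelling the BPF_F_CLONE_PASS idea; not a kernel API. */
#define TOY_F_CLONE_PASS (1u << 0)

#define TOY_MAX_FRAME 256u
#define TOY_QLEN 8u

struct toy_frame {
    uint32_t len;
    uint8_t data[TOY_MAX_FRAME];
};

/* Fixed-size queue standing in for an AF_XDP RX ring or for the netstack. */
struct toy_queue {
    struct toy_frame slot[TOY_QLEN];
    uint32_t count;
};

/* Copy a frame into the next free slot; returns 0, or -1 if full/oversized. */
static int toy_enqueue(struct toy_queue *q, const struct toy_frame *fr)
{
    if (q->count == TOY_QLEN || fr->len > TOY_MAX_FRAME)
        return -1;
    q->slot[q->count].len = fr->len;
    memcpy(q->slot[q->count].data, fr->data, fr->len);
    q->count++;
    return 0;
}

/* Deliver a frame to the XSK queue. With TOY_F_CLONE_PASS, also copy it to
 * the stack queue (standing in for building an skb from the xdp_frame and
 * passing it up). Returns the number of copies actually made. */
static int toy_redirect(const struct toy_frame *fr, uint32_t flags,
                        struct toy_queue *xsk, struct toy_queue *stack)
{
    int copies = 0;

    if (toy_enqueue(xsk, fr) == 0)
        copies++;
    if ((flags & TOY_F_CLONE_PASS) && toy_enqueue(stack, fr) == 0)
        copies++;
    return copies;
}
```

The returned copy count makes the cost visible: every packet redirected with the flag is copied twice.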

This would be a smooth way to implement clone support for AF_XDP. If
we had this and someone added AF_XDP support to libpcap, we could both
capture AF_XDP traffic with tcpdump (using this clone functionality in
the XDP program) and speed up tcpdump for dumping traffic destined for
regular sockets. Would that solve your use case, Xuan? Note that I have
not looked into the BPF_F_CLONE_PASS code, so I do not know at this
point what it would take to support this for XSKMAPs.

We recently ended up with something similar for our XDP LB to record pcaps [0] ;)
My question is: tcpdump doesn't really care where the packet data comes from,
so why not extend libpcap's Linux-related internals to capture from either the
perf RB or the BPF ringbuf rather than from AF_PACKET sockets? Cloning is slow,
and if you end up creating an skb which is then cloned once again inside
AF_PACKET, it's even worse. By just reading out, say, the perf RB, you don't
need any clones at all.
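A rough userspace C sketch of the clone-free approach: the producer appends each packet as a length-prefixed record into a shared byte ring, and the consumer drains records in order, so no per-packet skb or clone is involved. This is a toy single-producer/single-consumer model, not the kernel's perf or BPF ringbuf implementation. The names (toy_ring, ring_push, ring_pop) and the 4 KiB size are hypothetical, there are no memory barriers for real cross-thread use, and for simplicity the consumer copies each record out rather than reading it in place through an mmap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy ring: one producer, one consumer, power-of-two size. */
#define RING_SIZE 4096u
_Static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

struct toy_ring {
    uint8_t data[RING_SIZE];
    size_t head; /* bytes ever written by the producer */
    size_t tail; /* bytes ever consumed by the reader */
};

/* Copy bytes in/out at a logical position, wrapping at the end of data[]. */
static void ring_copy_in(struct toy_ring *r, size_t pos, const void *src, size_t len)
{
    const uint8_t *s = src;
    for (size_t i = 0; i < len; i++)
        r->data[(pos + i) & (RING_SIZE - 1)] = s[i];
}

static void ring_copy_out(const struct toy_ring *r, size_t pos, void *dst, size_t len)
{
    uint8_t *d = dst;
    for (size_t i = 0; i < len; i++)
        d[i] = r->data[(pos + i) & (RING_SIZE - 1)];
}

/* Producer: append one packet as [u32 length][payload].
 * Returns 0 on success, -1 if the packet is empty or does not fit. */
static int ring_push(struct toy_ring *r, const void *pkt, uint32_t len)
{
    size_t need = sizeof(len) + len;

    if (len == 0 || RING_SIZE - (r->head - r->tail) < need)
        return -1; /* drop rather than block the producer */
    ring_copy_in(r, r->head, &len, sizeof(len));
    ring_copy_in(r, r->head + sizeof(len), pkt, len);
    r->head += need;
    return 0;
}

/* Consumer: pop the oldest record into buf. Returns the payload length,
 * 0 if the ring is empty, or -1 if buf is too small (record left in place). */
static long ring_pop(struct toy_ring *r, void *buf, size_t cap)
{
    uint32_t len;

    if (r->head == r->tail)
        return 0;
    ring_copy_out(r, r->tail, &len, sizeof(len));
    if (len > cap)
        return -1;
    ring_copy_out(r, r->tail + sizeof(len), buf, len);
    r->tail += sizeof(len) + len;
    return (long)len;
}
```

The sketch drops a record when the ring is full instead of waiting, so the producer (standing in for the datapath) never stalls on a slow reader.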

We discussed this when creating xdpdump and decided to keep it as a
separate tool for the time being. I forget the details of the
discussion; maybe Eelco remembers.

Anyway, xdpdump does have a "pipe pcap to stdout" feature, so you can do
`xdpdump | tcpdump` and get the interactive output; and it can also
save pcap data to disk, of course (using pcap-ng, so it can also
store metadata such as the XDP program name and return code).
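For reference, a minimal sketch of what flows through such a pipe when the classic libpcap file format is used: a 24-byte global header, then a 16-byte header plus captured bytes per packet. This is illustrative only and is not xdpdump's code (xdpdump uses pcap-ng when saving to disk, as noted above); the helper names are hypothetical. Fields are stored in host byte order, which readers detect from the magic number, and the caller must supply a buffer large enough for 16 bytes plus the captured length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCAP_MAGIC 0xa1b2c3d4u  /* classic pcap, microsecond timestamps */
#define LINKTYPE_ETHERNET 1u

/* Serialize the 24-byte classic pcap global header; returns bytes written. */
static size_t pcap_global_header(uint8_t *out, uint32_t snaplen)
{
    const uint32_t magic = PCAP_MAGIC, sigfigs = 0, network = LINKTYPE_ETHERNET;
    const uint16_t major = 2, minor = 4;
    const int32_t thiszone = 0;
    size_t off = 0;

    memcpy(out + off, &magic, 4);    off += 4;
    memcpy(out + off, &major, 2);    off += 2;
    memcpy(out + off, &minor, 2);    off += 2;
    memcpy(out + off, &thiszone, 4); off += 4;
    memcpy(out + off, &sigfigs, 4);  off += 4;
    memcpy(out + off, &snaplen, 4);  off += 4;
    memcpy(out + off, &network, 4);  off += 4;
    return off; /* 24 */
}

/* Serialize one record: ts_sec, ts_usec, incl_len, orig_len, then up to
 * snaplen bytes of packet data. Returns bytes written. */
static size_t pcap_record(uint8_t *out, uint32_t snaplen, uint32_t ts_sec,
                          uint32_t ts_usec, const void *pkt, uint32_t len)
{
    const uint32_t incl_len = len < snaplen ? len : snaplen;
    const uint32_t hdr[4] = { ts_sec, ts_usec, incl_len, len };

    memcpy(out, hdr, sizeof(hdr));
    memcpy(out + sizeof(hdr), pkt, incl_len);
    return sizeof(hdr) + incl_len;
}
```

A writer would emit the global header once and then one record per packet to stdout, so that tcpdump can read the stream from standard input with `-r -`.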

Right, and this should yield significantly better performance than
cloning and pushing traffic into AF_PACKET. I presume not many folks are aware
of xdpdump (yet), which is probably why such a patch was created here. A native
libpcap implementation could solve that aspect, fwiw, and could additionally hook
at the same points as AF_PACKET via BPF, but without the hassle/overhead of things
like dev_queue_xmit_nit() in the fast path. (Another option could be a drop-in
replacement libpcap.so that tcpdump uses transparently.)

Thanks,
Daniel


