Daniel Borkmann <daniel@xxxxxxxxxxxxx> writes:

> On 5/28/21 12:54 PM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@xxxxxxxxxxxxx> writes:
>>> On 5/28/21 12:00 PM, Magnus Karlsson wrote:
>>>> On Fri, May 28, 2021 at 11:52 AM Jesper Dangaard Brouer
>>>> <brouer@xxxxxxxxxx> wrote:
>>>>> On Fri, 28 May 2021 17:02:01 +0800
>>>>> Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
>>>>>> On Fri, 28 May 2021 10:55:58 +0200, Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>>>>>>> Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> writes:
>>>>>>>
>>>>>>>> In xsk mode, users cannot use AF_PACKET(tcpdump) to observe the current
>>>>>>>> rx/tx data packets. This feature is very important in many cases. So
>>>>>>>> this patch allows AF_PACKET to obtain xsk packages.
>>>>>>>
>>>>>>> You can use xdpdump to dump the packets from the XDP program before it
>>>>>>> gets redirected into the XSK:
>>>>>>> https://github.com/xdp-project/xdp-tools/tree/master/xdp-dump
>>>>>>
>>>>>> Wow, this is a good idea.
>>>>>
>>>>> Yes, it is rather cool (credit to Eelco). Notice the extra info you
>>>>> can capture from 'exit', like XDP return codes, if_index, rx_queue.
>>>>>
>>>>> The tool uses the perf ring-buffer to send/copy data to userspace.
>>>>> This is actually surprisingly fast, but I still think AF_XDP will be
>>>>> faster (but it usually 'steals' the packet).
>>>>>
>>>>> Another (crazy?) idea is to extend this (and xdpdump), is to leverage
>>>>> Hangbin's recent XDP_REDIRECT extension e624d4ed4aa8 ("xdp: Extend
>>>>> xdp_redirect_map with broadcast support"). We now have a
>>>>> xdp_redirect_map flag BPF_F_BROADCAST, what if we create a
>>>>> BPF_F_CLONE_PASS flag?
>>>>>
>>>>> The semantic meaning of BPF_F_CLONE_PASS flag is to copy/clone the
>>>>> packet for the specified map target index (e.g AF_XDP map), but
>>>>> afterwards it does like veth/cpumap and creates an SKB from the
>>>>> xdp_frame (see __xdp_build_skb_from_frame()) and send to netstack.
>>>>> (Feel free to kick me if this doesn't make any sense)
>>>>
>>>> This would be a smooth way to implement clone support for AF_XDP. If
>>>> we had this and someone added AF_XDP support to libpcap, we could both
>>>> capture AF_XDP traffic with tcpdump (using this clone functionality in
>>>> the XDP program) and speed up tcpdump for dumping traffic destined for
>>>> regular sockets. Would that solve your use case Xuan? Note that I have
>>>> not looked into the BPF_F_CLONE_PASS code, so do not know at this
>>>> point what it would take to support this for XSKMAPs.
>>>
>>> Recently also ended up with something similar for our XDP LB to record pcaps [0] ;)
>>> My question is.. tcpdump doesn't really care where the packet data comes from,
>>> so why not extending libpcap's Linux-related internals to either capture from
>>> perf RB or BPF ringbuf rather than AF_PACKET sockets? Cloning is slow, and if
>>> you need to end up creating an skb which is then cloned once again inside AF_PACKET
>>> it's even worse. Just relying and reading out, say, perf RB you don't need any
>>> clones at all.
>>
>> We discussed this when creating xdpdump and decided to keep it as a
>> separate tool for the time being. I forget the details of the
>> discussion, maybe Eelco remembers.
>>
>> Anyway, xdpdump does have a "pipe pcap to stdout" feature so you can do
>> `xdpdump | tcpdump` and get the interactive output; and it will also
>> save pcap information to disk, of course (using pcap-ng so it can also
>> save metadata like XDP program name and return code).
>
> Right, and this should yield a significantly better performance compared to
> cloning & pushing traffic into AF_PACKET. I presume not many folks are aware
> of xdpdump (yet) which is probably why such patch was created here..

What, are you implying we haven't achieved world domination yet?
Inconceivable! ;)

> a native libpcap implementation could solve that aspect fwiw and
> additionally hook at the same points as AF_PACKET via BPF but without
> the hassle/overhead of things like dev_queue_xmit_nit() in fast path.
> (Maybe another option could be to have a drop-in replacement
> libpcap.so for tcpdump using it transparently.)

I do believe that Michael was open to adding something like this to
tcpdump/libpcap when I last talked to him about it; and I'm certainly
not opposed to it either!

Hooking up tcpdump like this may be a bit of a firehose, though, so it
would be nice to be able to carry over the kernel-side filtering as
well. I suppose it should be possible to write an eBPF bytecode
generator that does a bit of setup and then just translates the cBPF
packet filtering ops, no? This would be cool to have in any case; IIRC
Cloudflare did something like that but took a detour through C code
generation?

-Toke
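
For anyone wondering what the "pipe pcap to stdout" route boils down to: tcpdump only needs a pcap byte stream, wherever the frames originally came from (perf RB, BPF ringbuf, ...). Below is a minimal Python sketch of such a producer. It is not how xdpdump is implemented; the function names are hypothetical, the frames are assumed to be already-captured Ethernet frames, and it emits the classic pcap format for brevity rather than the pcap-ng that xdpdump saves to disk.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4      # classic pcap magic, microsecond timestamps
LINKTYPE_ETHERNET = 1        # link-layer type for Ethernet frames


def pcap_global_header(snaplen=65535, linktype=LINKTYPE_ETHERNET):
    """Return the 24-byte classic pcap file header (little-endian)."""
    # magic, version major/minor (2.4), thiszone, sigfigs, snaplen, linktype
    return struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype)


def pcap_record(frame, ts_sec, ts_usec, orig_len=None):
    """Return one pcap record: 16-byte record header followed by the frame."""
    if orig_len is None:
        # Assume the frame was not truncated by the capture side.
        orig_len = len(frame)
    return struct.pack("<IIII", ts_sec, ts_usec, len(frame), orig_len) + frame


def pcap_stream(frames):
    """Yield pcap chunks for an iterable of (ts_sec, ts_usec, frame) tuples.

    The frame source is deliberately abstract (hypothetical): it could be
    a loop draining a perf ring buffer or a BPF ringbuf.
    """
    yield pcap_global_header()
    for ts_sec, ts_usec, frame in frames:
        yield pcap_record(frame, ts_sec, ts_usec)
```

Writing each chunk from `pcap_stream()` to `sys.stdout.buffer` and piping the result into `tcpdump -r -` should then give the usual interactive output, with no AF_PACKET socket involved.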