Re: [RFC PATCH bpf-next] xdp: Add tracepoint on XDP program return

Ido Schimmel <idosch@xxxxxxxxxx> · Mon, 23 Dec 2019 11:25:20 +0200

On Tue, Dec 17, 2019 at 09:52:02AM +0100, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:
> 
> > On Mon, Dec 16, 2019 at 07:17:59PM +0100, Björn Töpel wrote:
> >> On Mon, 16 Dec 2019 at 16:28, Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >> >
> >> > This adds a new tracepoint, xdp_prog_return, which is triggered at every
> >> > XDP program return. This was first discussed back in August[0] as a way to
> >> > hook XDP into the kernel drop_monitor framework, to have a one-stop place
> >> > to find all packet drops in the system.
> >> >
> >> > Because trace/events/xdp.h includes filter.h, some ifdef guarding is needed
> >> > to be able to use the tracepoint from bpf_prog_run_xdp(). If anyone has any
> >> > ideas for how to improve on this, please speak up. Sending this RFC
> >> > because of this issue, and to get some feedback from Ido on whether this
> >> > tracepoint has enough data for drop_monitor usage.
> >> >
> >> 
> >> I get that it would be useful, but can it be solved with BPF tracing
> >> (i.e. tracing BPF with BPF)? It would be neat not adding another
> >> tracepoint in the fast-path...
> >
> > That was my question as well.
> > Here is an example from Eelco:
> > https://lore.kernel.org/bpf/78D7857B-82E4-42BC-85E1-E3D7C97BF840@xxxxxxxxxx/
> > BPF_TRACE_2("fexit/xdp_prog_simple", trace_on_exit,
> >              struct xdp_buff*, xdp, int, ret)
> > {
> >      bpf_debug("fexit: [ifindex = %u, queue =  %u, ret = %d]\n",
> >                xdp->rxq->dev->ifindex, xdp->rxq->queue_index, ret);
> >
> >      return 0;
> > }
> > 'ret' is return code from xdp program.
> > Such an approach is per XDP program, but it is cheaper when not enabled
> > and faster when triggered, compared to a static tracepoint.
> > Anything missing there that you'd like to see?
> 
> For userspace, sure, the fentry/fexit stuff is fine. The main use case
> for this new tracepoint is to hook into the (in-kernel) drop monitor.
> Dunno if that can be convinced to hook into the BPF tracing
> infrastructure instead of tracepoints. Ido, WDYT?

Hi Toke,

Sorry for the delay. I wasn't available most of last week.

Regarding the tracepoint, the data it provides seems sufficient to me.
Regarding the fentry/fexit stuff, it would be great to hook it into drop
monitor, but I'm not sure how to do that at this point. It seems that, at
a minimum, the user would need to pass the XDP programs that need to be
traced?

FYI, I'm not too happy with the current way of capturing the events via
nlmon, so I started creating a utility to directly output the events to
pcap [1] (inspired by Florian's nfqdump). Will send a pull request to
Neil when it's ready. You can do:

# dwdump -w /dev/stdout | tshark -V -r -

A recent enough Wireshark will correctly dissect these events. My next
step is to add '--unique', which will load an eBPF program on the socket
and only allow unique events to be enqueued. The program will store
{5-tuple, IP/drop reason} in an LRU hash with a corresponding count. I
can then instrument the application for Prometheus so that it exports
the contents of the map as metrics.
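
To make the '--unique' idea a bit more concrete, here is a rough
userspace model of the dedup logic in plain C. It is only a sketch and
not the eBPF program itself: the names (uniq_key, uniq_table,
uniq_observe) and the tiny table size are made up for illustration, the
"IP/drop reason" part of the key is treated as an opaque 64-bit value,
and the exact-LRU eviction here only approximates what a kernel LRU hash
map would do.

```c
#include <stdint.h>
#include <string.h>

#define UNIQ_SLOTS 4 /* hypothetical, tiny on purpose */

/* Hypothetical key: 5-tuple plus the "IP/drop reason" value.
 * The layout has no implicit padding, so memcmp() is safe as long
 * as callers zero the explicit pad field. */
struct uniq_key {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
	uint8_t  proto;
	uint8_t  pad[3];
	uint64_t reason; /* opaque: drop location or reason */
};

struct uniq_entry {
	struct uniq_key key;
	uint64_t count;     /* number of times this key was seen */
	uint64_t last_used; /* logical timestamp, 0 means free */
};

struct uniq_table {
	struct uniq_entry slots[UNIQ_SLOTS];
	uint64_t clock;
};

/* Returns 1 if the key is new (the event would be enqueued),
 * 0 if it is a duplicate (only the count is bumped). */
static int uniq_observe(struct uniq_table *t, const struct uniq_key *k)
{
	struct uniq_entry *victim = &t->slots[0];
	int i;

	t->clock++;
	for (i = 0; i < UNIQ_SLOTS; i++) {
		struct uniq_entry *e = &t->slots[i];

		if (e->last_used && !memcmp(&e->key, k, sizeof(*k))) {
			e->count++;
			e->last_used = t->clock;
			return 0;
		}
		/* Track a free or least recently used slot for eviction. */
		if (e->last_used < victim->last_used)
			victim = e;
	}

	victim->key = *k;
	victim->count = 1;
	victim->last_used = t->clock;
	return 1;
}
```

In this model the per-key counts are what would end up exported as
metrics, while the return value decides whether an event is passed on.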

Please let me know if you have more suggestions.

[1] https://github.com/idosch/dropwatch/blob/dwdump/src/dwdump.c