On 2019/06/07 4:41, Jesper Dangaard Brouer wrote:
> On Thu, 6 Jun 2019 20:04:20 +0900
> Toshiaki Makita <toshiaki.makita1@xxxxxxxxx> wrote:
>
>> On 2019/06/05 16:59, Jesper Dangaard Brouer wrote:
>>> On Wed, 5 Jun 2019 14:36:12 +0900
>>> Toshiaki Makita <toshiaki.makita1@xxxxxxxxx> wrote:
>>>
>>>> This is introduced for admins to check what is happening on XDP_TX
>>>> when bulk XDP_TX is in use, which will be first introduced in veth
>>>> in the next commit.
>>>
>>> Is the plan that this tracepoint 'xdp:xdp_bulk_tx' should be used by
>>> all drivers?
>>
>> I guess you mean all drivers that implement a similar mechanism should
>> use this? Then yes. (I don't think all drivers need a bulk tx
>> mechanism, though.)
>>
>>> (more below)
>>>
>>>> Signed-off-by: Toshiaki Makita <toshiaki.makita1@xxxxxxxxx>
>>>> ---
>>>>  include/trace/events/xdp.h | 25 +++++++++++++++++++++++++
>>>>  kernel/bpf/core.c          |  1 +
>>>>  2 files changed, 26 insertions(+)
>>>>
>>>> diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
>>>> index e95cb86..e06ea65 100644
>>>> --- a/include/trace/events/xdp.h
>>>> +++ b/include/trace/events/xdp.h
>>>> @@ -50,6 +50,31 @@
>>>>  		  __entry->ifindex)
>>>>  );
>>>>
>>>> +TRACE_EVENT(xdp_bulk_tx,
>>>> +
>>>> +	TP_PROTO(const struct net_device *dev,
>>>> +		 int sent, int drops, int err),
>>>> +
>>>> +	TP_ARGS(dev, sent, drops, err),
>>>> +
>>>> +	TP_STRUCT__entry(
>>>
>>> All other tracepoints in this file start with:
>>>
>>> 	__field(int, prog_id)
>>> 	__field(u32, act)
>>>
>>> or
>>>
>>> 	__field(int, map_id)
>>> 	__field(u32, act)
>>>
>>> Could you please add those?
>>
>> So... prog_id is the problem. The program can be changed while we are
>> enqueueing packets to the bulk queue, so the prog_id at flush may be
>> an unexpected one.
>
> Hmmm... that sounds problematic, if the XDP bpf_prog for veth can
> change underneath, before the flush. Our redirect system depends on
> things being stable until the xdp_do_flush_map() operation, as it will
> e.g. set the per-CPU (bpf_redirect_info) map_to_flush pointer (which
> depends on the XDP prog), and expects it to be correct/valid.
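(As an aside on the fields themselves: below is roughly what adding
prog_id/act could look like in the TRACE_EVENT. This is only a sketch,
not the posted patch, and the prog_id assignment assumes the bpf_prog
pointer visible at flush time is the one to report, which is exactly
the part in question here.)

TRACE_EVENT(xdp_bulk_tx,

	TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp,
		 int sent, int drops, int err),

	TP_ARGS(dev, xdp, sent, drops, err),

	TP_STRUCT__entry(
		__field(int, prog_id)
		__field(u32, act)
		__field(int, ifindex)
		__field(int, drops)
		__field(int, sent)
		__field(int, err)
	),

	TP_fast_assign(
		/* Assumes the prog pointer passed down at flush time is
		 * still the one that enqueued the frames. */
		__entry->prog_id	= xdp->aux->id;
		__entry->act		= XDP_TX;
		__entry->ifindex	= dev->ifindex;
		__entry->drops		= drops;
		__entry->sent		= sent;
		__entry->err		= err;
	),

	TP_printk("prog_id=%d action=%s ifindex=%d sent=%d drops=%d err=%d",
		  __entry->prog_id,
		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
		  __entry->ifindex, __entry->sent, __entry->drops,
		  __entry->err)
);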
Sorry, I don't get how maps depend on programs.

At least xdp_do_redirect_map() handles a map_to_flush change during
NAPI. Is there a problem when the map is not changed but the program is
changed?

Also I believe this is not veth-specific behavior. Looking at tun and
i40e, they seem to change xdp_prog without stopping the data path.
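(Paraphrasing from memory, not verbatim kernel code, the guard I mean
is that xdp_do_redirect_map() flushes pending entries of the previous
map before recording a new one, roughly:

	if (ri->map_to_flush && ri->map_to_flush != map)
		xdp_do_flush_map();
	...
	ri->map_to_flush = map;

so a map change in the middle of a NAPI poll does not leave stale
entries behind.)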
>> It can be fixed by disabling NAPI when changing XDP programs. This
>> stops packet processing while changing XDP programs, but I guess it
>> is an acceptable compromise. Having said that, I'm honestly not so
>> eager to make this change, since it would require refurbishment of
>> one of the most delicate parts of veth XDP, the NAPI
>> disabling/enabling mechanism. WDYT?
>
> Sounds like a bug, if the XDP bpf_prog is not stable within the NAPI
> poll...

>>>> +		__field(int, ifindex)
>>>> +		__field(int, drops)
>>>> +		__field(int, sent)
>>>> +		__field(int, err)
>>>> +	),
>>>
>>> The reason is that this makes it easier to attach to multiple
>>> tracepoints and extract the same value. Example with a bpftrace
>>> one-liner:
>>>
>>> $ sudo bpftrace -e 'tracepoint:xdp:xdp_* { @action[args->act] = count(); }'
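FWIW, even with just the fields in this patch, a similar one-liner can
aggregate results per device (untested sketch, using the xdp_bulk_tx
name from this patch):

$ sudo bpftrace -e 'tracepoint:xdp:xdp_bulk_tx { @sent[args->ifindex] = sum(args->sent); @drops[args->ifindex] = sum(args->drops); }'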