Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> writes:

> The XDP redirect process is two staged:
> - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the
>   packet and makes decisions. While doing that, the per-CPU variable
>   bpf_redirect_info is used.
>
> - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
>   and it may also access other per-CPU variables like xskmap_flush_list.
>
> At the very end of the NAPI callback, xdp_do_flush() is invoked which
> does not access bpf_redirect_info but will touch the individual per-CPU
> lists.
>
> The per-CPU variables are only used in the NAPI callback hence disabling
> bottom halves is the only protection mechanism. Users from preemptible
> context (like cpu_map_kthread_run()) explicitly disable bottom halves
> for protections reasons.
> Without locking in local_bh_disable() on PREEMPT_RT this data structure
> requires explicit locking to avoid corruption if preemption occurs.
>
> PREEMPT_RT has forced-threaded interrupts enabled and every
> NAPI-callback runs in a thread. If each thread has its own data
> structure then locking can be avoided and data corruption is also avoided.
>
> Create a struct bpf_xdp_storage which contains struct bpf_redirect_info.
> Define the variable on stack, use xdp_storage_set() to set a pointer to
> it in task_struct of the current task. Use the __free() annotation to
> automatically reset the pointer once function returns. Use a pointer which can
> be used by the __free() annotation to avoid invoking the callback the pointer
> is NULL. This helps the compiler to optimize the code.
> The xdp_storage_set() can nest. For instance local_bh_enable() in
> bpf_test_run_xdp_live() may run NET_RX_SOFTIRQ/ net_rx_action() which
> also uses xdp_storage_set(). Therefore only the first invocations
> updates the per-task pointer.
> Use xdp_storage_get_ri() as a wrapper to retrieve the current struct
> bpf_redirect_info.
>
> This is only done on PREEMPT_RT.
> The !PREEMPT_RT builds keep using the
> per-CPU variable instead. This should also work for !PREEMPT_RT but
> isn't needed.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>

[...]

> diff --git a/net/core/dev.c b/net/core/dev.c
> index de362d5f26559..c3f7d2a6b6134 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3988,11 +3988,15 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
>  		   struct net_device *orig_dev, bool *another)
>  {
>  	struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
> +	struct bpf_xdp_storage *xdp_store __free(xdp_storage_clear) = NULL;
>  	enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_INGRESS;
> +	struct bpf_xdp_storage __xdp_store;
>  	int sch_ret;
>
>  	if (!entry)
>  		return skb;
> +
> +	xdp_store = xdp_storage_set(&__xdp_store);
>  	if (*pt_prev) {
>  		*ret = deliver_skb(skb, *pt_prev, orig_dev);
>  		*pt_prev = NULL;
> @@ -4044,12 +4048,16 @@ static __always_inline struct sk_buff *
>  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>  {
>  	struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress);
> +	struct bpf_xdp_storage *xdp_store __free(xdp_storage_clear) = NULL;
>  	enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_EGRESS;
> +	struct bpf_xdp_storage __xdp_store;
>  	int sch_ret;
>
>  	if (!entry)
>  		return skb;
>
> +	xdp_store = xdp_storage_set(&__xdp_store);
> +
>  	/* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was
>  	 * already set by the caller.
>  	 */

These, and the LWT code, don't actually have anything to do with XDP,
which indicates that the 'xdp_storage' name is misleading. Maybe
'bpf_net_context' or something along those lines? Or maybe we could just
move the flush lists into bpf_redirect_info itself and just keep that as
the top-level name?

-Toke
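
For reference, the nesting behaviour described in the commit message (only
the first xdp_storage_set() installs the per-task pointer, and the __free()
cleanup is a no-op for nested callers) can be sketched in user space roughly
as below. This is a minimal sketch, not the kernel code: a thread-local
pointer stands in for the task_struct field, the struct layouts and helper
bodies are assumptions, the outer()/inner() callers are hypothetical, and the
compiler's cleanup attribute (GCC/Clang) stands in for the kernel's __free()
machinery. Only the names xdp_storage_set(), xdp_storage_clear(),
xdp_storage_get_ri(), bpf_xdp_storage and bpf_redirect_info come from the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layouts for illustration; the real structs have more fields. */
struct bpf_redirect_info {
	int tgt_index;
};

struct bpf_xdp_storage {
	struct bpf_redirect_info ri;
};

/* Stand-in for the pointer the patch keeps in task_struct. */
static _Thread_local struct bpf_xdp_storage *current_xdp_store;

/* Only the outermost caller installs its storage; nested callers get NULL. */
static struct bpf_xdp_storage *xdp_storage_set(struct bpf_xdp_storage *store)
{
	if (current_xdp_store)
		return NULL;
	current_xdp_store = store;
	return store;
}

/* Cleanup handler: called with the address of the annotated variable. */
static void xdp_storage_clear(struct bpf_xdp_storage **storep)
{
	if (*storep)
		current_xdp_store = NULL;
}

static struct bpf_redirect_info *xdp_storage_get_ri(void)
{
	return &current_xdp_store->ri;
}

/* Hypothetical macro standing in for __free(xdp_storage_clear). */
#define FREE_XDP_STORAGE __attribute__((cleanup(xdp_storage_clear)))

/* Nested user: its own stack storage is never installed. */
static int inner(void)
{
	struct bpf_xdp_storage *store FREE_XDP_STORAGE = NULL;
	struct bpf_xdp_storage stack_store = { { 0 } };

	store = xdp_storage_set(&stack_store);	/* NULL here: already set */
	return xdp_storage_get_ri()->tgt_index;	/* reads the outer storage */
}

/* Outermost user: installs the pointer, which is reset on return. */
static int outer(void)
{
	struct bpf_xdp_storage *store FREE_XDP_STORAGE = NULL;
	struct bpf_xdp_storage stack_store = { { 0 } };

	store = xdp_storage_set(&stack_store);
	assert(store == &stack_store);
	xdp_storage_get_ri()->tgt_index = 42;
	return inner();
}
```

Calling outer() here yields the value written by the outermost caller, and
the thread-local pointer is NULL again once outer() has returned. Nothing in
the sketch is specific to XDP, which is the point of the naming question
above.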