Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> writes:

> The XDP redirect process is two-staged:
> - bpf_prog_run_xdp() is invoked to run an eBPF program which inspects
>   the packet and makes decisions. While doing that, the per-CPU
>   variable bpf_redirect_info is used.
>
> - Afterwards xdp_do_redirect() is invoked and accesses
>   bpf_redirect_info and it may also access other per-CPU variables
>   like xskmap_flush_list.
>
> At the very end of the NAPI callback, xdp_do_flush() is invoked which
> does not access bpf_redirect_info but will touch the individual
> per-CPU lists.
>
> The per-CPU variables are only used in the NAPI callback, hence
> disabling bottom halves is the only protection mechanism. Users from
> preemptible context (like cpu_map_kthread_run()) explicitly disable
> bottom halves for protection reasons.
> Without the locking in local_bh_disable() on PREEMPT_RT, this data
> structure requires explicit locking.
>
> PREEMPT_RT has forced-threaded interrupts enabled and every NAPI
> callback runs in a thread. If each thread has its own data structure,
> then locking can be avoided.
>
> Create a struct bpf_net_context which contains struct
> bpf_redirect_info. Define the variable on the stack and use
> bpf_net_ctx_set() to save a pointer to it. Use the __free() annotation
> to automatically reset the pointer once the function returns.
> bpf_net_ctx_set() may nest. For instance, a function can be used from
> within NET_RX_SOFTIRQ/net_rx_action, which uses bpf_net_ctx_set(), and
> from NET_TX_SOFTIRQ, which does not. Therefore only the first
> invocation updates the pointer.
> Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
> bpf_redirect_info.
>
> On PREEMPT_RT the pointer to bpf_net_context is saved in the task's
> task_struct. On non-PREEMPT_RT builds the pointer is saved in a
> per-CPU variable (which is always node-local memory). Always using the
> bpf_net_context approach has the advantage that there are almost zero
> differences between PREEMPT_RT and non-PREEMPT_RT builds.

Did you ever manage to get any performance data to see if this has an
impact?

[...]

> +static inline struct bpf_net_context *bpf_net_ctx_get(void)
> +{
> +	struct bpf_net_context *bpf_net_ctx = this_cpu_read(bpf_net_context);
> +
> +	WARN_ON_ONCE(!bpf_net_ctx);

If we have this WARN...

> +static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void)
> +{
> +	struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get();
> +
> +	if (!bpf_net_ctx)
> +		return NULL;

...do we really need all the NULL checks (not just here, but in the
code below as well)? I'm a little concerned that we are introducing a
bunch of new branches in the XDP hot path, which is also why I'm asking
for benchmarks :)

[...]

> +	/* ri->map is assigned in __bpf_xdp_redirect_map() from within a eBPF
> +	 * program/ during NAPI callback. It is used during
> +	 * xdp_do_generic_redirect_map()/ __xdp_do_redirect_frame() from the
> +	 * redirect callback afterwards. ri->map is cleared after usage.
> +	 * The path has no explicit RCU read section but the local_bh_disable()
> +	 * is also a RCU read section which makes the complete softirq callback
> +	 * RCU protected. This in turn makes ri->map RCU protocted and it is

s/protocted/protected/

> +	 * sufficient to wait a grace period to ensure that no "ri->map == map"
> +	 * exist. dev_map_free() removes the map from the list and then

s/exist/exists/

-Toke
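
P.S. Just to check my understanding of the lifecycle: below is a
simplified sketch of how I read the per-CPU (non-PREEMPT_RT) variant.
The names follow your patch, but the bodies are paraphrased from the
description above and napi_poll_example() is a made-up caller, so
treat this as pseudo-code rather than the real thing:

	struct bpf_net_context {
		struct bpf_redirect_info ri;
	};

	static DEFINE_PER_CPU(struct bpf_net_context *, bpf_net_context);

	static inline struct bpf_net_context *
	bpf_net_ctx_set(struct bpf_net_context *bpf_net_ctx)
	{
		/* Nesting: an outer caller (e.g. net_rx_action) may
		 * already have installed a context; keep it and return
		 * NULL so that only the outermost caller clears the
		 * pointer on exit.
		 */
		if (this_cpu_read(bpf_net_context))
			return NULL;

		this_cpu_write(bpf_net_context, bpf_net_ctx);
		return bpf_net_ctx;
	}

	static inline void bpf_net_ctx_clear(struct bpf_net_context *bpf_net_ctx)
	{
		if (bpf_net_ctx)
			this_cpu_write(bpf_net_context, NULL);
	}

	/* <linux/cleanup.h>: reset the pointer via __free() when the
	 * variable goes out of scope.
	 */
	DEFINE_FREE(bpf_net_ctx_clear, struct bpf_net_context *,
		    bpf_net_ctx_clear(_T));

	static int napi_poll_example(void)
	{
		/* On-stack context, reset automatically on return. */
		struct bpf_net_context __bpf_net_ctx;
		struct bpf_net_context *bpf_net_ctx __free(bpf_net_ctx_clear) =
			bpf_net_ctx_set(&__bpf_net_ctx);

		/* ... bpf_prog_run_xdp(), xdp_do_redirect() and
		 * xdp_do_flush() all reach the context through
		 * bpf_net_ctx_get_ri() ...
		 */
		return 0;
	}

If that reading is accurate, then inside a NAPI callback the context
is always installed before any of the getters run, which is what makes
me wonder whether the NULL checks can be dropped and the WARN turned
into a debug-only assertion.

-Toke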