On Mon, May 6, 2024 at 12:41 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> writes:
>
> > The XDP redirect process is two staged:
> > - bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the
> >   packet and makes decisions. While doing that, the per-CPU variable
> >   bpf_redirect_info is used.
> >
> > - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
> >   and it may also access other per-CPU variables like xskmap_flush_list.
> >
> > At the very end of the NAPI callback, xdp_do_flush() is invoked which
> > does not access bpf_redirect_info but will touch the individual per-CPU
> > lists.
> >
> > The per-CPU variables are only used in the NAPI callback, hence disabling
> > bottom halves is the only protection mechanism. Users from preemptible
> > context (like cpu_map_kthread_run()) explicitly disable bottom halves
> > for protection reasons.
> > Without the locking in local_bh_disable() on PREEMPT_RT, this data
> > structure requires explicit locking.
> >
> > PREEMPT_RT has forced-threaded interrupts enabled and every NAPI
> > callback runs in a thread. If each thread has its own data structure
> > then locking can be avoided.
> >
> > Create a struct bpf_net_context which contains struct bpf_redirect_info.
> > Define the variable on the stack and use bpf_net_ctx_set() to save a
> > pointer to it. Use the __free() annotation to automatically reset the
> > pointer once the function returns.
> > bpf_net_ctx_set() may nest. For instance, a function can be used from
> > within NET_RX_SOFTIRQ/net_rx_action, which uses bpf_net_ctx_set(), and
> > from NET_TX_SOFTIRQ, which does not. Therefore only the first invocation
> > updates the pointer.
> > Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
> > bpf_redirect_info.
> >
> > On PREEMPT_RT the pointer to bpf_net_context is saved in the task's
> > task_struct. On non-PREEMPT_RT builds the pointer is saved in a per-CPU
> > variable (which is always NODE-local memory). Always using the
> > bpf_net_context approach has the advantage that there are almost zero
> > differences between PREEMPT_RT and non-PREEMPT_RT builds.
>
> Did you ever manage to get any performance data to see if this has an
> impact?
>
> [...]
>
> > +static inline struct bpf_net_context *bpf_net_ctx_get(void)
> > +{
> > +	struct bpf_net_context *bpf_net_ctx = this_cpu_read(bpf_net_context);
> > +
> > +	WARN_ON_ONCE(!bpf_net_ctx);
>
> If we have this WARN...
>
> > +static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void)
> > +{
> > +	struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get();
> > +
> > +	if (!bpf_net_ctx)
> > +		return NULL;
>
> ... do we really need all the NULL checks?

Indeed. Let's drop all the NULL checks, since they definitely add overhead.
I'd also remove the #ifdef CONFIG_PREEMPT_RT and converge on a single
implementation:

static inline struct bpf_net_context *bpf_net_ctx_get(void)
{
	return current->bpf_net_context;
}
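
With the NULL checks and the #ifdef gone, the remaining helpers become
trivial. A minimal sketch of what I have in mind (untested; the struct
member name "ri" and the bpf_net_ctx_clear() counterpart are my guesses,
and the nesting rule from the commit message is kept by having
bpf_net_ctx_set() return NULL when a context is already installed):

struct bpf_net_context {
	struct bpf_redirect_info ri;
};

static inline struct bpf_net_context *
bpf_net_ctx_set(struct bpf_net_context *bpf_net_ctx)
{
	struct task_struct *tsk = current;

	/* Nesting: only the outermost caller installs the context. */
	if (tsk->bpf_net_context)
		return NULL;
	tsk->bpf_net_context = bpf_net_ctx;
	return bpf_net_ctx;
}

static inline void bpf_net_ctx_clear(struct bpf_net_context *bpf_net_ctx)
{
	/* Only the frame that actually installed the context clears it;
	 * nested callers got NULL from bpf_net_ctx_set() and this is a no-op.
	 */
	if (bpf_net_ctx)
		current->bpf_net_context = NULL;
}

static inline struct bpf_net_context *bpf_net_ctx_get(void)
{
	return current->bpf_net_context;
}

static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void)
{
	return &bpf_net_ctx_get()->ri;
}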
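
A caller in the NAPI path would then look something like the below (again
a sketch with a made-up poll function; it relies on DEFINE_FREE()/__free()
from linux/cleanup.h, which is what the commit message's "__free()
annotation" refers to):

DEFINE_FREE(bpf_net_ctx_clear, struct bpf_net_context *, bpf_net_ctx_clear(_T));

static int example_napi_poll(struct napi_struct *napi, int budget)
{
	struct bpf_net_context __bpf_net_ctx;
	struct bpf_net_context *bpf_net_ctx __free(bpf_net_ctx_clear) =
		bpf_net_ctx_set(&__bpf_net_ctx);
	int work = 0;

	/* ... run the XDP program, xdp_do_redirect(), xdp_do_flush() ... */

	/* bpf_net_ctx_clear() runs automatically when this function returns;
	 * bpf_net_ctx is NULL here if an outer frame already installed a
	 * context, so the cleanup does nothing in the nested case. */
	return work;
}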