On Wed, May 29, 2024 at 9:29 AM Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote: > > The XDP redirect process is two staged: > - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the > packet and makes decisions. While doing that, the per-CPU variable > bpf_redirect_info is used. > > - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info > and it may also access other per-CPU variables like xskmap_flush_list. > > At the very end of the NAPI callback, xdp_do_flush() is invoked which > does not access bpf_redirect_info but will touch the individual per-CPU > lists. > > The per-CPU variables are only used in the NAPI callback hence disabling > bottom halves is the only protection mechanism. Users from preemptible > context (like cpu_map_kthread_run()) explicitly disable bottom halves > for protections reasons. > Without locking in local_bh_disable() on PREEMPT_RT this data structure > requires explicit locking. > > PREEMPT_RT has forced-threaded interrupts enabled and every > NAPI-callback runs in a thread. If each thread has its own data > structure then locking can be avoided. > > Create a struct bpf_net_context which contains struct bpf_redirect_info. > Define the variable on stack, use bpf_net_ctx_set() to save a pointer to > it. Use the __free() annotation to automatically reset the pointer once > function returns. > The bpf_net_ctx_set() may nest. For instance a function can be used from > within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and > NET_TX_SOFTIRQ which does not. Therefore only the first invocations > updates the pointer. > Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct > bpf_redirect_info. > > On PREEMPT_RT the pointer to bpf_net_context is saved task's > task_struct. On non-PREEMPT_RT builds the pointer saved in a per-CPU > variable (which is always NODE-local memory). Using always the > bpf_net_context approach has the advantage that there is almost zero > differences between PREEMPT_RT and non-PREEMPT_RT builds. > > Cc: Alexei Starovoitov <ast@xxxxxxxxxx> > Cc: Andrii Nakryiko <andrii@xxxxxxxxxx> > Cc: Eduard Zingerman <eddyz87@xxxxxxxxx> > Cc: Hao Luo <haoluo@xxxxxxxxxx> > Cc: Jesper Dangaard Brouer <hawk@xxxxxxxxxx> > Cc: Jiri Olsa <jolsa@xxxxxxxxxx> > Cc: John Fastabend <john.fastabend@xxxxxxxxx> > Cc: KP Singh <kpsingh@xxxxxxxxxx> > Cc: Martin KaFai Lau <martin.lau@xxxxxxxxx> > Cc: Song Liu <song@xxxxxxxxxx> > Cc: Stanislav Fomichev <sdf@xxxxxxxxxx> > Cc: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> > Cc: Yonghong Song <yonghong.song@xxxxxxxxx> > Cc: bpf@xxxxxxxxxxxxxxx > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> lgtm Acked-by: Alexei Starovoitov <ast@xxxxxxxxxx>