On Mon, Feb 24, 2025 at 01:00:01PM +0100, Ricardo Cañuelo Navarro wrote: > From: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> > > [ Upstream commit 401cb7dae8130fd34eb84648e02ab4c506df7d5e ] > > The XDP redirect process is two staged: > - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the > packet and makes decisions. While doing that, the per-CPU variable > bpf_redirect_info is used. > > - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info > and it may also access other per-CPU variables like xskmap_flush_list. > > At the very end of the NAPI callback, xdp_do_flush() is invoked which > does not access bpf_redirect_info but will touch the individual per-CPU > lists. > > The per-CPU variables are only used in the NAPI callback hence disabling > bottom halves is the only protection mechanism. Users from preemptible > context (like cpu_map_kthread_run()) explicitly disable bottom halves > for protections reasons. > Without locking in local_bh_disable() on PREEMPT_RT this data structure > requires explicit locking. > > PREEMPT_RT has forced-threaded interrupts enabled and every > NAPI-callback runs in a thread. If each thread has its own data > structure then locking can be avoided. > > Create a struct bpf_net_context which contains struct bpf_redirect_info. > Define the variable on stack, use bpf_net_ctx_set() to save a pointer to > it, bpf_net_ctx_clear() removes it again. > The bpf_net_ctx_set() may nest. For instance a function can be used from > within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and > NET_TX_SOFTIRQ which does not. Therefore only the first invocations > updates the pointer. > Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct > bpf_redirect_info. The returned data structure is zero initialized to > ensure nothing is leaked from stack. This is done on first usage of the > struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to > note that initialisation is required. First invocation of > bpf_net_ctx_get_ri() will memset() the data structure and update > bpf_redirect_info::kern_flags. > bpf_redirect_info::nh is excluded from memset because it is only used > once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is > moved past nh to exclude it from memset. > > The pointer to bpf_net_context is saved task's task_struct. Using > always the bpf_net_context approach has the advantage that there is > almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds. > > Cc: Andrii Nakryiko <andrii@xxxxxxxxxx> > Cc: Eduard Zingerman <eddyz87@xxxxxxxxx> > Cc: Hao Luo <haoluo@xxxxxxxxxx> > Cc: Jiri Olsa <jolsa@xxxxxxxxxx> > Cc: John Fastabend <john.fastabend@xxxxxxxxx> > Cc: KP Singh <kpsingh@xxxxxxxxxx> > Cc: Martin KaFai Lau <martin.lau@xxxxxxxxx> > Cc: Song Liu <song@xxxxxxxxxx> > Cc: Stanislav Fomichev <sdf@xxxxxxxxxx> > Cc: Yonghong Song <yonghong.song@xxxxxxxxx> > Acked-by: Alexei Starovoitov <ast@xxxxxxxxxx> > Acked-by: Jesper Dangaard Brouer <hawk@xxxxxxxxxx> > Reviewed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> > Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@xxxxxxxxxxxxx > Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx> > --- > include/linux/filter.h | 56 +++++++++++++++++++++++++++++++++++++++++--------- > include/linux/sched.h | 3 +++ > kernel/bpf/cpumap.c | 3 +++ > kernel/bpf/devmap.c | 9 +++++++- > kernel/fork.c | 1 + > net/bpf/test_run.c | 11 +++++++++- > net/core/dev.c | 28 ++++++++++++++++++++++++- > net/core/filter.c | 41 +++++++++++------------------------- > net/core/lwt_bpf.c | 3 +++ > 9 files changed, 113 insertions(+), 42 deletions(-) You did major changes to this and you didn't sign off and explain what you did differently from the original commit? You know we can't take that... thanks, greg k-h