On Fri, 2023-12-15 at 18:07 +0100, Sebastian Andrzej Siewior wrote:
> The access to seg6_bpf_srh_states is protected by disabling preemption.
> Based on the code, the entry point is input_action_end_bpf() and
> every other function (the bpf helper functions bpf_lwt_seg6_*()), that
> is accessing seg6_bpf_srh_states, should be called from within
> input_action_end_bpf().
>
> input_action_end_bpf() accesses seg6_bpf_srh_states first at the top of
> the function and then disables preemption. This looks wrong because if
> preemption needs to be disabled as part of the locking mechanism then
> the variable shouldn't be accessed beforehand.
>
> Looking at how it is used via test_lwt_seg6local.sh then
> input_action_end_bpf() is always invoked from softirq context. If this
> is always the case then the preempt_disable() statement is superfluous.
> If this is not always invoked from softirq then disabling only
> preemption is not sufficient.
>
> Replace the preempt_disable() statement with nested-BH locking. This is
> not an equivalent replacement as it assumes that the invocation of
> input_action_end_bpf() always occurs in softirq context and thus the
> preempt_disable() is superfluous.
> Add a local_lock_t to the data structure and use local_lock_nested_bh() in
> guard notation for locking. Add lockdep_assert_held() to ensure the lock
> is held while the per-CPU variable is referenced in the helper functions.
>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Cc: Andrii Nakryiko <andrii@xxxxxxxxxx>
> Cc: David Ahern <dsahern@xxxxxxxxxx>
> Cc: Hao Luo <haoluo@xxxxxxxxxx>
> Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
> Cc: John Fastabend <john.fastabend@xxxxxxxxx>
> Cc: KP Singh <kpsingh@xxxxxxxxxx>
> Cc: Martin KaFai Lau <martin.lau@xxxxxxxxx>
> Cc: Song Liu <song@xxxxxxxxxx>
> Cc: Stanislav Fomichev <sdf@xxxxxxxxxx>
> Cc: Yonghong Song <yonghong.song@xxxxxxxxx>
> Cc: bpf@xxxxxxxxxxxxxxx
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> ---
>  include/net/seg6_local.h |  1 +
>  net/core/filter.c        |  3 ++
>  net/ipv6/seg6_local.c    | 59 ++++++++++++++++++++++------------------
>  3 files changed, 36 insertions(+), 27 deletions(-)
>
> diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h
> index 3fab9dec2ec45..0f22771359f4c 100644
> --- a/include/net/seg6_local.h
> +++ b/include/net/seg6_local.h
> @@ -20,6 +20,7 @@ extern bool seg6_bpf_has_valid_srh(struct sk_buff *skb);
>  
>  struct seg6_bpf_srh_state {
>  	struct ipv6_sr_hdr *srh;
> +	local_lock_t bh_lock;
>  	u16 hdrlen;
>  	bool valid;
>  };
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 1737884be52f8..c8013f762524b 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6384,6 +6384,7 @@ BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
>  	void *srh_tlvs, *srh_end, *ptr;
>  	int srhoff = 0;
>  
> +	lockdep_assert_held(&srh_state->bh_lock);
>  	if (srh == NULL)
>  		return -EINVAL;
>  
> @@ -6440,6 +6441,7 @@ BPF_CALL_4(bpf_lwt_seg6_action, struct sk_buff *, skb,
>  	int hdroff = 0;
>  	int err;
>  
> +	lockdep_assert_held(&srh_state->bh_lock);
>  	switch (action) {
>  	case SEG6_LOCAL_ACTION_END_X:
>  		if (!seg6_bpf_has_valid_srh(skb))
> @@ -6516,6 +6518,7 @@ BPF_CALL_3(bpf_lwt_seg6_adjust_srh, struct sk_buff *, skb, u32, offset,
>  	int srhoff = 0;
>  	int ret;
>  
> +	lockdep_assert_held(&srh_state->bh_lock);
>  	if (unlikely(srh == NULL))
>  		return -EINVAL;
>  
> diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
> index 24e2b4b494cb0..ed7278af321a2 100644
> --- a/net/ipv6/seg6_local.c
> +++ b/net/ipv6/seg6_local.c
> @@ -1380,7 +1380,9 @@ static int input_action_end_b6_encap(struct sk_buff *skb,
>  	return err;
>  }
>  
> -DEFINE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states);
> +DEFINE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states) = {
> +	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
> +};
>  
>  bool seg6_bpf_has_valid_srh(struct sk_buff *skb)
>  {
> @@ -1388,6 +1390,7 @@ bool seg6_bpf_has_valid_srh(struct sk_buff *skb)
>  		this_cpu_ptr(&seg6_bpf_srh_states);
>  	struct ipv6_sr_hdr *srh = srh_state->srh;
>  
> +	lockdep_assert_held(&srh_state->bh_lock);
>  	if (unlikely(srh == NULL))
>  		return false;
>  
> @@ -1408,8 +1411,7 @@ bool seg6_bpf_has_valid_srh(struct sk_buff *skb)
>  static int input_action_end_bpf(struct sk_buff *skb,
>  				struct seg6_local_lwt *slwt)
>  {
> -	struct seg6_bpf_srh_state *srh_state =
> -		this_cpu_ptr(&seg6_bpf_srh_states);
> +	struct seg6_bpf_srh_state *srh_state;
>  	struct ipv6_sr_hdr *srh;
>  	int ret;
>  
> @@ -1420,41 +1422,44 @@ static int input_action_end_bpf(struct sk_buff *skb,
>  	}
>  	advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
>  
> -	/* preempt_disable is needed to protect the per-CPU buffer srh_state,
> -	 * which is also accessed by the bpf_lwt_seg6_* helpers
> +	/* The access to the per-CPU buffer srh_state is protected by running
> +	 * always in softirq context (with disabled BH). On PREEMPT_RT the
> +	 * required locking is provided by the following local_lock_nested_bh()
> +	 * statement. It is also accessed by the bpf_lwt_seg6_* helpers via
> +	 * bpf_prog_run_save_cb().
>  	 */
> -	preempt_disable();
> -	srh_state->srh = srh;
> -	srh_state->hdrlen = srh->hdrlen << 3;
> -	srh_state->valid = true;
> +	scoped_guard(local_lock_nested_bh, &seg6_bpf_srh_states.bh_lock) {
> +		srh_state = this_cpu_ptr(&seg6_bpf_srh_states);
> +		srh_state->srh = srh;
> +		srh_state->hdrlen = srh->hdrlen << 3;
> +		srh_state->valid = true;

Here the 'scoped_guard' usage adds a lot of noise to the patch, due to
the added indentation. What about using directly
local_lock_nested_bh()/local_unlock_nested_bh()?

Cheers,

Paolo
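For reference, a minimal sketch of the suggested alternative, using the plain lock/unlock pair instead of scoped_guard(). This is only an illustration of the shape of the change: the code between the assignments and the unlock (the BPF program invocation and the state teardown) is not visible in the quoted hunk and is elided here.

```c
	/* Directly take/release the nested-BH lock instead of using
	 * scoped_guard(), keeping the original indentation level.
	 */
	local_lock_nested_bh(&seg6_bpf_srh_states.bh_lock);
	srh_state = this_cpu_ptr(&seg6_bpf_srh_states);
	srh_state->srh = srh;
	srh_state->hdrlen = srh->hdrlen << 3;
	srh_state->valid = true;

	/* ... run the BPF program and tear down srh_state ... */

	local_unlock_nested_bh(&seg6_bpf_srh_states.bh_lock);
```

The trade-off is the usual one: the explicit pair avoids reindenting the whole block (smaller diff), while scoped_guard() guarantees the unlock on every exit path.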