Daniel Borkmann <daniel@xxxxxxxxxxxxx> writes:

> Hi Paul,
>
> On 6/10/21 8:38 PM, Alexei Starovoitov wrote:
>> On Wed, Jun 9, 2021 at 7:24 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>>>
>>> XDP programs are called from a NAPI poll context, which means the RCU
>>> reference liveness is ensured by local_bh_disable(). Add
>>> rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so
>>> lockdep understands that the dereferences are safe from inside *either* an
>>> rcu_read_lock() section *or* a local_bh_disable() section. This is done in
>>> preparation for removing the redundant rcu_read_lock()s from the drivers.
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
>>> ---
>>>  kernel/bpf/hashtab.c  | 21 ++++++++++++++-------
>>>  kernel/bpf/helpers.c  |  6 +++---
>>>  kernel/bpf/lpm_trie.c |  6 ++++--
>>>  3 files changed, 21 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>> index 6f6681b07364..72c58cc516a3 100644
>>> --- a/kernel/bpf/hashtab.c
>>> +++ b/kernel/bpf/hashtab.c
>>> @@ -596,7 +596,8 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
>>>          struct htab_elem *l;
>>>          u32 hash, key_size;
>>>
>>> -        WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
>>> +        WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
>>> +                     !rcu_read_lock_bh_held());
>>
>> It's not clear to me whether rcu_read_lock_held() is still needed.
>> All comments sound like rcu_read_lock_bh_held() is a superset of rcu
>> that includes bh.
>> But reading rcu source code it looks like RCU_BH is its own rcu flavor...
>> which is confusing.
>
> The series is a bit confusing to me as well. I recall we had a discussion with
> Paul, but it was back in 2016 aka very early days of XDP to get some clarifications
> about RCU vs RCU-bh flavour on this.
> Paul, given the series in here, I assume the
> below is not true anymore, and in this case (since we're removing rcu_read_lock()
> from drivers), the RCU-bh acts as a real superset?
>
> Back then from your clarifications this was not the case:
>
> On Mon, Jul 25, 2016 at 11:26:02AM -0700, Alexei Starovoitov wrote:
> > On Mon, Jul 25, 2016 at 11:03 AM, Paul E. McKenney
> > <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> [...]
> >>> The crux of the question is whether a particular driver rx handler, when
> >>> called from __do_softirq, needs to add an additional rcu_read_lock or
> >>> whether it can rely on the mechanics of softirq.
> >>
> >> If it was rcu_read_lock_bh(), you could.
> >>
> >> But you didn't say rcu_read_lock_bh(), you instead said rcu_read_lock(),
> >> which means that you absolutely cannot rely on softirq semantics.
> >>
> >> In particular, in CONFIG_PREEMPT=y kernels, rcu_preempt_check_callbacks()
> >> will notice that there is no rcu_read_lock() in effect and report
> >> a quiescent state for that CPU. Because rcu_preempt_check_callbacks()
> >> is invoked from the scheduling-clock interrupt, it absolutely can
> >> execute during do_softirq(), and therefore being in softirq context
> >> in no way provides rcu_read_lock()-style protection.
> >>
> >> Now, Alexei's question was for CONFIG_PREEMPT=n kernels. However, in
> >> that case, rcu_read_lock() and rcu_read_unlock() generate no code
> >> in recent production kernels, so there is no performance penalty for
> >> using them. (In older kernels, they implied a barrier().)
> >>
> >> So either way, with or without CONFIG_PREEMPT, you should use
> >> rcu_read_lock() to get RCU protection.
> >>
> >> One alternative might be to switch to rcu_read_lock_bh(), but that
> >> will add local_disable_bh() overhead to your read paths.
> >>
> >> Does that help, or am I missing the point of the question?
> >
> > thanks a lot for explanation.
>
> Glad you liked it!
>
> > I mistakenly assumed that _bh variants are 'stronger' and
> > act as inclusive, but sounds like they're completely orthogonal
> > especially with preempt_rcu=y.
>
> Yes, they are pretty much orthogonal.
>
> > With preempt_rcu=n and preempt=y, it would be the case, since
> > bh disables preemption and rcu_read_lock does the same as well,
> > right? Of course, the code shouldn't be relying on that, so we
> > have to fix our stuff.
>
> Indeed, especially given that the kernel currently won't allow you
> to configure CONFIG_PREEMPT_RCU=n and CONFIG_PREEMPT=y. If it does,
> please let me know, as that would be a bug that needs to be fixed.
> (For one thing, I do not test that combination.)
>
>                                                 Thanx, Paul
>
> And now, fast-forward again to 2021 ... :)

We covered this in the thread I linked from the cover letter. Specifically,
this seems to have been a change from v4.20, see Paul's reply here:

https://lore.kernel.org/bpf/20210417002301.GO4212@paulmck-ThinkPad-P17-Gen-1/

and the follow-up covering -rt here:

https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/

-Toke