Hi,

On 6/22/2023 11:23 AM, Alexei Starovoitov wrote:
> On Wed, Jun 21, 2023 at 08:26:30PM +0800, Hou Tao wrote:
>>> SNIP
>>> 		rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
>>> +		rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
>> I got an oops in rcu_tasks_invoke_cbs() during a stress test, and it
>> seems we should do atomic_read(&call_rcu_in_progress) first, then do
>> atomic_read(&call_rcu_ttrace_in_progress) to fix the problem. And to
> Yes, it's a race. As you found out yourself, changing the order won't fix it.
>
>> The introduction of c->tgt makes the destroy procedure more complicated.
>> Even with the proposed fix above, there is still an oops in
>> rcu_tasks_invoke_cbs(), and I think it happens as follows:
> Right. That's another issue.
>
> Please send the bench htab test and your special stress test,
> so we can have common ground to reason about.
> Also please share your bench htab numbers before/after.

I will send the htab-mem benchmark next week and update the benchmark
results accordingly. There is nothing special about my local stress
test: I just hacked htab to use bpf_mem_cache_free_rcu() and ran
multiple instances of test_maps, map_perf_test and the htab_mem
benchmark simultaneously.

>
> I'm thinking of fixing the races in the following way.
> Could you please test it with your stress test?
> The idea is to set 'draining' first everywhere, so that all RCU
> callbacks become a nop.
> Then drain all the link lists. At this point nothing races with them.
> And then wait for a single rcu_barrier_tasks_trace(), which will make
> sure all callbacks are done. At that point the only thing they will do is
> if (c->draining) goto out;
> The barriers are needed so that access to 'c' is not a UAF.

As I replied on v2, I don't think the lockless checking of c->draining
will work, but I will test it anyway.

>
> ...
>
> From e20782160166d4327c76b57af160c4973396e0d0 Mon Sep 17 00:00:00 2001
> From: Alexei Starovoitov <ast@xxxxxxxxxx>
> Date: Wed, 21 Jun 2023 20:11:33 -0700
> Subject: [PATCH bpf-next] bpf: Fix races.
>
> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> ---
>  kernel/bpf/memalloc.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 4d1002e7b4b5..75c553b15deb 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -99,6 +99,7 @@ struct bpf_mem_cache {
>  	int low_watermark, high_watermark, batch;
>  	int percpu_size;
>  	struct bpf_mem_cache *tgt;
> +	bool draining;
>
>  	/* list of objects to be freed after RCU GP */
>  	struct llist_head free_by_rcu;
> @@ -264,7 +265,10 @@ static void __free_rcu(struct rcu_head *head)
>  {
>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu_ttrace);
>
> +	if (unlikely(c->draining))
> +		goto out;
>  	free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
> +out:
>  	atomic_set(&c->call_rcu_ttrace_in_progress, 0);
>  }
>
> @@ -353,8 +357,11 @@ static void __free_by_rcu(struct rcu_head *head)
>  {
>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>  	struct bpf_mem_cache *tgt = c->tgt;
> -	struct llist_node *llnode = llist_del_all(&c->waiting_for_gp);
> +	struct llist_node *llnode;
>
> +	if (unlikely(c->draining))
> +		goto out;
> +	llnode = llist_del_all(&c->waiting_for_gp);
>  	if (!llnode)
>  		goto out;
>
> @@ -568,10 +575,9 @@ static void free_mem_alloc(struct bpf_mem_alloc *ma)
>  	 * rcu_trace_implies_rcu_gp(), it will be OK to skip rcu_barrier() by
>  	 * using rcu_trace_implies_rcu_gp() as well.
>  	 */
> -	rcu_barrier(); /* wait for __free_by_rcu() */
> -	rcu_barrier_tasks_trace(); /* wait for __free_rcu() via call_rcu_tasks_trace */
> +	rcu_barrier_tasks_trace();
>  	if (!rcu_trace_implies_rcu_gp())
> -		rcu_barrier(); /* wait for __free_rcu() via call_rcu */
> +		rcu_barrier();
>  	free_mem_alloc_no_barrier(ma);
>  }
>
> @@ -616,6 +622,10 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
>
>  	if (ma->cache) {
>  		rcu_in_progress = 0;
> +		for_each_possible_cpu(cpu) {
> +			c = per_cpu_ptr(ma->cache, cpu);
> +			c->draining = true;
> +		}
>  		for_each_possible_cpu(cpu) {
>  			c = per_cpu_ptr(ma->cache, cpu);
>  			/*
> @@ -639,6 +649,13 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
>  	}
>  	if (ma->caches) {
>  		rcu_in_progress = 0;
> +		for_each_possible_cpu(cpu) {
> +			cc = per_cpu_ptr(ma->caches, cpu);
> +			for (i = 0; i < NUM_CACHES; i++) {
> +				c = &cc->cache[i];
> +				c->draining = true;
> +			}
> +		}
>  		for_each_possible_cpu(cpu) {
>  			cc = per_cpu_ptr(ma->caches, cpu);
>  			for (i = 0; i < NUM_CACHES; i++) {
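
To make sure I read the intended ordering correctly, below is a minimal
user-space sketch of it (plain C11 atomics plus pthreads; the
toy_cache/toy_free_callback names are made up for illustration, and
pthread_join stands in for the RCU barriers): the destroy path sets
draining on every cache first, waits for the in-flight callbacks, and
only then frees, so a callback that observes draining == true does
nothing.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for struct bpf_mem_cache: a per-CPU cache whose
 * deferred-free callback checks a draining flag.
 */
struct toy_cache {
	atomic_bool draining;
	atomic_int callback_in_progress;
	int *objects;			/* pretend free list */
};

/* Stand-in for __free_rcu()/__free_by_rcu(): becomes a nop once
 * draining is set, mirroring "if (c->draining) goto out;".
 */
static void *toy_free_callback(void *arg)
{
	struct toy_cache *c = arg;

	if (!atomic_load(&c->draining)) {
		free(c->objects);
		c->objects = NULL;
	}
	/* mirrors atomic_set(&c->call_rcu_ttrace_in_progress, 0) */
	atomic_store(&c->callback_in_progress, 0);
	return NULL;
}

int main(void)
{
	enum { NR_CACHES = 4 };
	struct toy_cache caches[NR_CACHES];
	pthread_t cbs[NR_CACHES];

	for (int i = 0; i < NR_CACHES; i++) {
		atomic_init(&caches[i].draining, false);
		atomic_init(&caches[i].callback_in_progress, 1);
		caches[i].objects = malloc(16);
		/* "call_rcu": a callback is already in flight */
		pthread_create(&cbs[i], NULL, toy_free_callback, &caches[i]);
	}

	/* Destroy path, same ordering as the patch:
	 * 1) set draining on every cache first, so late callbacks are nops,
	 * 2) wait for all in-flight callbacks (pthread_join models the
	 *    rcu_barrier_tasks_trace()/rcu_barrier() pair),
	 * 3) only then free the remaining objects centrally.
	 */
	for (int i = 0; i < NR_CACHES; i++)
		atomic_store(&caches[i].draining, true);
	for (int i = 0; i < NR_CACHES; i++)
		pthread_join(cbs[i], NULL);
	for (int i = 0; i < NR_CACHES; i++)
		free(caches[i].objects);	/* safe: callbacks are done */

	puts("all caches drained");
	return 0;
}

In the kernel the ordering is the same, except that the wait is
rcu_barrier_tasks_trace() (plus rcu_barrier() when a tasks-trace GP does
not imply an RCU GP) and the final free is free_mem_alloc_no_barrier().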