On Thu, Jul 6, 2023 at 8:39 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 7/7/2023 10:12 AM, Alexei Starovoitov wrote:
> > On Thu, Jul 6, 2023 at 7:07 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >> Hi,
> >>
> >> On 7/6/2023 11:34 AM, Alexei Starovoitov wrote:
> >>> From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >>>
> >>> alloc_bulk() can reuse elements from free_by_rcu_ttrace.
> >>> Let it reuse from waiting_for_gp_ttrace as well to avoid unnecessary kmalloc().
> >>>
> >>> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> >>> ---
> >>>  kernel/bpf/memalloc.c | 16 ++++++++++------
> >>>  1 file changed, 10 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> >>> index 9986c6b7df4d..e5a87f6cf2cc 100644
> >>> --- a/kernel/bpf/memalloc.c
> >>> +++ b/kernel/bpf/memalloc.c
> >>> @@ -212,6 +212,15 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
> >>>  	if (i >= cnt)
> >>>  		return;
> >>>
> >>> +	for (; i < cnt; i++) {
> >>> +		obj = llist_del_first(&c->waiting_for_gp_ttrace);
> >>> +		if (!obj)
> >>> +			break;
> >>> +		add_obj_to_free_list(c, obj);
> >>> +	}
> >>> +	if (i >= cnt)
> >>> +		return;
> >> I still think using llist_del_first() here is not safe as reported in
> >> [1]. Not sure whether or not invoking enque_to_free() firstly for
> >> free_llist_extra will close the race completely. Will check later.
> > lol. see my reply a second ago in the other thread.
> >
> > and it's not just waiting_for_gp_ttrace. free_by_rcu_ttrace is similar.
> I think free_by_rcu_ttrace is different, because the reuse is only
> possible after one tasks trace RCU grace period as shown below, and the
> concurrent llist_del_first() must have been completed when the head is
> reused and re-added into free_by_rcu_ttrace again.
>
> // c0->free_by_rcu_ttrace
> A -> B -> C -> nil
>
> P1:
> alloc_bulk()
>     llist_del_first(&c->free_by_rcu_ttrace)
>         entry = A
>         next = B
>
> P2:
> do_call_rcu_ttrace()
>     // c->free_by_rcu_ttrace->first = NULL
>     llist_del_all(&c->free_by_rcu_ttrace)
>         move to c->waiting_for_gp_ttrace
>
> P1:
>     llist_del_first()
>         return NULL
>
> // A is only reusable after one task trace RCU grace
> // llist_del_first() must have been completed

"must have been completed"?
I guess you're assuming that alloc_bulk() from irq_work is running within
an rcu_tasks_trace critical section, so the __free_rcu_tasks_trace()
callback will execute only after the irq work has completed?
I don't think that's the case.

If vCPU P1 is stopped for a long time by the host, P2 can execute
__free_rcu_tasks_trace() (or P3 can, since tasks trace callbacks execute
in a kthread that is not bound to any cpu).
__free_rcu_tasks_trace() will free the object into slab.
Then kmalloc() can hand out the same object, and it can eventually be put
back into free_by_rcu_ttrace.

Since you believe that the waiting_for_gp_ttrace ABA is possible, the same
ABA is possible here with the same probability. imo both are lower than a
bit flip due to cosmic rays, which is actually observable in practice.

> __free_rcu_tasks_trace
>     free_all(llist_del_all(&c->waiting_for_gp_ttrace))
>
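
To make the hazard being argued about concrete, below is a minimal
userspace sketch. It is not the kernel's llist implementation and every
name in it (struct lfhead, list_pop(), list_push(), etc.) is made up for
illustration; it only mirrors the cmpxchg-based pop that
llist_del_first() performs, to show why a concurrent free plus
reallocation of the head node can corrupt the list:

	/*
	 * Simplified userspace model of the ABA hazard (illustration only,
	 * not kernel code).
	 */
	#include <stdatomic.h>
	#include <stddef.h>

	struct node {
		struct node *next;
	};

	struct lfhead {
		_Atomic(struct node *) first;
	};

	/* Push a node onto the list head (llist_add()-style). */
	static void list_push(struct lfhead *h, struct node *n)
	{
		struct node *old = atomic_load(&h->first);

		do {
			n->next = old;
		} while (!atomic_compare_exchange_weak(&h->first, &old, n));
	}

	/* Pop the first node (llist_del_first()-style). */
	static struct node *list_pop(struct lfhead *h)
	{
		struct node *entry = atomic_load(&h->first);
		struct node *next;

		do {
			if (!entry)
				return NULL;
			next = entry->next;	/* (1) read ->next of the old head */
			/* (2) ...long delay here, e.g. vCPU stopped by host... */
		} while (!atomic_compare_exchange_weak(&h->first, &entry, next));
		/*
		 * (3) The cmpxchg only checks that the head still equals 'entry'.
		 * If, during the delay at (2), another thread removed 'entry',
		 * freed it, the allocator handed the same address out again and
		 * it was pushed back as the head with a different ->next, the
		 * cmpxchg still succeeds and installs the stale 'next' read at
		 * (1): the classic ABA problem.
		 */
		return entry;
	}

As I recall, the locking rules documented in include/linux/llist.h exist
for exactly this reason: llist_del_first() can only be mixed lock-free
with other del_first/del_all users if the entry is guaranteed not to be
freed and reused while a pop is in flight.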