Re: [PATCH v4 bpf-next 09/14] bpf: Allow reuse from waiting_for_gp_ttrace list.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 6, 2023 at 8:39 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 7/7/2023 10:12 AM, Alexei Starovoitov wrote:
> > On Thu, Jul 6, 2023 at 7:07 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >> Hi,
> >>
> >> On 7/6/2023 11:34 AM, Alexei Starovoitov wrote:
> >>> From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >>>
> >>> alloc_bulk() can reuse elements from free_by_rcu_ttrace.
> >>> Let it reuse from waiting_for_gp_ttrace as well to avoid unnecessary kmalloc().
> >>>
> >>> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> >>> ---
> >>>  kernel/bpf/memalloc.c | 16 ++++++++++------
> >>>  1 file changed, 10 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> >>> index 9986c6b7df4d..e5a87f6cf2cc 100644
> >>> --- a/kernel/bpf/memalloc.c
> >>> +++ b/kernel/bpf/memalloc.c
> >>> @@ -212,6 +212,15 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
> >>>       if (i >= cnt)
> >>>               return;
> >>>
> >>> +     for (; i < cnt; i++) {
> >>> +             obj = llist_del_first(&c->waiting_for_gp_ttrace);
> >>> +             if (!obj)
> >>> +                     break;
> >>> +             add_obj_to_free_list(c, obj);
> >>> +     }
> >>> +     if (i >= cnt)
> >>> +             return;
> >> I still think using llist_del_first() here is not safe as reported in
> >> [1]. Not sure whether or not invoking enque_to_free() firstly for
> >> free_llist_extra will close the race completely. Will check later.
> > lol. see my reply a second ago in the other thread.
> >
> > and it's not just waiting_for_gp_ttrace. free_by_rcu_ttrace is similar.
>
> I think free_by_rcu_ttrace is different, because the reuse is only
> possible after one tasks trace RCU grace period as shown below, and the
> concurrent llist_del_first() must have been completed when the head is
> reused and re-added into free_by_rcu_ttrace again.
>
> // c0->free_by_rcu_ttrace
> A -> B -> C -> nil
>
> P1:
> alloc_bulk()
>     llist_del_first(&c->free_by_rcu_ttrace)
>         entry = A
>         next = B
>
> P2:
> do_call_rcu_ttrace()
>     // c->free_by_rcu_ttrace->first = NULL
>     llist_del_all(&c->free_by_rcu_ttrace)
>         move to c->waiting_for_gp_ttrace
>
> P1:
> llist_del_first()
>     return NULL
>
> // A is only reusable after one task trace RCU grace
> // llist_del_first() must have been completed

"must have been completed" ?

I guess you're assuming that alloc_bulk() from irq_work
is running within rcu_tasks_trace critical section,
so __free_rcu_tasks_trace() callback will execute after
irq work completed?
I don't think that's the case.
In vCPU P1 is stopped for looong time by host,
P2 can execute __free_rcu_tasks_trace (or P3, since
tasks trace callbacks execute in a kthread that is not bound
to any cpu).
__free_rcu_tasks_trace() will free it into slab.
Then kmalloc the same obj and eventually put it back into
free_by_rcu_ttrace.

Since you believe that waiting_for_gp_ttrace ABA is possible
here it's the same probability. imo both lower than a bit flip due
to cosmic rays which is actually observable in practice.

> __free_rcu_tasks_trace
>     free_all(llist_del_all(&c->waiting_for_gp_ttrace))
>
>




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux