Re: [RFC bpf-next v4 4/6] bpf: pin, translate, and unpin __uptr from syscalls.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 6, 2024 at 1:11 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 9/4/24 3:21 PM, Martin KaFai Lau wrote:
> > On 8/28/24 4:24 PM, Alexei Starovoitov wrote:
> >>> @@ -714,6 +869,11 @@ void bpf_obj_free_fields(const struct btf_record *rec,
> >>> void *obj)
> >>>                                  field->kptr.dtor(xchgd_field);
> >>>                          }
> >>>                          break;
> >>> +               case BPF_UPTR:
> >>> +                       if (*(void **)field_ptr)
> >>> +                               bpf_obj_unpin_uptr(field, *(void **)field_ptr);
> >>> +                       *(void **)field_ptr = NULL;
> >> This one will be called from
> >>   task_storage_delete->bpf_selem_free->bpf_obj_free_fields
> >>
> >> and even if upin was safe to do from that context
> >> we cannot just do:
> >> *(void **)field_ptr = NULL;
> >>
> >> since bpf prog might be running in parallel,
> >> it could have just read that addr and now is using it.
> >>
> >> The first thought of a way to fix this was to split
> >> bpf_obj_free_fields() into the current one plus
> >> bpf_obj_free_fields_after_gp()
> >> that will do the above unpin bit.
> >> and call the later one from bpf_selem_free_rcu()
> >> while bpf_obj_free_fields() from bpf_selem_free()
> >> will not touch uptr.
> >>
> >> But after digging further I realized that task_storage
> >> already switched to use bpf_ma, so the above won't work.
> >>
> >> So we need something similar to BPF_KPTR_REF logic:
> >> xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
> >> and then delay of uptr unpin for that address into call_rcu.
> >>
> >> Any better ideas?
> >
>
> I think the existing reuse_now arg in the bpf_selem_free can be used. reuse_now
> (renamed from the earlier use_trace_rcu) was added to avoid call_rcu_tasks_trace
> for the common case.
>
> selem (in type "struct bpf_local_storage_elem") is the one exposed to the bpf prog.
>
> bpf_selem_free knows whether a selem can be reused immediately based on the
> caller. It is currently flagged in the reuse_now arg: "bpf_selem_free(...., bool
> reuse_now)".
>
> If a selem cannot be reuse_now (i.e. == false), it is currently going through
> "call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu)". We can do
> unpin_user_page() in the rcu callback.
>
> A selem can be reuse_now (i.e. == true) if the selem is no longer needed because
> either its owner (i.e. the task_struct here) is going away in free_task() or the
> bpf map is being destructed in bpf_local_storage_map_free(). No bpf prog should
> have a hold on the selem at this point. I think for these two cases, the
> unpin_user_page() can be directly called in bpf_selem_free().

but there is also this path:
bpf_task_storage_delete -> task_storage_delete -> bpf_selem_free
 -> bpf_obj_free_fields

In this case bpf prog may still be looking at uptr address
and we cannot do unpin right away in bpf_obj_free_fields.
All other special fields in map value are ok,
since they are either relying on bpf_mem_alloc and
have rcu/rcu_tasks_trace gp
or extra indirection like timer/wq.

> One existing bug is, from looking at patch 6, I don't think the free_task() case
> can be "reuse_now == true" anymore because of the bpf_task_release kfunc did not
> mark the previously obtained task_storage to be invalid:
>
> data_task = bpf_task_from_pid(parent_pid);
> ptr = bpf_task_storage_get(&datamap, data_task, 0, ...);
> bpf_task_release(data_task);
> if (!ptr)
>         return 0;
> /* The prog still holds a valid task storage ptr. */
> udata = ptr->udata;
>
> It can be fixed by marking the ref_obj_id of the "ptr". Although it is more
> correct to make the task storage "ptr" invalid after task_release, it may break
> the existing progs.

Are you suggesting that bpf_task_release should invalidate all pointers
fetched from map value?
That will work, but it's not an issue for other special fields in there
like kptr.
So this invalidation would be need only for uptr which feels
weird to special case it and probably will be confusing to users writing
such programs.
Above bpf prog example should be ok to use.
We only need to delay unpin after rcu/rcu_task_trace gp.
Hence my proposal in bpf_obj_free_fields() do:
 case UPTR:
   xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
   call_rcu(...) to unpin.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux