On Tue, Mar 15, 2022 at 01:33:05PM -0700, Joanne Koong wrote:
> > > > > > > @@ -425,25 +438,37 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > > > > >  	if (old_sdata && (map_flags & BPF_F_LOCK)) {
> > > > > > >  		copy_map_value_locked(&smap->map, old_sdata->data, value,
> > > > > > >  				      false);
> > > > > > > -		selem = SELEM(old_sdata);
> > > > > > > -		goto unlock;
> > > > > > > +
> > > > > > > +		raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > > > > > > +
> > > > > > > +		if (!charge_err)
> > > > > > > +			mem_uncharge(smap, owner, smap->elem_size);
> > > > > > > +		kfree(selem);
> > > > > > > +
> > > > > > > +		return old_sdata;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	if (!old_sdata && charge_err) {
> > > > > > > +		/* If there is no existing local storage value, then this means
> > > > > > > +		 * we needed the charge to succeed. We must make sure this did not
> > > > > > > +		 * return an error.
> > > > > > > +		 *
> > > > > > > +		 * Please note that if an existing local storage value exists, then
> > > > > > > +		 * it doesn't matter if the charge failed because we can just
> > > > > > > +		 * "reuse" the charge from the existing local storage element.
> > > > > > > +		 */
> > > > > >
> > > > > > But we did allocate a new element which was unaccounted for, even if
> > > > > > it was only temporary [for the short period of time until we freed
> > > > > > the old element].
> > > > > >
> > > > > > Martin, is this something we are okay with?
> > > > >
> > > > > It is the behavior today already. Take a look at
> > > > > bpf_selem_alloc(...., !old_sdata) and the later
> > > > > "if (old_sdata) { /* ... */ bpf_selem_unlink_storage_nolock(..., false); }".
> > > > > Most of this happens under a raw_spin_lock, so the window should be
> > > > > very brief. Not perfect, but it should be fine.
> > > > >
> > > > > If we always error out on charge failure, we risk unnecessarily
> > > > > failing the userspace syscall on mem charging when it is only
> > > > > trying to replace the old_sdata.
> > > > >
> > > > > If the concern is the increased chance of a brief moment of
> > > > > unaccounted memory from the helper side (now that GFP_KERNEL comes
> > > > > from the helper only), another option that came to my mind is to
> > > > > decide whether to do the alloc before or after
> > > > > raw_spin_lock_irqsave() based on mem_flags. GFP_KERNEL here is only
> > > > > used from the bpf helper side, and the helper always calls with
> > > > > BPF_NOEXIST because it has already done a lookup, so it can always
> > > > > charge first and then alloc.
> > > > >
> > > > > Something like this, drafted on top of this patch. Untested code:
> > > >
> > > > I think this looks okay. One minor comment below:
> > > >
> > > > > diff --git c/kernel/bpf/bpf_local_storage.c w/kernel/bpf/bpf_local_storage.c
> > > > > index 092a1ac772d7..b48beb57fe6e 100644
> > > > > --- c/kernel/bpf/bpf_local_storage.c
> > > > > +++ w/kernel/bpf/bpf_local_storage.c
> > > > > @@ -63,7 +63,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
> > > > >
> > > > >  struct bpf_local_storage_elem *
> > > > >  bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> > > > > -		void *value, bool charge_mem)
> > > > > +		void *value, bool charge_mem, gfp_t mem_flags)
> > > > >  {
> > > > >  	struct bpf_local_storage_elem *selem;
> > > > >
> > > > > @@ -71,7 +71,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> > > > >  		return NULL;
> > > > >
> > > > >  	selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> > > > > -				GFP_ATOMIC | __GFP_NOWARN);
> > > > > +				mem_flags | __GFP_NOWARN);
> > > > >  	if (selem) {
> > > > >  		if (value)
> > > > >  			memcpy(SDATA(selem)->data, value, smap->map.value_size);
> > > > >
> > > > > @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > > >  		}
> > > > >  	}
> > > > >
> > > > > +	if (mem_flags == GFP_KERNEL) {
> > > >
> > > > It seems like what this check really is (and similarly for the other
> > > > mem_flags based check you have below) is:
> > > >
> > > > "am I called from a sleepable helper"
> > > >
> > > > and I wonder if, instead of the verifier setting mem_flags, it could
> > > > set a boolean "sleepable_helper_call", which might be more useful and
> > > > readable and is more relevant to the check that the verifier is
> > > > performing: "if (env->prog->aux->sleepable)"
> > >
> > > I think you're proposing to pass a boolean flag
> > > into the helper instead of gfp_t?
> > > I think gfp_t is cleaner.
> > > For example, we might allow local storage access under bpf_spin_lock
> > > or other critical sections.
> > > Passing a boolean flag of the prog state is not equivalent
> > > to the availability of gfp at the callsite inside bpf prog code.
> >
> > Ah yes, then using gfp_t makes sense as we may have other use cases.
> >
> > I think we can follow up with the changes Martin suggested separately,
> > as the current behaviour is essentially the same.
> >
> > In any case, you can add my:
> >
> > Acked-by: KP Singh <kpsingh@xxxxxxxxxx>
>
> Thanks for the discussion, KP, Kumar, Martin, and Alexei!
>
> For v2, I will make the following changes:
> 1) Allocate the memory before/after the raw_spin_lock_irqsave,
>    depending on mem_flags
> 2) Change the comment "*mem_flags* is set by the bpf verifier" to
>    "*mem_flags* is set by the bpf verifier. Any value set through uapi
>    will be ignored"

It will also be useful to double check whether the existing sleepable
lsm selftests cover this change.