Re: [PATCH bpf-next v1] bpf: Enable non-atomic allocations in local storage

On Tue, Mar 15, 2022 at 8:05 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
>
> On Tue, Mar 15, 2022 at 03:26:46AM +0100, KP Singh wrote:
> [ ... ]
>
> > >  struct bpf_local_storage_data *
> > >  bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > -                        void *value, u64 map_flags)
> > > +                        void *value, u64 map_flags, gfp_t mem_flags)
> > >  {
> > >         struct bpf_local_storage_data *old_sdata = NULL;
> > >         struct bpf_local_storage_elem *selem;
> > >         struct bpf_local_storage *local_storage;
> > >         unsigned long flags;
> > > -       int err;
> > > +       int err, charge_err;
> > >
> > >         /* BPF_EXIST and BPF_NOEXIST cannot be both set */
> > >         if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
> > > @@ -373,11 +373,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > >                 if (err)
> > >                         return ERR_PTR(err);
> > >
> > > -               selem = bpf_selem_alloc(smap, owner, value, true);
> > > +               selem = bpf_selem_alloc(smap, owner, value, mem_flags);
> > >                 if (!selem)
> > >                         return ERR_PTR(-ENOMEM);
> > >
> > > -               err = bpf_local_storage_alloc(owner, smap, selem);
> > > +               err = bpf_local_storage_alloc(owner, smap, selem, mem_flags);
> > >                 if (err) {
> > >                         kfree(selem);
> > >                         mem_uncharge(smap, owner, smap->elem_size);
> > > @@ -404,6 +404,19 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > >                 }
> > >         }
> > >
> > > +       /* Since mem_flags can be non-atomic, we need to do the memory
> > > +        * allocation outside the spinlock.
> > > +        *
> > > +        * There are a few cases where it is permissible for the memory charge
> > > +        * and allocation to fail (eg if BPF_F_LOCK is set and a local storage
> > > +        * value already exists, we can swap values without needing an
> > > +        * allocation), so in the case of a failure here, continue on and see
> > > +        * if the failure is relevant.
> > > +        */
> > > +       charge_err = mem_charge(smap, owner, smap->elem_size);
> > > +       selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> > > +                               mem_flags | __GFP_NOWARN);
> > > +
> > >         raw_spin_lock_irqsave(&local_storage->lock, flags);
> > >
> > >         /* Recheck local_storage->list under local_storage->lock */
> > > @@ -425,25 +438,37 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > >         if (old_sdata && (map_flags & BPF_F_LOCK)) {
> > >                 copy_map_value_locked(&smap->map, old_sdata->data, value,
> > >                                       false);
> > > -               selem = SELEM(old_sdata);
> > > -               goto unlock;
> > > +
> > > +               raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > > +
> > > +               if (!charge_err)
> > > +                       mem_uncharge(smap, owner, smap->elem_size);
> > > +               kfree(selem);
> > > +
> > > +               return old_sdata;
> > > +       }
> > > +
> > > +       if (!old_sdata && charge_err) {
> > > +               /* If there is no existing local storage value, then this means
> > > +                * we needed the charge to succeed. We must make sure this did not
> > > +                * return an error.
> > > +                *
> > > +                * Please note that if an existing local storage value exists, then
> > > +                * it doesn't matter if the charge failed because we can just
> > > +                * "reuse" the charge from the existing local storage element.
> > > +                */
> >
> > But we did allocate a new element which was unaccounted for, even if
> > it was temporarily.
> > [for the short period of time till we freed the old element]
> >
> > Martin, is this something we are okay with?
> It is the behavior today already.  Take a look at the bpf_selem_alloc(...., !old_sdata)
> and the later "if (old_sdata) { /* ... */ bpf_selem_unlink_storage_nolock(..., false); }"
> Most things happen under a raw_spin_lock, so this should be a very brief moment.
> Not perfect, but it should be fine.
>
> If it always errors out on charge failure, it risks the case that the
> userspace syscall fails unnecessarily on mem charging when it is only
> trying to replace the old_sdata.
>
> If the concern is the increased chance of a brief moment of unaccounted memory
> from the helper side now (because GFP_KERNEL only comes from the helper),
> another option that came to my mind is to decide whether to do the alloc before
> or after raw_spin_lock_irqsave() based on mem_flags.  GFP_KERNEL here is only
> used from the bpf helper side, and it is always done with BPF_NOEXIST because
> the bpf helper has already done a lookup, so it can always charge successfully
> first and then alloc.
>
> Something like this, drafted on top of this patch.  Untested code:

I think this looks okay. One minor comment below:

>
> diff --git c/kernel/bpf/bpf_local_storage.c w/kernel/bpf/bpf_local_storage.c
> index 092a1ac772d7..b48beb57fe6e 100644
> --- c/kernel/bpf/bpf_local_storage.c
> +++ w/kernel/bpf/bpf_local_storage.c
> @@ -63,7 +63,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
>
>  struct bpf_local_storage_elem *
>  bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> -               void *value, bool charge_mem)
> +               void *value, bool charge_mem, gfp_t mem_flags)
>  {
>         struct bpf_local_storage_elem *selem;
>
> @@ -71,7 +71,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
>                 return NULL;
>
>         selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> -                               GFP_ATOMIC | __GFP_NOWARN);
> +                               mem_flags | __GFP_NOWARN);
>         if (selem) {
>                 if (value)
>                         memcpy(SDATA(selem)->data, value, smap->map.value_size);
>
> @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>                 }
>         }
>
> +       if (mem_flags == GFP_KERNEL) {

It seems like what this check (and similarly the other mem_flags based
check you have below) is really asking is:

"am I called from a sleepable helper"

and I wonder if, instead of setting mem_flags, the verifier could set a
boolean "sleepable_helper_call", which might be more useful and readable
and maps more directly to the check the verifier is performing,
"if (env->prog->aux->sleepable)". See the sketch below.


> +               selem = bpf_selem_alloc(smap, owner, value, true, mem_flags);
> +               if (!selem)
> +                       return ERR_PTR(-ENOMEM);
> +       }
> +
>         raw_spin_lock_irqsave(&local_storage->lock, flags);
>
>         /* Recheck local_storage->list under local_storage->lock */
> @@ -438,10 +448,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>          * old_sdata will not be uncharged later during
>          * bpf_selem_unlink_storage_nolock().
>          */
> -       selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
> -       if (!selem) {
> -               err = -ENOMEM;
> -               goto unlock_err;
> +       if (mem_flags != GFP_KERNEL) {
> +               selem = bpf_selem_alloc(smap, owner, value, !old_sdata, mem_flags);
> +               if (!selem) {
> +                       err = -ENOMEM;
> +                       goto unlock_err;
> +               }
>         }
>
>         /* First, link the new selem to the map */
> @@ -463,6 +475,10 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>
>  unlock_err:
>         raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> +       if (selem) {
> +               mem_uncharge(smap, owner, smap->elem_size);
> +               kfree(selem);
> +       }
>         return ERR_PTR(err);
>  }
>
>
> >
> > > +               err = charge_err;
> > > +               goto unlock_err;
> > >         }
> > >
> > > -       /* local_storage->lock is held.  Hence, we are sure
> > > -        * we can unlink and uncharge the old_sdata successfully
> > > -        * later.  Hence, instead of charging the new selem now
> > > -        * and then uncharge the old selem later (which may cause
> > > -        * a potential but unnecessary charge failure),  avoid taking
> > > -        * a charge at all here (the "!old_sdata" check) and the
> > > -        * old_sdata will not be uncharged later during
> > > -        * bpf_selem_unlink_storage_nolock().
> > > -        */
> > > -       selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
> > >         if (!selem) {
> > >                 err = -ENOMEM;
> > >                 goto unlock_err;
> > >         }
> > >
> > > +       if (value)
> > > +               memcpy(SDATA(selem)->data, value, smap->map.value_size);
> > > +
> > >         /* First, link the new selem to the map */
> > >         bpf_selem_link_map(smap, selem);
> > >
> > > @@ -454,15 +479,17 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > >         if (old_sdata) {
> > >                 bpf_selem_unlink_map(SELEM(old_sdata));
> > >                 bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
> > > -                                               false);
> > > +                                               !charge_err);
> > >         }
> > >
> > > -unlock:
> > >         raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > >         return SDATA(selem);
> > >
> > >  unlock_err:
> > >         raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > > +       if (!charge_err)
> > > +               mem_uncharge(smap, owner, smap->elem_size);
> > > +       kfree(selem);
> > >         return ERR_PTR(err);
> > >  }
> > >
> > > diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
> > > index 5da7bed0f5f6..bb9e22bad42b 100644
> > > --- a/kernel/bpf/bpf_task_storage.c
> > > +++ b/kernel/bpf/bpf_task_storage.c
> > > @@ -174,7 +174,8 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
> > >
> > >         bpf_task_storage_lock();
> > >         sdata = bpf_local_storage_update(
> > > -               task, (struct bpf_local_storage_map *)map, value, map_flags);
> > > +               task, (struct bpf_local_storage_map *)map, value, map_flags,
> > > +               GFP_ATOMIC);
> > >         bpf_task_storage_unlock();
> > >
> > >         err = PTR_ERR_OR_ZERO(sdata);
> > > @@ -226,8 +227,9 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
> > >         return err;
> > >  }
> > >
> > > -BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
> > > -          task, void *, value, u64, flags)
> > > +/* *mem_flags* is set by the bpf verifier */
> >
> > Is there a precedent for this happening with any other helpers?
> Kumar has also mentioned the timer helper.
>
> >
> > You may want to add here that "any value, even if set by uapi, will be ignored"

I guess this comment is still helpful if a user is not using the
bpf_helpers header generated from the uapi doc strings?
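
i.e. someone declaring the helper by hand would use the four-argument
prototype from the uapi doc string, while the kernel-side definition after
this patch takes five, with the last one owned by the verifier. Roughly:

	/* uapi view (what BPF programs declare and call) */
	void *bpf_task_storage_get(struct bpf_map *map, struct task_struct *task,
				   void *value, u64 flags);

	/* kernel view after this patch; mem_flags is filled in by the verifier */
	BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map,
		   struct task_struct *, task, void *, value, u64, flags,
		   gfp_t, mem_flags)

so spelling out that any value a program passes there is ignored seems
worth it.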

> >
> > Can we go even beyond this and ensure that the verifier understands that
> > this is an "internal only arg", e.g. with something in check_helper_call?
> The compiler is free to store anything in R5 before calling the helper, so the
> verifier cannot enforce whether it is unused, and it does not have to.

Thanks Kumar and Martin.

>
> >
> > > +BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
> > > +          task, void *, value, u64, flags, gfp_t, mem_flags)
> > >  {
> > >         struct bpf_local_storage_data *sdata;
> > >
> > > @@ -250,7 +252,7 @@ BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
> > >             (flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
> > >                 sdata = bpf_local_storage_update(
> > >                         task, (struct bpf_local_storage_map *)map, value,
> > > -                       BPF_NOEXIST);
> > > +                       BPF_NOEXIST, mem_flags);
> > >
> > >  unlock:
> > >         bpf_task_storage_unlock();


