Re: [PATCH bpf-next v1] bpf: Enable non-atomic allocations in local storage

On Tue, Mar 15, 2022 at 03:26:46AM +0100, KP Singh wrote:
[ ... ]

> >  struct bpf_local_storage_data *
> >  bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > -                        void *value, u64 map_flags)
> > +                        void *value, u64 map_flags, gfp_t mem_flags)
> >  {
> >         struct bpf_local_storage_data *old_sdata = NULL;
> >         struct bpf_local_storage_elem *selem;
> >         struct bpf_local_storage *local_storage;
> >         unsigned long flags;
> > -       int err;
> > +       int err, charge_err;
> >
> >         /* BPF_EXIST and BPF_NOEXIST cannot be both set */
> >         if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
> > @@ -373,11 +373,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> >                 if (err)
> >                         return ERR_PTR(err);
> >
> > -               selem = bpf_selem_alloc(smap, owner, value, true);
> > +               selem = bpf_selem_alloc(smap, owner, value, mem_flags);
> >                 if (!selem)
> >                         return ERR_PTR(-ENOMEM);
> >
> > -               err = bpf_local_storage_alloc(owner, smap, selem);
> > +               err = bpf_local_storage_alloc(owner, smap, selem, mem_flags);
> >                 if (err) {
> >                         kfree(selem);
> >                         mem_uncharge(smap, owner, smap->elem_size);
> > @@ -404,6 +404,19 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> >                 }
> >         }
> >
> > +       /* Since mem_flags can be non-atomic, we need to do the memory
> > +        * allocation outside the spinlock.
> > +        *
> > +        * There are a few cases where it is permissible for the memory charge
> > +        * and allocation to fail (eg if BPF_F_LOCK is set and a local storage
> > +        * value already exists, we can swap values without needing an
> > +        * allocation), so in the case of a failure here, continue on and see
> > +        * if the failure is relevant.
> > +        */
> > +       charge_err = mem_charge(smap, owner, smap->elem_size);
> > +       selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> > +                               mem_flags | __GFP_NOWARN);
> > +
> >         raw_spin_lock_irqsave(&local_storage->lock, flags);
> >
> >         /* Recheck local_storage->list under local_storage->lock */
> > @@ -425,25 +438,37 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> >         if (old_sdata && (map_flags & BPF_F_LOCK)) {
> >                 copy_map_value_locked(&smap->map, old_sdata->data, value,
> >                                       false);
> > -               selem = SELEM(old_sdata);
> > -               goto unlock;
> > +
> > +               raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > +
> > +               if (!charge_err)
> > +                       mem_uncharge(smap, owner, smap->elem_size);
> > +               kfree(selem);
> > +
> > +               return old_sdata;
> > +       }
> > +
> > +       if (!old_sdata && charge_err) {
> > +               /* If there is no existing local storage value, then this means
> > +                * we needed the charge to succeed. We must make sure this did not
> > +                * return an error.
> > +                *
> > +                * Please note that if an existing local storage value exists, then
> > +                * it doesn't matter if the charge failed because we can just
> > +                * "reuse" the charge from the existing local storage element.
> > +                */
> 
> But we did allocate a new element which was unaccounted for, even if
> only temporarily
> [for the short period of time until we freed the old element].
> 
> Martin, is this something we are okay with?
It is the behavior today already.  Take a look at the bpf_selem_alloc(...., !old_sdata)
and the later "if (old_sdata) { /* ... */ bpf_selem_unlink_storage_nolock(..., false); }".
Most things happen under a raw_spin_lock, so this should be a very brief moment.
Not perfect, but should be fine.

If it always errored out on charge failure, it would risk the userspace
syscall unnecessarily failing on mem charging while it only tries to
replace the old_sdata.

If the concern is the increased chance of a brief moment of unaccounted
memory on the helper side (because GFP_KERNEL is only used from the helper),
another option that comes to mind is to decide whether to do the alloc before
or after raw_spin_lock_irqsave() based on mem_flags.  GFP_KERNEL here is only
passed from the bpf helper side, and the helper always uses BPF_NOEXIST
because it has already done a lookup, so it can always charge first and
then alloc.

Something like this, drafted on top of this patch.  Untested code:

diff --git c/kernel/bpf/bpf_local_storage.c w/kernel/bpf/bpf_local_storage.c
index 092a1ac772d7..b48beb57fe6e 100644
--- c/kernel/bpf/bpf_local_storage.c
+++ w/kernel/bpf/bpf_local_storage.c
@@ -63,7 +63,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
 
 struct bpf_local_storage_elem *
 bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
-		void *value, bool charge_mem)
+		void *value, bool charge_mem, gfp_t mem_flags)
 {
 	struct bpf_local_storage_elem *selem;
 
@@ -71,7 +71,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
 		return NULL;
 
 	selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
-				GFP_ATOMIC | __GFP_NOWARN);
+				mem_flags | __GFP_NOWARN);
 	if (selem) {
 		if (value)
 			memcpy(SDATA(selem)->data, value, smap->map.value_size);

@@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 		}
 	}
 
+	if (mem_flags == GFP_KERNEL) {
+		selem = bpf_selem_alloc(smap, owner, value, true, mem_flags);
+		if (!selem)
+			return ERR_PTR(-ENOMEM);
+	}
+
 	raw_spin_lock_irqsave(&local_storage->lock, flags);
 
 	/* Recheck local_storage->list under local_storage->lock */
@@ -438,10 +448,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 	 * old_sdata will not be uncharged later during
 	 * bpf_selem_unlink_storage_nolock().
 	 */
-	selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
-	if (!selem) {
-		err = -ENOMEM;
-		goto unlock_err;
+	if (mem_flags != GFP_KERNEL) {
+		selem = bpf_selem_alloc(smap, owner, value, !old_sdata, mem_flags);
+		if (!selem) {
+			err = -ENOMEM;
+			goto unlock_err;
+		}
 	}
 
 	/* First, link the new selem to the map */
@@ -463,6 +475,10 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 
 unlock_err:
 	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
+	if (selem) {
+		mem_uncharge(smap, owner, smap->elem_size);
+		kfree(selem);
+	}
 	return ERR_PTR(err);
 }
 

> 
> > +               err = charge_err;
> > +               goto unlock_err;
> >         }
> >
> > -       /* local_storage->lock is held.  Hence, we are sure
> > -        * we can unlink and uncharge the old_sdata successfully
> > -        * later.  Hence, instead of charging the new selem now
> > -        * and then uncharge the old selem later (which may cause
> > -        * a potential but unnecessary charge failure),  avoid taking
> > -        * a charge at all here (the "!old_sdata" check) and the
> > -        * old_sdata will not be uncharged later during
> > -        * bpf_selem_unlink_storage_nolock().
> > -        */
> > -       selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
> >         if (!selem) {
> >                 err = -ENOMEM;
> >                 goto unlock_err;
> >         }
> >
> > +       if (value)
> > +               memcpy(SDATA(selem)->data, value, smap->map.value_size);
> > +
> >         /* First, link the new selem to the map */
> >         bpf_selem_link_map(smap, selem);
> >
> > @@ -454,15 +479,17 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> >         if (old_sdata) {
> >                 bpf_selem_unlink_map(SELEM(old_sdata));
> >                 bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
> > -                                               false);
> > +                                               !charge_err);
> >         }
> >
> > -unlock:
> >         raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> >         return SDATA(selem);
> >
> >  unlock_err:
> >         raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > +       if (!charge_err)
> > +               mem_uncharge(smap, owner, smap->elem_size);
> > +       kfree(selem);
> >         return ERR_PTR(err);
> >  }
> >
> > diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
> > index 5da7bed0f5f6..bb9e22bad42b 100644
> > --- a/kernel/bpf/bpf_task_storage.c
> > +++ b/kernel/bpf/bpf_task_storage.c
> > @@ -174,7 +174,8 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
> >
> >         bpf_task_storage_lock();
> >         sdata = bpf_local_storage_update(
> > -               task, (struct bpf_local_storage_map *)map, value, map_flags);
> > +               task, (struct bpf_local_storage_map *)map, value, map_flags,
> > +               GFP_ATOMIC);
> >         bpf_task_storage_unlock();
> >
> >         err = PTR_ERR_OR_ZERO(sdata);
> > @@ -226,8 +227,9 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
> >         return err;
> >  }
> >
> > -BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
> > -          task, void *, value, u64, flags)
> > +/* *mem_flags* is set by the bpf verifier */
> 
> Is there a precedent for this happening with any other helpers?
Kumar has also mentioned the timer helper.

> 
> You may want to add here that "any value, even if set by uapi, will be ignored"
> 
> Can we go even beyond this and ensure that the verifier understands
> that this is an "internal only arg", perhaps via something in
> check_helper_call?
The compiler is free to store anything in R5 before calling the helper, so
the verifier cannot enforce that it is unused, and it does not have to.

> 
> > +BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
> > +          task, void *, value, u64, flags, gfp_t, mem_flags)
> >  {
> >         struct bpf_local_storage_data *sdata;
> >
> > @@ -250,7 +252,7 @@ BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
> >             (flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
> >                 sdata = bpf_local_storage_update(
> >                         task, (struct bpf_local_storage_map *)map, value,
> > -                       BPF_NOEXIST);
> > +                       BPF_NOEXIST, mem_flags);
> >
> >  unlock:
> >         bpf_task_storage_unlock();


