On Tue, Mar 15, 2022 at 8:05 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
>
> On Tue, Mar 15, 2022 at 03:26:46AM +0100, KP Singh wrote:
> [ ... ]
>
> > >  struct bpf_local_storage_data *
> > >  bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > -			  void *value, u64 map_flags)
> > > +			  void *value, u64 map_flags, gfp_t mem_flags)
> > >  {
> > >  	struct bpf_local_storage_data *old_sdata = NULL;
> > >  	struct bpf_local_storage_elem *selem;
> > >  	struct bpf_local_storage *local_storage;
> > >  	unsigned long flags;
> > > -	int err;
> > > +	int err, charge_err;
> > >
> > >  	/* BPF_EXIST and BPF_NOEXIST cannot be both set */
> > >  	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
> > > @@ -373,11 +373,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > >  		if (err)
> > >  			return ERR_PTR(err);
> > >
> > > -		selem = bpf_selem_alloc(smap, owner, value, true);
> > > +		selem = bpf_selem_alloc(smap, owner, value, mem_flags);
> > >  		if (!selem)
> > >  			return ERR_PTR(-ENOMEM);
> > >
> > > -		err = bpf_local_storage_alloc(owner, smap, selem);
> > > +		err = bpf_local_storage_alloc(owner, smap, selem, mem_flags);
> > >  		if (err) {
> > >  			kfree(selem);
> > >  			mem_uncharge(smap, owner, smap->elem_size);
> > > @@ -404,6 +404,19 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > >  		}
> > >  	}
> > >
> > > +	/* Since mem_flags can be non-atomic, we need to do the memory
> > > +	 * allocation outside the spinlock.
> > > +	 *
> > > +	 * There are a few cases where it is permissible for the memory charge
> > > +	 * and allocation to fail (eg if BPF_F_LOCK is set and a local storage
> > > +	 * value already exists, we can swap values without needing an
> > > +	 * allocation), so in the case of a failure here, continue on and see
> > > +	 * if the failure is relevant.
> > > +	 */
> > > +	charge_err = mem_charge(smap, owner, smap->elem_size);
> > > +	selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> > > +				mem_flags | __GFP_NOWARN);
> > > +
> > >  	raw_spin_lock_irqsave(&local_storage->lock, flags);
> > >
> > >  	/* Recheck local_storage->list under local_storage->lock */
> > > @@ -425,25 +438,37 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > >  	if (old_sdata && (map_flags & BPF_F_LOCK)) {
> > >  		copy_map_value_locked(&smap->map, old_sdata->data, value,
> > >  				      false);
> > > -		selem = SELEM(old_sdata);
> > > -		goto unlock;
> > > +
> > > +		raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > > +
> > > +		if (!charge_err)
> > > +			mem_uncharge(smap, owner, smap->elem_size);
> > > +		kfree(selem);
> > > +
> > > +		return old_sdata;
> > > +	}
> > > +
> > > +	if (!old_sdata && charge_err) {
> > > +		/* If there is no existing local storage value, then this means
> > > +		 * we needed the charge to succeed. We must make sure this did not
> > > +		 * return an error.
> > > +		 *
> > > +		 * Please note that if an existing local storage value exists, then
> > > +		 * it doesn't matter if the charge failed because we can just
> > > +		 * "reuse" the charge from the existing local storage element.
> > > +		 */
> >
> > But we did allocate a new element which was unaccounted for, even if
> > it was temporarily [for the short period of time till we freed the old
> > element].
> >
> > Martin, is this something we are okay with?
> It is the behavior today already. Take a look at the bpf_selem_alloc(...., !old_sdata)
> and the later "if (old_sdata) { /* ... */ bpf_selem_unlink_storage_nolock(..., false); }".
> Most things happen in a raw_spin_lock, so this should be a very brief moment.
> Not perfect but should be fine.
>
> If it always errors out on charge failure, it will risk the case that the
> userspace's syscall unnecessarily fails on mem charging while it only
> tries to replace the old_sdata.
>
> If the concern is the increased chance of a brief moment of unaccounted memory
> from the helper side now, because GFP_KERNEL is from the helper only,
> another option that came to my mind is to decide to do the alloc before or
> after raw_spin_lock_irqsave() based on mem_flags. The GFP_KERNEL here only
> comes from the bpf helper side and it is always done with BPF_NOEXIST,
> because the bpf helper has already done a lookup, so it should
> always charge successfully first and then alloc.
>
> Something like this, drafted on top of this patch. Untested code:

I think this looks okay. One minor comment below:

>
> diff --git c/kernel/bpf/bpf_local_storage.c w/kernel/bpf/bpf_local_storage.c
> index 092a1ac772d7..b48beb57fe6e 100644
> --- c/kernel/bpf/bpf_local_storage.c
> +++ w/kernel/bpf/bpf_local_storage.c
> @@ -63,7 +63,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
>
>  struct bpf_local_storage_elem *
>  bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> -		void *value, bool charge_mem)
> +		void *value, bool charge_mem, gfp_t mem_flags)
>  {
>  	struct bpf_local_storage_elem *selem;
>
> @@ -71,7 +71,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
>  		return NULL;
>
>  	selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> -				GFP_ATOMIC | __GFP_NOWARN);
> +				mem_flags | __GFP_NOWARN);
>  	if (selem) {
>  		if (value)
>  			memcpy(SDATA(selem)->data, value, smap->map.value_size);
>
> @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>  		}
>  	}
>
> +	if (mem_flags == GFP_KERNEL) {

It seems like what this check (and, similarly, the other mem_flags based
check you have below) is really asking is "am I called from a sleepable
helper", and I wonder if the verifier, instead of setting mem_flags, could
set a boolean "sleepable_helper_call", which might be more useful and
readable, and is more relevant to the check that the verifier is performing:
"if (env->prog->aux->sleepable)".

> +		selem = bpf_selem_alloc(smap, owner, value, true, mem_flags);
> +		if (!selem)
> +			return ERR_PTR(-ENOMEM);
> +	}
> +
>  	raw_spin_lock_irqsave(&local_storage->lock, flags);
>
>  	/* Recheck local_storage->list under local_storage->lock */
> @@ -438,10 +448,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>  	 * old_sdata will not be uncharged later during
>  	 * bpf_selem_unlink_storage_nolock().
>  	 */
> -	selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
> -	if (!selem) {
> -		err = -ENOMEM;
> -		goto unlock_err;
> +	if (mem_flags != GFP_KERNEL) {
> +		selem = bpf_selem_alloc(smap, owner, value, !old_sdata, mem_flags);
> +		if (!selem) {
> +			err = -ENOMEM;
> +			goto unlock_err;
> +		}
>  	}
>
>  	/* First, link the new selem to the map */
> @@ -463,6 +475,10 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>
>  unlock_err:
>  	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> +	if (selem) {
> +		mem_uncharge(smap, owner, smap->elem_size);
> +		kfree(selem);
> +	}
>  	return ERR_PTR(err);
>  }
>
> > >
> > > +		err = charge_err;
> > > +		goto unlock_err;
> > >  	}
> > >
> > > -	/* local_storage->lock is held. Hence, we are sure
> > > -	 * we can unlink and uncharge the old_sdata successfully
> > > -	 * later. Hence, instead of charging the new selem now
> > > -	 * and then uncharge the old selem later (which may cause
> > > -	 * a potential but unnecessary charge failure), avoid taking
> > > -	 * a charge at all here (the "!old_sdata" check) and the
> > > -	 * old_sdata will not be uncharged later during
> > > -	 * bpf_selem_unlink_storage_nolock().
> > > -	 */
> > > -	selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
> > >  	if (!selem) {
> > >  		err = -ENOMEM;
> > >  		goto unlock_err;
> > >  	}
> > >
> > > +	if (value)
> > > +		memcpy(SDATA(selem)->data, value, smap->map.value_size);
> > > +
> > >  	/* First, link the new selem to the map */
> > >  	bpf_selem_link_map(smap, selem);
> > >
> > > @@ -454,15 +479,17 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > >  	if (old_sdata) {
> > >  		bpf_selem_unlink_map(SELEM(old_sdata));
> > >  		bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
> > > -						false);
> > > +						!charge_err);
> > >  	}
> > >
> > > -unlock:
> > >  	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > >  	return SDATA(selem);
> > >
> > >  unlock_err:
> > >  	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > > +	if (!charge_err)
> > > +		mem_uncharge(smap, owner, smap->elem_size);
> > > +	kfree(selem);
> > >  	return ERR_PTR(err);
> > >  }
> > >
> > > diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
> > > index 5da7bed0f5f6..bb9e22bad42b 100644
> > > --- a/kernel/bpf/bpf_task_storage.c
> > > +++ b/kernel/bpf/bpf_task_storage.c
> > > @@ -174,7 +174,8 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
> > >
> > >  	bpf_task_storage_lock();
> > >  	sdata = bpf_local_storage_update(
> > > -		task, (struct bpf_local_storage_map *)map, value, map_flags);
> > > +		task, (struct bpf_local_storage_map *)map, value, map_flags,
> > > +		GFP_ATOMIC);
> > >  	bpf_task_storage_unlock();
> > >
> > >  	err = PTR_ERR_OR_ZERO(sdata);
> > > @@ -226,8 +227,9 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
> > >  		return err;
> > >  	}
> > >
> > > -BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
> > > -	   task, void *, value, u64, flags)
> > > +/* *mem_flags* is set by the bpf verifier */
> >
> > Is there a precedent of this happening for any other helpers?
> Kumar has also mentioned the timer helper.
> >
> > You may want to add here that "any value, even if set by uapi, will be ignored"

I guess this comment is still helpful if a user is not using the bpf_helpers
header generated from the uapi doc strings?

> >
> > Can we go even beyond this and ensure that the verifier understands
> > that this is an "internal only arg", something in check_helper_call?
> The compiler is free to store anything in R5 before calling the helper, so
> the verifier cannot enforce whether it is unused or not, and the verifier
> does not have to.

Thanks Kumar and Martin.

> > >
> > > +BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
> > > +	   task, void *, value, u64, flags, gfp_t, mem_flags)
> > >  {
> > >  	struct bpf_local_storage_data *sdata;
> > >
> > > @@ -250,7 +252,7 @@ BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
> > >  	    (flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
> > >  		sdata = bpf_local_storage_update(
> > >  			task, (struct bpf_local_storage_map *)map, value,
> > > -			BPF_NOEXIST);
> > > +			BPF_NOEXIST, mem_flags);
> > >
> > > unlock:
> > >  	bpf_task_storage_unlock();