On Tue, Mar 15, 2022 at 12:51 PM KP Singh <kpsingh@xxxxxxxxxx> wrote: > > On Tue, Mar 15, 2022 at 8:05 PM Martin KaFai Lau <kafai@xxxxxx> wrote: > > > > On Tue, Mar 15, 2022 at 03:26:46AM +0100, KP Singh wrote: > > [ ... ] > > > > > > struct bpf_local_storage_data * > > > > bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > > - void *value, u64 map_flags) > > > > + void *value, u64 map_flags, gfp_t mem_flags) > > > > { > > > > struct bpf_local_storage_data *old_sdata = NULL; > > > > struct bpf_local_storage_elem *selem; > > > > struct bpf_local_storage *local_storage; > > > > unsigned long flags; > > > > - int err; > > > > + int err, charge_err; > > > > > > > > /* BPF_EXIST and BPF_NOEXIST cannot be both set */ > > > > if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) || > > > > @@ -373,11 +373,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > > if (err) > > > > return ERR_PTR(err); > > > > > > > > - selem = bpf_selem_alloc(smap, owner, value, true); > > > > + selem = bpf_selem_alloc(smap, owner, value, mem_flags); > > > > if (!selem) > > > > return ERR_PTR(-ENOMEM); > > > > > > > > - err = bpf_local_storage_alloc(owner, smap, selem); > > > > + err = bpf_local_storage_alloc(owner, smap, selem, mem_flags); > > > > if (err) { > > > > kfree(selem); > > > > mem_uncharge(smap, owner, smap->elem_size); > > > > @@ -404,6 +404,19 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > > } > > > > } > > > > > > > > + /* Since mem_flags can be non-atomic, we need to do the memory > > > > + * allocation outside the spinlock. > > > > + * > > > > + * There are a few cases where it is permissible for the memory charge > > > > + * and allocation to fail (eg if BPF_F_LOCK is set and a local storage > > > > + * value already exists, we can swap values without needing an > > > > + * allocation), so in the case of a failure here, continue on and see > > > > + * if the failure is relevant. > > > > + */ > > > > + charge_err = mem_charge(smap, owner, smap->elem_size); > > > > + selem = bpf_map_kzalloc(&smap->map, smap->elem_size, > > > > + mem_flags | __GFP_NOWARN); > > > > + > > > > raw_spin_lock_irqsave(&local_storage->lock, flags); > > > > > > > > /* Recheck local_storage->list under local_storage->lock */ > > > > @@ -425,25 +438,37 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > > if (old_sdata && (map_flags & BPF_F_LOCK)) { > > > > copy_map_value_locked(&smap->map, old_sdata->data, value, > > > > false); > > > > - selem = SELEM(old_sdata); > > > > - goto unlock; > > > > + > > > > + raw_spin_unlock_irqrestore(&local_storage->lock, flags); > > > > + > > > > + if (!charge_err) > > > > + mem_uncharge(smap, owner, smap->elem_size); > > > > + kfree(selem); > > > > + > > > > + return old_sdata; > > > > + } > > > > + > > > > + if (!old_sdata && charge_err) { > > > > + /* If there is no existing local storage value, then this means > > > > + * we needed the charge to succeed. We must make sure this did not > > > > + * return an error. > > > > + * > > > > + * Please note that if an existing local storage value exists, then > > > > + * it doesn't matter if the charge failed because we can just > > > > + * "reuse" the charge from the existing local storage element. > > > > + */ > > > > > > But we did allocate a new element which was unaccounted for, even if > > > it was temporarily. > > > [for the short period of time till we freed the old element] > > > > > > Martin, is this something we are okay with? > > It is the behavior today already. Take a look at the bpf_selem_alloc(...., !sold_data) > > and the later "if (old_sdata) { /* ... */ bpf_selem_unlink_storage_nolock(..., false); }" > > Most things happen in a raw_spin_lock, so this should be very brief moment. > > Not perfect but should be fine. > > > > If it always error out on charge failure, it will risk the case that the > > userspace's syscall will unnecessary be failed on mem charging while it only > > tries to replace the old_sdata. > > > > If the concern is the increased chance of brief moment of unaccounted memory > > from the helper side now because GFP_KERNEL is from the helper only, > > another option that came up to my mind is to decide to do the alloc before or > > after raw_spin_lock_irqsave() based on mem_flags. The GFP_KERNEL here is only > > calling from the bpf helper side and it is always done with BPF_NOEXIST > > because the bpf helper has already done a lookup, so it should > > always charge success first and then alloc. > > > > Something like this that drafted on top of this patch. Untested code: > > I think this looks okay. One minor comment below: > > > > > diff --git c/kernel/bpf/bpf_local_storage.c w/kernel/bpf/bpf_local_storage.c > > index 092a1ac772d7..b48beb57fe6e 100644 > > --- c/kernel/bpf/bpf_local_storage.c > > +++ w/kernel/bpf/bpf_local_storage.c > > @@ -63,7 +63,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem) > > > > struct bpf_local_storage_elem * > > bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, > > - void *value, bool charge_mem) > > + void *value, bool charge_mem, gfp_t mem_flags) > > { > > struct bpf_local_storage_elem *selem; > > > > @@ -71,7 +71,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, > > return NULL; > > > > selem = bpf_map_kzalloc(&smap->map, smap->elem_size, > > - GFP_ATOMIC | __GFP_NOWARN); > > + mem_flags | __GFP_NOWARN); > > if (selem) { > > if (value) > > memcpy(SDATA(selem)->data, value, smap->map.value_size); > > > > @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > } > > } > > > > + if (mem_flags == GFP_KERNEL) { > > It seems like what this check really is (and similarly for the other > mem_flags based check you have below. > > "am I called from a sleepable helper" > > and I wonder if, instead of the verifier setting mem_flags, could set > a boolean "sleepable_helper_call" > which might be more useful and readable and is more relevant to the > check that the verifier is > performing "if (env->prog->aux->sleepable)" I think you're proposing to pass a boolean flag into the helper instead of gfp_t? I think gfp_t is cleaner. For example we might allow local storage access under bpf_spin_lock or other critical sections. Passing boolean flag of the prog state is not equivalent to the availability of gfp at the callsite inside bpf prog code.