On Tue, Mar 15, 2022 at 01:33:05PM -0700, Joanne Koong wrote:
> > > > > > > @@ -425,25 +438,37 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > > > > >  	if (old_sdata && (map_flags & BPF_F_LOCK)) {
> > > > > > >  		copy_map_value_locked(&smap->map, old_sdata->data, value,
> > > > > > >  				      false);
> > > > > > > -		selem = SELEM(old_sdata);
> > > > > > > -		goto unlock;
> > > > > > > +
> > > > > > > +		raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > > > > > > +
> > > > > > > +		if (!charge_err)
> > > > > > > +			mem_uncharge(smap, owner, smap->elem_size);
> > > > > > > +		kfree(selem);
> > > > > > > +
> > > > > > > +		return old_sdata;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	if (!old_sdata && charge_err) {
> > > > > > > +		/* If there is no existing local storage value, then this means
> > > > > > > +		 * we needed the charge to succeed. We must make sure this did not
> > > > > > > +		 * return an error.
> > > > > > > +		 *
> > > > > > > +		 * Please note that if an existing local storage value exists, then
> > > > > > > +		 * it doesn't matter if the charge failed because we can just
> > > > > > > +		 * "reuse" the charge from the existing local storage element.
> > > > > > > +		 */
> > > > > >
> > > > > > But we did allocate a new element which was unaccounted for, even if
> > > > > > it was only temporary [for the short period of time until we freed
> > > > > > the old element].
> > > > > >
> > > > > > Martin, is this something we are okay with?
> > > > >
> > > > > It is the behavior today already. Take a look at
> > > > > bpf_selem_alloc(...., !old_sdata) and the later
> > > > > "if (old_sdata) { /* ... */ bpf_selem_unlink_storage_nolock(..., false); }".
> > > > > Most of this happens under a raw_spin_lock, so the window should be
> > > > > very brief. Not perfect, but it should be fine.
> > > > >
> > > > > If we always error out on charge failure, we risk unnecessarily
> > > > > failing the userspace syscall on mem charging when it is only
> > > > > trying to replace the old_sdata.
> > > > >
> > > > > If the concern is the increased chance of a brief moment of
> > > > > unaccounted memory from the helper side (now that GFP_KERNEL comes
> > > > > from the helper only), another option that came to my mind is to
> > > > > decide whether to do the alloc before or after
> > > > > raw_spin_lock_irqsave() based on mem_flags. GFP_KERNEL here is only
> > > > > used from the bpf helper side, and the helper always calls with
> > > > > BPF_NOEXIST because it has already done a lookup, so it can always
> > > > > charge first and then alloc.
> > > > >
> > > > > Something like this, drafted on top of this patch. Untested code:
> > > >
> > > > I think this looks okay. One minor comment below:
> > > >
> > > > > diff --git c/kernel/bpf/bpf_local_storage.c w/kernel/bpf/bpf_local_storage.c
> > > > > index 092a1ac772d7..b48beb57fe6e 100644
> > > > > --- c/kernel/bpf/bpf_local_storage.c
> > > > > +++ w/kernel/bpf/bpf_local_storage.c
> > > > > @@ -63,7 +63,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
> > > > >
> > > > >  struct bpf_local_storage_elem *
> > > > >  bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> > > > > -		void *value, bool charge_mem)
> > > > > +		void *value, bool charge_mem, gfp_t mem_flags)
> > > > >  {
> > > > >  	struct bpf_local_storage_elem *selem;
> > > > >
> > > > > @@ -71,7 +71,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> > > > >  		return NULL;
> > > > >
> > > > >  	selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> > > > > -				GFP_ATOMIC | __GFP_NOWARN);
> > > > > +				mem_flags | __GFP_NOWARN);
> > > > >  	if (selem) {
> > > > >  		if (value)
> > > > >  			memcpy(SDATA(selem)->data, value, smap->map.value_size);
> > > > >
> > > > > @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > > >  		}
> > > > >  	}
> > > > >
> > > > > +	if (mem_flags == GFP_KERNEL) {
> > > >
> > > > It seems like what this check really is (and similarly for the other
> > > > mem_flags based check you have below) is:
> > > >
> > > > "am I called from a sleepable helper"
> > > >
> > > > and I wonder if, instead of the verifier setting mem_flags, it could
> > > > set a boolean "sleepable_helper_call", which might be more useful and
> > > > readable and is more relevant to the check that the verifier is
> > > > performing: "if (env->prog->aux->sleepable)"
> > >
> > > I think you're proposing to pass a boolean flag
> > > into the helper instead of gfp_t?
> > > I think gfp_t is cleaner.
> > > For example, we might allow local storage access under bpf_spin_lock
> > > or other critical sections.
> > > Passing a boolean flag of the prog state is not equivalent
> > > to the availability of gfp at the callsite inside bpf prog code.
> >
> > Ah yes, then using gfp_t makes sense as we may have other use cases.
> >
> > I think we can follow up with the changes Martin suggested separately,
> > as the current behaviour is essentially the same.
> >
> > In any case, you can add my:
> >
> > Acked-by: KP Singh <kpsingh@xxxxxxxxxx>
>
> Thanks for the discussion, KP, Kumar, Martin, and Alexei!
>
> For v2, I will make the following changes:
> 1) Allocate the memory before/after the raw_spin_lock_irqsave,
>    depending on mem_flags
> 2) Change the comment "*mem_flags* is set by the bpf verifier" to
>    "*mem_flags* is set by the bpf verifier. Any value set through uapi
>    will be ignored"

It will also be useful to double check whether the existing sleepable
lsm selftests cover this change.