On Wed, Mar 16, 2022 at 10:26:57PM -0700, Joanne Koong wrote: > On Wed, Mar 16, 2022 at 7:23 PM Martin KaFai Lau <kafai@xxxxxx> wrote: > > > > On Wed, Mar 16, 2022 at 12:54:00PM -0700, Joanne Koong wrote: > > > From: Joanne Koong <joannelkoong@xxxxxxxxx> > > > > > > Currently, local storage memory can only be allocated atomically > > > (GFP_ATOMIC). This restriction is too strict for sleepable bpf > > > programs. > > > > > > In this patch, the verifier detects whether the program is sleepable, > > > and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a > > > 5th argument to bpf_task/sk/inode_storage_get. This flag will propagate > > > down to the local storage functions that allocate memory. > > > > > > Please note that bpf_task/sk/inode_storage_update_elem functions are > > > invoked by userspace applications through syscalls. Preemption is > > > disabled before bpf_task/sk/inode_storage_update_elem is called, which > > > means they will always have to allocate memory atomically. > > > > > > The existing local storage selftests cover both the GFP_ATOMIC and the > > > GFP_KERNEL cases in bpf_local_storage_update. > > > > > > v2 <- v1: > > > * Allocate the memory before/after the raw_spin_lock_irqsave, depending > > > on the gfp flags > > > * Rename mem_flags to gfp_flags > > > * Reword the comment "*mem_flags* is set by the bpf verifier" to > > > "*gfp_flags* is a hidden argument provided by the verifier" > > > * Add a sentence to the commit message about existing local storage > > > selftests covering both the GFP_ATOMIC and GFP_KERNEL paths in > > > bpf_local_storage_update. > > > > [ ... ] > > > > > struct bpf_local_storage_data * > > > bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > - void *value, u64 map_flags) > > > + void *value, u64 map_flags, gfp_t gfp_flags) > > > { > > > struct bpf_local_storage_data *old_sdata = NULL; > > > - struct bpf_local_storage_elem *selem; > > > + struct bpf_local_storage_elem *selem = NULL; > > > struct bpf_local_storage *local_storage; > > > unsigned long flags; > > > int err; > > > @@ -365,6 +366,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > !map_value_has_spin_lock(&smap->map))) > > > return ERR_PTR(-EINVAL); > > > > > > + if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST) > > > + return ERR_PTR(-EINVAL); > > > + > > > local_storage = rcu_dereference_check(*owner_storage(smap, owner), > > > bpf_rcu_lock_held()); > > > if (!local_storage || hlist_empty(&local_storage->list)) { > > > @@ -373,11 +377,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > if (err) > > > return ERR_PTR(err); > > > > > > - selem = bpf_selem_alloc(smap, owner, value, true); > > > + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags); > > > if (!selem) > > > return ERR_PTR(-ENOMEM); > > > > > > - err = bpf_local_storage_alloc(owner, smap, selem); > > > + err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags); > > > if (err) { > > > kfree(selem); > > > mem_uncharge(smap, owner, smap->elem_size); > > > @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > } > > > } > > > > > > + if (gfp_flags == GFP_KERNEL) { > > > + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags); > > I think this new path is not executed by the existing > > "progs/local_storage.c" test which has sleepable lsm prog. At least a second > > BPF_MAP_TYPE_TASK_STORAGE map (or SK_STORAGE) is needed? > > or there is other selftest covering this new path that I missed? > Thanks for your feedback. I think I'm misunderstanding how the > progs/local_storage.c test and/or local storage works then. Would you > mind explaining why a second map is needed? > > This is my (faulty) understanding of what is happening: > 1) In "local_storage.c" in "SEC("lsm.s/inode_rename")" there is a call > to bpf_inode_storage_get with the BPF_LOCAL_STORAGE_GET_F_CREATE flag > set, which will call into bpf_local_storage_update (which will create > the local storage + the selem, and put it in the RCU for that > inode_storage_map)