On Thu, Mar 17, 2022 at 11:04 AM Martin KaFai Lau <kafai@xxxxxx> wrote: > > On Wed, Mar 16, 2022 at 10:26:57PM -0700, Joanne Koong wrote: > > On Wed, Mar 16, 2022 at 7:23 PM Martin KaFai Lau <kafai@xxxxxx> wrote: > > > > > > On Wed, Mar 16, 2022 at 12:54:00PM -0700, Joanne Koong wrote: > > > > From: Joanne Koong <joannelkoong@xxxxxxxxx> > > > > > > > > Currently, local storage memory can only be allocated atomically > > > > (GFP_ATOMIC). This restriction is too strict for sleepable bpf > > > > programs. > > > > > > > > In this patch, the verifier detects whether the program is sleepable, > > > > and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a > > > > 5th argument to bpf_task/sk/inode_storage_get. This flag will propagate > > > > down to the local storage functions that allocate memory. > > > > > > > > Please note that bpf_task/sk/inode_storage_update_elem functions are > > > > invoked by userspace applications through syscalls. Preemption is > > > > disabled before bpf_task/sk/inode_storage_update_elem is called, which > > > > means they will always have to allocate memory atomically. > > > > > > > > The existing local storage selftests cover both the GFP_ATOMIC and the > > > > GFP_KERNEL cases in bpf_local_storage_update. > > > > > > > > v2 <- v1: > > > > * Allocate the memory before/after the raw_spin_lock_irqsave, depending > > > > on the gfp flags > > > > * Rename mem_flags to gfp_flags > > > > * Reword the comment "*mem_flags* is set by the bpf verifier" to > > > > "*gfp_flags* is a hidden argument provided by the verifier" > > > > * Add a sentence to the commit message about existing local storage > > > > selftests covering both the GFP_ATOMIC and GFP_KERNEL paths in > > > > bpf_local_storage_update. > > > > > > [ ... ] > > > > > > > struct bpf_local_storage_data * > > > > bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > > - void *value, u64 map_flags) > > > > + void *value, u64 map_flags, gfp_t gfp_flags) > > > > { > > > > struct bpf_local_storage_data *old_sdata = NULL; > > > > - struct bpf_local_storage_elem *selem; > > > > + struct bpf_local_storage_elem *selem = NULL; > > > > struct bpf_local_storage *local_storage; > > > > unsigned long flags; > > > > int err; > > > > @@ -365,6 +366,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > > !map_value_has_spin_lock(&smap->map))) > > > > return ERR_PTR(-EINVAL); > > > > > > > > + if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST) > > > > + return ERR_PTR(-EINVAL); > > > > + > > > > local_storage = rcu_dereference_check(*owner_storage(smap, owner), > > > > bpf_rcu_lock_held()); > > > > if (!local_storage || hlist_empty(&local_storage->list)) { > > > > @@ -373,11 +377,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > > if (err) > > > > return ERR_PTR(err); > > > > > > > > - selem = bpf_selem_alloc(smap, owner, value, true); > > > > + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags); > > > > if (!selem) > > > > return ERR_PTR(-ENOMEM); > > > > > > > > - err = bpf_local_storage_alloc(owner, smap, selem); > > > > + err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags); > > > > if (err) { > > > > kfree(selem); > > > > mem_uncharge(smap, owner, smap->elem_size); > > > > @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, > > > > } > > > > } > > > > > > > > + if (gfp_flags == GFP_KERNEL) { > > > > + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags); > > > I think this new path is not executed by the existing > > > "progs/local_storage.c" test which has sleepable lsm prog. At least a second > > > BPF_MAP_TYPE_TASK_STORAGE map (or SK_STORAGE) is needed? > > > or there is other selftest covering this new path that I missed? > > Thanks for your feedback. I think I'm misunderstanding how the > > progs/local_storage.c test and/or local storage works then. Would you > > mind explaining why a second map is needed? > > > > This is my (faulty) understanding of what is happening: > > 1) In "local_storage.c" in "SEC("lsm.s/inode_rename")" there is a call > > to bpf_inode_storage_get with the BPF_LOCAL_STORAGE_GET_F_CREATE flag > > set, which will call into bpf_local_storage_update (which will create > > the local storage + the selem, and put it in the RCU for that > > inode_storage_map) > From reading the comment above the bpf_inode_storage_get(BPF_LOCAL_STORAGE_GET_F_CREATE): > "new_dentry->d_inode can be NULL", so it is expected to fail. > Meaning no storage is created. > > > > > 2) Then, further down in the "local_storage.c" file in > > "SEC("lsm.s/bprm_committed_creds")", there is another call to > > bpf_inode_storage_get on the same inode_storage_map but on a different > > inode, also with the BPF_LOCAL_STORAGE_GET_F_CREATE flag set. This > > will also call into bpf_local_storage_update. > I belive this is the inode and the storage that the second > bpf_inode_storage_get(..., 0) in the "inode_rename" bpf-prog is supposed > to get. Otherwise, I don't see how the test can pass. > > > > > 3) In bpf_local_storage_update from the call in #2, it sees that there > > is a local storage associated with this map in the RCU, it tries to > > look for the inode but doesn't find it, so it needs to allocate with > > GFP_KERNEL a new selem and then update with the new selem. > Correct, that will be the very first storage created for this inode > and it will go through the "if (!local_storage || hlist_empty(&local_storage->list))" > allocation code path in bpf_local_storage_update() which is > an existing code path. > Ah, I see. I mistakenly thought inodes shared local storages if you passed in the same map. Thanks for the clarification! > I was talking specifically about the "if (gfp_flags == GFP_KERNEL)" > allocation code path. Thus, it needs a second inode local storage (i.e. > a second inode map) for the same inode. A second inode storage map > and another "bpf_inode_storage_get(&second_inode_storage_map, ... > BPF_LOCAL_STORAGE_GET_F_CREATE)" should be enough. > > It seems it needs a re-spin because of the sparse warning. > I don't see an issue from the code, just thinking it will > be useful to have a test to exercise this path. It > could be a follow up as an individual patch if not in v3. I will submit a v3 that fixes the sparse warning and adds a case to exercise this path. Thanks for reviewing this, Martin!