On Wed, Oct 20, 2021 at 10:18:13AM -1000, Tejun Heo wrote:
> From 5e3ad0d4a0b0732e7ebe035582d282ab752397ed Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@xxxxxxxxxx>
> Date: Wed, 20 Oct 2021 08:56:53 -1000
>
> task_local_storage currently does not support pre-allocation and the
> memory is allocated on demand using the GFP_ATOMIC mask. While atomic
> allocations succeed most of the time and the occasional failures aren't
> a problem for many use-cases, there are some which can benefit from
> reliable allocations - e.g. tracking acquisitions and releases of
> specific resources to root cause long-term reference leaks.
>
> Prealloc semantics for task_local_storage:
>
> * When a prealloc map is created, the map's elements for all existing
>   tasks are allocated.
>
> * Afterwards, whenever a new task is forked, it automatically allocates
>   the elements for the existing preallocated maps.
>
> To synchronize against concurrent forks, CONFIG_BPF_SYSCALL now enables
> CONFIG_THREADGROUP_RWSEM and the prealloc task_local_storage creation
> path write-locks threadgroup_rwsem; the rest of the implementation is
> straight-forward.

[ ... ]

> +static int task_storage_map_populate(struct bpf_local_storage_map *smap)
> +{
> +        struct bpf_local_storage *storage = NULL;
> +        struct bpf_local_storage_elem *selem = NULL;
> +        struct task_struct *p, *g;
> +        int err = 0;
> +
> +        lockdep_assert_held(&threadgroup_rwsem);
> +retry:
> +        if (!storage)
> +                storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
> +                                          GFP_USER);
> +        if (!selem)
> +                selem = bpf_map_kzalloc(&smap->map, smap->elem_size, GFP_USER);
> +        if (!storage || !selem) {
> +                err = -ENOMEM;
> +                goto out_free;
> +        }
> +
> +        rcu_read_lock();
> +        bpf_task_storage_lock();
> +
> +        for_each_process_thread(g, p) {
I am wondering if this loop can be done in bpf iter.

If the bpf_local_storage_map is sleepable safe (not yet done, but there is
an earlier attempt [0]), bpf_local_storage_update() should be able to
allocate without GFP_ATOMIC when called from a sleepable bpf prog, and
this will potentially be useful for other sleepable use cases in general.

For example, if a sleepable bpf iter prog can run in this loop (or is the
existing bpf task iter loop as good?), the iter bpf prog can call
bpf_task_storage_get(BPF_SK_STORAGE_GET_F_CREATE) on a sleepable
bpf_local_storage_map.  This pre-alloc can then be done similarly on the
tcp/udp socket side by running a bpf prog at the existing bpf tcp/udp iter.

[0]: https://lore.kernel.org/bpf/20210826235127.303505-1-kpsingh@xxxxxxxxxx/
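Roughly, once a sleepable-safe bpf_local_storage_map and a sleepable task
iter are available, the pre-alloc could be a small pinned iter prog that
userspace read()s once.  Untested sketch below; the "iter.s/task" section,
the map and prog names (task_map, prealloc_task_storage) and the value
layout are all made up for illustration:

/* Untested sketch: pre-populate a BPF_MAP_TYPE_TASK_STORAGE map from a
 * (not-yet-existing) sleepable task iterator, so that the helper could
 * allocate with GFP_USER instead of GFP_ATOMIC.
 */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct val {
        __u64 ref_cnt;          /* e.g. acquire/release tracking */
};

struct {
        __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, struct val);
} task_map SEC(".maps");

SEC("iter.s/task")
int prealloc_task_storage(struct bpf_iter__task *ctx)
{
        struct task_struct *task = ctx->task;

        if (!task)              /* end of iteration */
                return 0;

        /* Create the element now if it does not exist yet. */
        bpf_task_storage_get(&task_map, task, NULL,
                             BPF_LOCAL_STORAGE_GET_F_CREATE);
        return 0;
}

Userspace would attach the iter and read() it once to walk the existing
tasks; newly forked tasks would still need the fork-side handling from
this patch or an equivalent hook.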
> +                struct bpf_local_storage_data *sdata;
> +
> +                /* Try inserting with atomic allocations. On failure, retry with
> +                 * the preallocated ones.
> +                 */
> +                sdata = bpf_local_storage_update(p, smap, NULL, BPF_NOEXIST);
> +
> +                if (PTR_ERR(sdata) == -ENOMEM && storage && selem) {
> +                        sdata = __bpf_local_storage_update(p, smap, NULL,
> +                                                           BPF_NOEXIST,
> +                                                           &storage, &selem);
> +                }
> +
> +                /* Check -EEXIST before need_resched() to guarantee forward
> +                 * progress.
> +                 */
> +                if (PTR_ERR(sdata) == -EEXIST)
> +                        continue;
> +
> +                /* If requested or alloc failed, take a breather and loop back
> +                 * to preallocate.
> +                 */
> +                if (need_resched() ||
> +                    PTR_ERR(sdata) == -EAGAIN || PTR_ERR(sdata) == -ENOMEM) {
> +                        bpf_task_storage_unlock();
> +                        rcu_read_unlock();
> +                        cond_resched();
> +                        goto retry;
> +                }
> +
> +                if (IS_ERR(sdata)) {
> +                        err = PTR_ERR(sdata);
> +                        goto out_unlock;
> +                }
> +        }
> +out_unlock:
> +        bpf_task_storage_unlock();
> +        rcu_read_unlock();
> +out_free:
> +        if (storage)
> +                kfree(storage);
> +        if (selem)
> +                kfree(selem);
> +        return err;
> +}

[ ... ]

> +int bpf_task_storage_fork(struct task_struct *task)
> +{
> +        struct bpf_local_storage_map *smap;
> +
> +        percpu_rwsem_assert_held(&threadgroup_rwsem);
> +
> +        list_for_each_entry(smap, &prealloc_smaps, prealloc_node) {
Mostly a comment here from the networking side: I suspect the common use
case is going to be more selective based on different protocol (tcp or
udp) and even port.  There are some existing bpf hooks during inet_sock
creation time, bind time, etc., and the bpf prog can be selective about
which sk it needs a bpf_sk_storage for by inspecting different fields of
the sk, e.g. in inet_create() there is BPF_CGROUP_RUN_PROG_INET_SOCK(sk).
Would a similar hook be useful on the fork side?  (rough sketch at the
end of this mail)

> +                struct bpf_local_storage *storage;
> +                struct bpf_local_storage_elem *selem;
> +                struct bpf_local_storage_data *sdata;
> +
> +                storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
> +                                          GFP_USER);
> +                selem = bpf_map_kzalloc(&smap->map, smap->elem_size, GFP_USER);
> +
> +                rcu_read_lock();
> +                bpf_task_storage_lock();
> +                sdata = __bpf_local_storage_update(task, smap, NULL, BPF_NOEXIST,
> +                                                   &storage, &selem);
> +                bpf_task_storage_unlock();
> +                rcu_read_unlock();
> +
> +                if (storage)
> +                        kfree(storage);
> +                if (selem)
> +                        kfree(selem);
> +
> +                if (IS_ERR(sdata)) {
> +                        bpf_task_storage_free(task);
> +                        return PTR_ERR(sdata);
> +                }
> +        }
> +
> +        return 0;
> +}
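To make the selectivity point more concrete, this is roughly what the sk
side can already do at the BPF_CGROUP_RUN_PROG_INET_SOCK() point in
inet_create(): inspect the sk and only create storage for the sockets it
cares about.  Untested sketch; sk_map and create_tcp_only are made-up
names, and it assumes bpf_sk_storage_get() is usable from cgroup/sock
progs:

/* Untested sketch: create bpf_sk_storage only for TCP sockets at socket
 * creation time instead of for every socket.
 */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

#define IPPROTO_TCP 6   /* avoid dragging uapi headers in next to vmlinux.h */

char _license[] SEC("license") = "GPL";

struct {
        __uint(type, BPF_MAP_TYPE_SK_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, __u64);
} sk_map SEC(".maps");

SEC("cgroup/sock_create")
int create_tcp_only(struct bpf_sock *sk)
{
        __u64 *val;

        if (sk->protocol != IPPROTO_TCP)
                return 1;       /* allow the socket, skip the storage */

        val = bpf_sk_storage_get(&sk_map, sk, NULL,
                                 BPF_SK_STORAGE_GET_F_CREATE);
        if (val)
                *val = 0;

        return 1;               /* always allow the socket */
}

If a fork-side hook along the lines asked about above existed, the same
pattern would let a bpf prog look at task fields first and only then pay
for the task_local_storage allocation.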