On Mon, Oct 17, 2022 at 12:25 PM Yonghong Song <yhs@xxxxxxxx> wrote:
>
> On 10/17/22 11:01 AM, sdf@xxxxxxxxxx wrote:
> > On 10/13, Yonghong Song wrote:
> >> Similar to sk/inode/task storage, implement similar cgroup local storage.
> >>
> >> There already exists a local storage implementation for cgroup-attached
> >> bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
> >> bpf_get_local_storage(). But there are use cases where non-cgroup
> >> attached bpf progs want to access cgroup local storage data. For example,
> >> a tc egress prog has access to both sk and cgroup. It is possible to use
> >> sk local storage to emulate cgroup local storage by storing data in the
> >> socket, but this is wasteful since there could be many sockets belonging
> >> to a particular cgroup. Alternatively, a separate map can be created with
> >> the cgroup id as the key, but this introduces additional overhead to
> >> manipulate the new map. A cgroup local storage, similar to the existing
> >> sk/inode/task storage, would help for this use case.
> >>
> >> The life-cycle of the storage is managed with the life-cycle of the
> >> cgroup struct, i.e. the storage is destroyed along with the owning
> >> cgroup, with a callback to bpf_cgroup_storage_free when the cgroup
> >> itself is deleted.
> >>
> >> The userspace map operations can be done by using a cgroup fd as a key
> >> passed to the lookup, update and delete operations.
> >
> > [..]
> >
> >> Since the map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for the
> >> old cgroup local storage support, the new map name
> >> BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE is used for cgroup storage available
> >> to non-cgroup-attached bpf programs. The two helpers are named
> >> bpf_cgroup_local_storage_get() and bpf_cgroup_local_storage_delete().
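For illustration, a kernel-side consumer of the proposed API could look
roughly like the sketch below. This is written against the helper names,
map type, and flag proposed in this patch (none of which are guaranteed
to survive review unchanged); the tracepoint attach point, map name, and
value layout are all arbitrary choices for the example, and it assumes a
vmlinux.h generated from a kernel carrying the patch.

```c
/* SKETCH ONLY: uses the map type and helpers proposed in this patch.
 * Counts exec() events per cgroup, keyed by the task's default cgroup.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC); /* required for local storage */
	__type(key, int);                     /* userspace passes a cgroup fd */
	__type(value, __u64);
} exec_cnt SEC(".maps");

SEC("tp_btf/sched_process_exec")
int BPF_PROG(count_exec, struct task_struct *p)
{
	/* default-hierarchy cgroup of the exec'ing task */
	struct cgroup *cgrp = p->cgroups->dfl_cgrp;
	__u64 *cnt;

	cnt = bpf_cgroup_local_storage_get(&exec_cnt, cgrp, 0,
					   BPF_LOCAL_STORAGE_GET_F_CREATE);
	if (cnt)
		__sync_fetch_and_add(cnt, 1);
	return 0;
}
```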
> > Have you considered doing something similar to 7d9c3427894f ("bpf: Make
> > cgroup storages shared between programs on the same cgroup") where
> > the map changes its behavior depending on the key size (see key_size
> > checks in cgroup_storage_map_alloc)? Looks like sizeof(int) for fd still
> > can be used so we can, in theory, reuse the name..
> >
> > Pros:
> > - no need for a new map name
> >
> > Cons:
> > - existing BPF_MAP_TYPE_CGROUP_STORAGE is already messy; might be not a
> >   good idea to add more stuff to it?
>
> Thinking differently, I think I would rather reuse the same map name
> (BPF_MAP_TYPE_CGROUP_STORAGE) but with a flag like
> BPF_F_LOCAL_STORAGE_GENERIC.
>
> We could use map_extra as well, but I think an explicit flag might be
> better.

Ack, a flag or map_extra might work as well. They are more explicit, which
is good or bad depending on who you talk to.

I was assuming that we can just support the following:

struct {
        __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
        __type(key, int);
        __type(value, xxx);
} ...;

and depend on key_size == sizeof(int), but up to you; just trying to
understand whether it makes sense to share the name or not. Sharing the
helper is probably not worth it given the special treatment? Or maybe it
can be a shortcut to "lookup this map with my cgroup"?

> > But, for the very least, should we also extend
> > Documentation/bpf/map_cgroup_storage.rst to cover the new map? We've
> > tried to keep some of the important details in there..
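On the userspace side, the commit message says map operations take a
cgroup fd as the key (see bpf_cgroup_storage_lookup_elem below, which
calls cgroup_get_from_fd on the key). A hedged sketch of what that usage
could look like with libbpf, assuming a kernel carrying this patch; the
function name and value type are invented for the example:

```c
/* SKETCH ONLY: reads this patch's proposed cgroup local storage from
 * userspace. The map key is a cgroup *fd*, not a cgroup id.
 */
#include <fcntl.h>
#include <unistd.h>
#include <bpf/bpf.h>

int read_cgroup_value(int map_fd, const char *cgrp_path, __u64 *value)
{
	int cgrp_fd = open(cgrp_path, O_RDONLY | O_DIRECTORY);
	int err;

	if (cgrp_fd < 0)
		return -1;
	/* kernel resolves the fd to a struct cgroup internally */
	err = bpf_map_lookup_elem(map_fd, &cgrp_fd, value);
	close(cgrp_fd);
	return err;
}
```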
> > > >> Signed-off-by: Yonghong Song <yhs@xxxxxx> > >> --- > >> include/linux/bpf.h | 3 + > >> include/linux/bpf_types.h | 1 + > >> include/linux/cgroup-defs.h | 4 + > >> include/uapi/linux/bpf.h | 39 +++++ > >> kernel/bpf/Makefile | 2 +- > >> kernel/bpf/bpf_cgroup_storage.c | 280 ++++++++++++++++++++++++++++++++ > >> kernel/bpf/helpers.c | 6 + > >> kernel/bpf/syscall.c | 3 +- > >> kernel/bpf/verifier.c | 14 +- > >> kernel/cgroup/cgroup.c | 4 + > >> kernel/trace/bpf_trace.c | 4 + > >> scripts/bpf_doc.py | 2 + > >> tools/include/uapi/linux/bpf.h | 39 +++++ > >> 13 files changed, 398 insertions(+), 3 deletions(-) > >> create mode 100644 kernel/bpf/bpf_cgroup_storage.c > > > >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h > >> index 9e7d46d16032..1395a01c7f18 100644 > >> --- a/include/linux/bpf.h > >> +++ b/include/linux/bpf.h > >> @@ -2045,6 +2045,7 @@ struct bpf_link *bpf_link_by_id(u32 id); > > > >> const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id > >> func_id); > >> void bpf_task_storage_free(struct task_struct *task); > >> +void bpf_local_cgroup_storage_free(struct cgroup *cgroup); > >> bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog); > >> const struct btf_func_model * > >> bpf_jit_find_kfunc_model(const struct bpf_prog *prog, > >> @@ -2537,6 +2538,8 @@ extern const struct bpf_func_proto > >> bpf_copy_from_user_task_proto; > >> extern const struct bpf_func_proto bpf_set_retval_proto; > >> extern const struct bpf_func_proto bpf_get_retval_proto; > >> extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto; > >> +extern const struct bpf_func_proto bpf_cgroup_storage_get_proto; > >> +extern const struct bpf_func_proto bpf_cgroup_storage_delete_proto; > > > >> const struct bpf_func_proto *tracing_prog_func_proto( > >> enum bpf_func_id func_id, const struct bpf_prog *prog); > >> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h > >> index 2c6a4f2562a7..7a0362d7a0aa 100644 > >> --- 
a/include/linux/bpf_types.h > >> +++ b/include/linux/bpf_types.h > >> @@ -90,6 +90,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_ARRAY, > >> cgroup_array_map_ops) > >> #ifdef CONFIG_CGROUP_BPF > >> BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_STORAGE, cgroup_storage_map_ops) > >> BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, > >> cgroup_storage_map_ops) > >> +BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE, > >> cgroup_local_storage_map_ops) > >> #endif > >> BPF_MAP_TYPE(BPF_MAP_TYPE_HASH, htab_map_ops) > >> BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_HASH, htab_percpu_map_ops) > >> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h > >> index 4bcf56b3491c..c6f4590dda68 100644 > >> --- a/include/linux/cgroup-defs.h > >> +++ b/include/linux/cgroup-defs.h > >> @@ -504,6 +504,10 @@ struct cgroup { > >> /* Used to store internal freezer state */ > >> struct cgroup_freezer_state freezer; > > > >> +#ifdef CONFIG_BPF_SYSCALL > >> + struct bpf_local_storage __rcu *bpf_cgroup_storage; > >> +#endif > >> + > >> /* ids of the ancestors at each level including self */ > >> u64 ancestor_ids[]; > >> }; > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >> index 17f61338f8f8..d918b4054297 100644 > >> --- a/include/uapi/linux/bpf.h > >> +++ b/include/uapi/linux/bpf.h > >> @@ -935,6 +935,7 @@ enum bpf_map_type { > >> BPF_MAP_TYPE_TASK_STORAGE, > >> BPF_MAP_TYPE_BLOOM_FILTER, > >> BPF_MAP_TYPE_USER_RINGBUF, > >> + BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE, > >> }; > > > >> /* Note that tracing related programs such as > >> @@ -5435,6 +5436,42 @@ union bpf_attr { > >> * **-E2BIG** if user-space has tried to publish a sample > >> which is > >> * larger than the size of the ring buffer, or which cannot fit > >> * within a struct bpf_dynptr. > >> + * > >> + * void *bpf_cgroup_local_storage_get(struct bpf_map *map, struct > >> cgroup *cgroup, void *value, u64 flags) > >> + * Description > >> + * Get a bpf_local_storage from the *cgroup*. 
> >> + * > >> + * Logically, it could be thought of as getting the value from > >> + * a *map* with *cgroup* as the **key**. From this > >> + * perspective, the usage is not much different from > >> + * **bpf_map_lookup_elem**\ (*map*, **&**\ *cgroup*) except this > >> + * helper enforces the key must be a cgroup struct and the map > >> must also > >> + * be a **BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE**. > >> + * > >> + * Underneath, the value is stored locally at *cgroup* instead of > >> + * the *map*. The *map* is used as the bpf-local-storage > >> + * "type". The bpf-local-storage "type" (i.e. the *map*) is > >> + * searched against all bpf_local_storage residing at *cgroup*. > >> + * > >> + * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) > >> can be > >> + * used such that a new bpf_local_storage will be > >> + * created if one does not exist. *value* can be used > >> + * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify > >> + * the initial value of a bpf_local_storage. If *value* is > >> + * **NULL**, the new bpf_local_storage will be zero initialized. > >> + * Return > >> + * A bpf_local_storage pointer is returned on success. > >> + * > >> + * **NULL** if not found or there was an error in adding > >> + * a new bpf_local_storage. > >> + * > >> + * long bpf_cgroup_local_storage_delete(struct bpf_map *map, struct > >> cgroup *cgroup) > >> + * Description > >> + * Delete a bpf_local_storage from a *cgroup*. > >> + * Return > >> + * 0 on success. > >> + * > >> + * **-ENOENT** if the bpf_local_storage cannot be found. > >> */ > >> #define ___BPF_FUNC_MAPPER(FN, ctx...) 
\ > >> FN(unspec, 0, ##ctx) \ > >> @@ -5647,6 +5684,8 @@ union bpf_attr { > >> FN(tcp_raw_check_syncookie_ipv6, 207, ##ctx) \ > >> FN(ktime_get_tai_ns, 208, ##ctx) \ > >> FN(user_ringbuf_drain, 209, ##ctx) \ > >> + FN(cgroup_local_storage_get, 210, ##ctx) \ > >> + FN(cgroup_local_storage_delete, 211, ##ctx) \ > >> /* */ > > > >> /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER > >> that don't > >> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile > >> index 341c94f208f4..b02693f51978 100644 > >> --- a/kernel/bpf/Makefile > >> +++ b/kernel/bpf/Makefile > >> @@ -25,7 +25,7 @@ ifeq ($(CONFIG_PERF_EVENTS),y) > >> obj-$(CONFIG_BPF_SYSCALL) += stackmap.o > >> endif > >> ifeq ($(CONFIG_CGROUPS),y) > >> -obj-$(CONFIG_BPF_SYSCALL) += cgroup_iter.o > >> +obj-$(CONFIG_BPF_SYSCALL) += cgroup_iter.o bpf_cgroup_storage.o > >> endif > >> obj-$(CONFIG_CGROUP_BPF) += cgroup.o > >> ifeq ($(CONFIG_INET),y) > >> diff --git a/kernel/bpf/bpf_cgroup_storage.c > >> b/kernel/bpf/bpf_cgroup_storage.c > >> new file mode 100644 > >> index 000000000000..9974784822da > >> --- /dev/null > >> +++ b/kernel/bpf/bpf_cgroup_storage.c > >> @@ -0,0 +1,280 @@ > >> +// SPDX-License-Identifier: GPL-2.0 > >> +/* > >> + * Copyright (c) 2022 Meta Platforms, Inc. and affiliates. 
> >> + */ > >> + > >> +#include <linux/types.h> > >> +#include <linux/bpf.h> > >> +#include <linux/bpf_local_storage.h> > >> +#include <uapi/linux/btf.h> > >> +#include <linux/btf_ids.h> > >> + > >> +DEFINE_BPF_STORAGE_CACHE(cgroup_cache); > >> + > >> +static DEFINE_PER_CPU(int, bpf_cgroup_storage_busy); > >> + > >> +static void bpf_cgroup_storage_lock(void) > >> +{ > >> + migrate_disable(); > >> + this_cpu_inc(bpf_cgroup_storage_busy); > >> +} > >> + > >> +static void bpf_cgroup_storage_unlock(void) > >> +{ > >> + this_cpu_dec(bpf_cgroup_storage_busy); > >> + migrate_enable(); > >> +} > >> + > >> +static bool bpf_cgroup_storage_trylock(void) > >> +{ > >> + migrate_disable(); > >> + if (unlikely(this_cpu_inc_return(bpf_cgroup_storage_busy) != 1)) { > >> + this_cpu_dec(bpf_cgroup_storage_busy); > >> + migrate_enable(); > >> + return false; > >> + } > >> + return true; > >> +} > > > > Task storage has lock/unlock/trylock; inode storage doesn't; why does > > cgroup need it as well? > > > >> +static struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner) > >> +{ > >> + struct cgroup *cg = owner; > >> + > >> + return &cg->bpf_cgroup_storage; > >> +} > >> + > >> +void bpf_local_cgroup_storage_free(struct cgroup *cgroup) > >> +{ > >> + struct bpf_local_storage *local_storage; > >> + struct bpf_local_storage_elem *selem; > >> + bool free_cgroup_storage = false; > >> + struct hlist_node *n; > >> + unsigned long flags; > >> + > >> + rcu_read_lock(); > >> + local_storage = rcu_dereference(cgroup->bpf_cgroup_storage); > >> + if (!local_storage) { > >> + rcu_read_unlock(); > >> + return; > >> + } > >> + > >> + /* Neither the bpf_prog nor the bpf-map's syscall > >> + * could be modifying the local_storage->list now. > >> + * Thus, no elem can be added-to or deleted-from the > >> + * local_storage->list by the bpf_prog or by the bpf-map's syscall. 
> >> + * > >> + * It is racing with bpf_local_storage_map_free() alone > >> + * when unlinking elem from the local_storage->list and > >> + * the map's bucket->list. > >> + */ > >> + bpf_cgroup_storage_lock(); > >> + raw_spin_lock_irqsave(&local_storage->lock, flags); > >> + hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) { > >> + bpf_selem_unlink_map(selem); > >> + free_cgroup_storage = > >> + bpf_selem_unlink_storage_nolock(local_storage, selem, > >> false, false); > >> + } > >> + raw_spin_unlock_irqrestore(&local_storage->lock, flags); > >> + bpf_cgroup_storage_unlock(); > >> + rcu_read_unlock(); > >> + > >> + /* free_cgroup_storage should always be true as long as > >> + * local_storage->list was non-empty. > >> + */ > >> + if (free_cgroup_storage) > >> + kfree_rcu(local_storage, rcu); > >> +} > > > >> +static struct bpf_local_storage_data * > >> +cgroup_storage_lookup(struct cgroup *cgroup, struct bpf_map *map, > >> bool cacheit_lockit) > >> +{ > >> + struct bpf_local_storage *cgroup_storage; > >> + struct bpf_local_storage_map *smap; > >> + > >> + cgroup_storage = rcu_dereference_check(cgroup->bpf_cgroup_storage, > >> + bpf_rcu_lock_held()); > >> + if (!cgroup_storage) > >> + return NULL; > >> + > >> + smap = (struct bpf_local_storage_map *)map; > >> + return bpf_local_storage_lookup(cgroup_storage, smap, > >> cacheit_lockit); > >> +} > >> + > >> +static void *bpf_cgroup_storage_lookup_elem(struct bpf_map *map, void > >> *key) > >> +{ > >> + struct bpf_local_storage_data *sdata; > >> + struct cgroup *cgroup; > >> + int fd; > >> + > >> + fd = *(int *)key; > >> + cgroup = cgroup_get_from_fd(fd); > >> + if (IS_ERR(cgroup)) > >> + return ERR_CAST(cgroup); > >> + > >> + bpf_cgroup_storage_lock(); > >> + sdata = cgroup_storage_lookup(cgroup, map, true); > >> + bpf_cgroup_storage_unlock(); > >> + cgroup_put(cgroup); > >> + return sdata ? 
sdata->data : NULL; > >> +} > > > > A lot of the above (free/lookup) seems to be copy-pasted from the task > > storage; > > any point in trying to generalize the common parts? > > > >> +static int bpf_cgroup_storage_update_elem(struct bpf_map *map, void > >> *key, > >> + void *value, u64 map_flags) > >> +{ > >> + struct bpf_local_storage_data *sdata; > >> + struct cgroup *cgroup; > >> + int err, fd; > >> + > >> + fd = *(int *)key; > >> + cgroup = cgroup_get_from_fd(fd); > >> + if (IS_ERR(cgroup)) > >> + return PTR_ERR(cgroup); > >> + > >> + bpf_cgroup_storage_lock(); > >> + sdata = bpf_local_storage_update(cgroup, (struct > >> bpf_local_storage_map *)map, > >> + value, map_flags, GFP_ATOMIC); > >> + bpf_cgroup_storage_unlock(); > >> + err = PTR_ERR_OR_ZERO(sdata); > >> + cgroup_put(cgroup); > >> + return err; > >> +} > >> + > >> +static int cgroup_storage_delete(struct cgroup *cgroup, struct > >> bpf_map *map) > >> +{ > >> + struct bpf_local_storage_data *sdata; > >> + > >> + sdata = cgroup_storage_lookup(cgroup, map, false); > >> + if (!sdata) > >> + return -ENOENT; > >> + > >> + bpf_selem_unlink(SELEM(sdata), true); > >> + return 0; > >> +} > >> + > >> +static int bpf_cgroup_storage_delete_elem(struct bpf_map *map, void > >> *key) > >> +{ > >> + struct cgroup *cgroup; > >> + int err, fd; > >> + > >> + fd = *(int *)key; > >> + cgroup = cgroup_get_from_fd(fd); > >> + if (IS_ERR(cgroup)) > >> + return PTR_ERR(cgroup); > >> + > >> + bpf_cgroup_storage_lock(); > >> + err = cgroup_storage_delete(cgroup, map); > >> + bpf_cgroup_storage_unlock(); > >> + if (err) > >> + return err; > >> + > >> + cgroup_put(cgroup); > >> + return 0; > >> +} > >> + > >> +static int notsupp_get_next_key(struct bpf_map *map, void *key, void > >> *next_key) > >> +{ > >> + return -ENOTSUPP; > >> +} > >> + > >> +static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr) > >> +{ > >> + struct bpf_local_storage_map *smap; > >> + > >> + smap = bpf_local_storage_map_alloc(attr); > >> + if 
(IS_ERR(smap)) > >> + return ERR_CAST(smap); > >> + > >> + smap->cache_idx = bpf_local_storage_cache_idx_get(&cgroup_cache); > >> + return &smap->map; > >> +} > >> + > >> +static void cgroup_storage_map_free(struct bpf_map *map) > >> +{ > >> + struct bpf_local_storage_map *smap; > >> + > >> + smap = (struct bpf_local_storage_map *)map; > >> + bpf_local_storage_cache_idx_free(&cgroup_cache, smap->cache_idx); > >> + bpf_local_storage_map_free(smap, NULL); > >> +} > >> + > >> +/* *gfp_flags* is a hidden argument provided by the verifier */ > >> +BPF_CALL_5(bpf_cgroup_storage_get, struct bpf_map *, map, struct > >> cgroup *, cgroup, > >> + void *, value, u64, flags, gfp_t, gfp_flags) > >> +{ > >> + struct bpf_local_storage_data *sdata; > >> + > >> + WARN_ON_ONCE(!bpf_rcu_lock_held()); > >> + if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE)) > >> + return (unsigned long)NULL; > >> + > >> + if (!cgroup) > >> + return (unsigned long)NULL; > >> + > >> + if (!bpf_cgroup_storage_trylock()) > >> + return (unsigned long)NULL; > >> + > >> + sdata = cgroup_storage_lookup(cgroup, map, true); > >> + if (sdata) > >> + goto unlock; > >> + > >> + /* only allocate new storage, when the cgroup is refcounted */ > >> + if (!percpu_ref_is_dying(&cgroup->self.refcnt) && > >> + (flags & BPF_LOCAL_STORAGE_GET_F_CREATE)) > >> + sdata = bpf_local_storage_update(cgroup, (struct > >> bpf_local_storage_map *)map, > >> + value, BPF_NOEXIST, gfp_flags); > >> + > >> +unlock: > >> + bpf_cgroup_storage_unlock(); > >> + return IS_ERR_OR_NULL(sdata) ? 
(unsigned long)NULL : (unsigned > >> long)sdata->data; > >> +} > >> + > >> +BPF_CALL_2(bpf_cgroup_storage_delete, struct bpf_map *, map, struct > >> cgroup *, cgroup) > >> +{ > >> + int ret; > >> + > >> + WARN_ON_ONCE(!bpf_rcu_lock_held()); > >> + if (!cgroup) > >> + return -EINVAL; > >> + > >> + if (!bpf_cgroup_storage_trylock()) > >> + return -EBUSY; > >> + > >> + ret = cgroup_storage_delete(cgroup, map); > >> + bpf_cgroup_storage_unlock(); > >> + return ret; > >> +} > >> + > >> +BTF_ID_LIST_SINGLE(cgroup_storage_map_btf_ids, struct, > >> bpf_local_storage_map) > >> +const struct bpf_map_ops cgroup_local_storage_map_ops = { > >> + .map_meta_equal = bpf_map_meta_equal, > >> + .map_alloc_check = bpf_local_storage_map_alloc_check, > >> + .map_alloc = cgroup_storage_map_alloc, > >> + .map_free = cgroup_storage_map_free, > >> + .map_get_next_key = notsupp_get_next_key, > >> + .map_lookup_elem = bpf_cgroup_storage_lookup_elem, > >> + .map_update_elem = bpf_cgroup_storage_update_elem, > >> + .map_delete_elem = bpf_cgroup_storage_delete_elem, > >> + .map_check_btf = bpf_local_storage_map_check_btf, > >> + .map_btf_id = &cgroup_storage_map_btf_ids[0], > >> + .map_owner_storage_ptr = cgroup_storage_ptr, > >> +}; > >> + > >> +const struct bpf_func_proto bpf_cgroup_storage_get_proto = { > >> + .func = bpf_cgroup_storage_get, > >> + .gpl_only = false, > >> + .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL, > >> + .arg1_type = ARG_CONST_MAP_PTR, > >> + .arg2_type = ARG_PTR_TO_BTF_ID, > >> + .arg2_btf_id = &bpf_cgroup_btf_id[0], > >> + .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL, > >> + .arg4_type = ARG_ANYTHING, > >> +}; > >> + > >> +const struct bpf_func_proto bpf_cgroup_storage_delete_proto = { > >> + .func = bpf_cgroup_storage_delete, > >> + .gpl_only = false, > >> + .ret_type = RET_INTEGER, > >> + .arg1_type = ARG_CONST_MAP_PTR, > >> + .arg2_type = ARG_PTR_TO_BTF_ID, > >> + .arg2_btf_id = &bpf_cgroup_btf_id[0], > >> +}; > >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c 
> >> index a6b04faed282..5c5bb08832ec 100644 > >> --- a/kernel/bpf/helpers.c > >> +++ b/kernel/bpf/helpers.c > >> @@ -1663,6 +1663,12 @@ bpf_base_func_proto(enum bpf_func_id func_id) > >> return &bpf_dynptr_write_proto; > >> case BPF_FUNC_dynptr_data: > >> return &bpf_dynptr_data_proto; > >> +#ifdef CONFIG_CGROUPS > >> + case BPF_FUNC_cgroup_local_storage_get: > >> + return &bpf_cgroup_storage_get_proto; > >> + case BPF_FUNC_cgroup_local_storage_delete: > >> + return &bpf_cgroup_storage_delete_proto; > >> +#endif > >> default: > >> break; > >> } > >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > >> index 7b373a5e861f..e53c7fae6e22 100644 > >> --- a/kernel/bpf/syscall.c > >> +++ b/kernel/bpf/syscall.c > >> @@ -1016,7 +1016,8 @@ static int map_check_btf(struct bpf_map *map, > >> const struct btf *btf, > >> map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && > >> map->map_type != BPF_MAP_TYPE_SK_STORAGE && > >> map->map_type != BPF_MAP_TYPE_INODE_STORAGE && > >> - map->map_type != BPF_MAP_TYPE_TASK_STORAGE) > >> + map->map_type != BPF_MAP_TYPE_TASK_STORAGE && > >> + map->map_type != BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE) > >> return -ENOTSUPP; > >> if (map->spin_lock_off + sizeof(struct bpf_spin_lock) > > >> map->value_size) { > >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > >> index 6f6d2d511c06..f36f6a3c0d50 100644 > >> --- a/kernel/bpf/verifier.c > >> +++ b/kernel/bpf/verifier.c > >> @@ -6360,6 +6360,11 @@ static int check_map_func_compatibility(struct > >> bpf_verifier_env *env, > >> func_id != BPF_FUNC_task_storage_delete) > >> goto error; > >> break; > >> + case BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE: > >> + if (func_id != BPF_FUNC_cgroup_local_storage_get && > >> + func_id != BPF_FUNC_cgroup_local_storage_delete) > >> + goto error; > >> + break; > >> case BPF_MAP_TYPE_BLOOM_FILTER: > >> if (func_id != BPF_FUNC_map_peek_elem && > >> func_id != BPF_FUNC_map_push_elem) > >> @@ -6472,6 +6477,11 @@ static int check_map_func_compatibility(struct > >> 
bpf_verifier_env *env, > >> if (map->map_type != BPF_MAP_TYPE_TASK_STORAGE) > >> goto error; > >> break; > >> + case BPF_FUNC_cgroup_local_storage_get: > >> + case BPF_FUNC_cgroup_local_storage_delete: > >> + if (map->map_type != BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE) > >> + goto error; > >> + break; > >> default: > >> break; > >> } > >> @@ -12713,6 +12723,7 @@ static int check_map_prog_compatibility(struct > >> bpf_verifier_env *env, > >> case BPF_MAP_TYPE_INODE_STORAGE: > >> case BPF_MAP_TYPE_SK_STORAGE: > >> case BPF_MAP_TYPE_TASK_STORAGE: > >> + case BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE: > >> break; > >> default: > >> verbose(env, > >> @@ -14149,7 +14160,8 @@ static int do_misc_fixups(struct > >> bpf_verifier_env *env) > > > >> if (insn->imm == BPF_FUNC_task_storage_get || > >> insn->imm == BPF_FUNC_sk_storage_get || > >> - insn->imm == BPF_FUNC_inode_storage_get) { > >> + insn->imm == BPF_FUNC_inode_storage_get || > >> + insn->imm == BPF_FUNC_cgroup_local_storage_get) { > >> if (env->prog->aux->sleepable) > >> insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force > >> __s32)GFP_KERNEL); > >> else > >> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c > >> index 8ad2c267ff47..2fa2c950c7fb 100644 > >> --- a/kernel/cgroup/cgroup.c > >> +++ b/kernel/cgroup/cgroup.c > >> @@ -985,6 +985,10 @@ void put_css_set_locked(struct css_set *cset) > >> put_css_set_locked(cset->dom_cset); > >> } > > > >> +#ifdef CONFIG_BPF_SYSCALL > >> + bpf_local_cgroup_storage_free(cset->dfl_cgrp); > >> +#endif > >> + > >> kfree_rcu(cset, rcu_head); > >> } > > > >> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > >> index 688552df95ca..179adaae4a9f 100644 > >> --- a/kernel/trace/bpf_trace.c > >> +++ b/kernel/trace/bpf_trace.c > >> @@ -1454,6 +1454,10 @@ bpf_tracing_func_proto(enum bpf_func_id > >> func_id, const struct bpf_prog *prog) > >> return &bpf_get_current_cgroup_id_proto; > >> case BPF_FUNC_get_current_ancestor_cgroup_id: > >> return 
&bpf_get_current_ancestor_cgroup_id_proto; > >> + case BPF_FUNC_cgroup_local_storage_get: > >> + return &bpf_cgroup_storage_get_proto; > >> + case BPF_FUNC_cgroup_local_storage_delete: > >> + return &bpf_cgroup_storage_delete_proto; > >> #endif > >> case BPF_FUNC_send_signal: > >> return &bpf_send_signal_proto; > >> diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py > >> index c0e6690be82a..fdb0aff8cb5a 100755 > >> --- a/scripts/bpf_doc.py > >> +++ b/scripts/bpf_doc.py > >> @@ -685,6 +685,7 @@ class PrinterHelpers(Printer): > >> 'struct udp6_sock', > >> 'struct unix_sock', > >> 'struct task_struct', > >> + 'struct cgroup', > > > >> 'struct __sk_buff', > >> 'struct sk_msg_md', > >> @@ -742,6 +743,7 @@ class PrinterHelpers(Printer): > >> 'struct udp6_sock', > >> 'struct unix_sock', > >> 'struct task_struct', > >> + 'struct cgroup', > >> 'struct path', > >> 'struct btf_ptr', > >> 'struct inode', > >> diff --git a/tools/include/uapi/linux/bpf.h > >> b/tools/include/uapi/linux/bpf.h > >> index 17f61338f8f8..d918b4054297 100644 > >> --- a/tools/include/uapi/linux/bpf.h > >> +++ b/tools/include/uapi/linux/bpf.h > >> @@ -935,6 +935,7 @@ enum bpf_map_type { > >> BPF_MAP_TYPE_TASK_STORAGE, > >> BPF_MAP_TYPE_BLOOM_FILTER, > >> BPF_MAP_TYPE_USER_RINGBUF, > >> + BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE, > >> }; > > > >> /* Note that tracing related programs such as > >> @@ -5435,6 +5436,42 @@ union bpf_attr { > >> * **-E2BIG** if user-space has tried to publish a sample > >> which is > >> * larger than the size of the ring buffer, or which cannot fit > >> * within a struct bpf_dynptr. > >> + * > >> + * void *bpf_cgroup_local_storage_get(struct bpf_map *map, struct > >> cgroup *cgroup, void *value, u64 flags) > >> + * Description > >> + * Get a bpf_local_storage from the *cgroup*. > >> + * > >> + * Logically, it could be thought of as getting the value from > >> + * a *map* with *cgroup* as the **key**. 
From this > >> + * perspective, the usage is not much different from > >> + * **bpf_map_lookup_elem**\ (*map*, **&**\ *cgroup*) except this > >> + * helper enforces the key must be a cgroup struct and the map > >> must also > >> + * be a **BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE**. > >> + * > >> + * Underneath, the value is stored locally at *cgroup* instead of > >> + * the *map*. The *map* is used as the bpf-local-storage > >> + * "type". The bpf-local-storage "type" (i.e. the *map*) is > >> + * searched against all bpf_local_storage residing at *cgroup*. > >> + * > >> + * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) > >> can be > >> + * used such that a new bpf_local_storage will be > >> + * created if one does not exist. *value* can be used > >> + * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify > >> + * the initial value of a bpf_local_storage. If *value* is > >> + * **NULL**, the new bpf_local_storage will be zero initialized. > >> + * Return > >> + * A bpf_local_storage pointer is returned on success. > >> + * > >> + * **NULL** if not found or there was an error in adding > >> + * a new bpf_local_storage. > >> + * > >> + * long bpf_cgroup_local_storage_delete(struct bpf_map *map, struct > >> cgroup *cgroup) > >> + * Description > >> + * Delete a bpf_local_storage from a *cgroup*. > >> + * Return > >> + * 0 on success. > >> + * > >> + * **-ENOENT** if the bpf_local_storage cannot be found. > >> */ > >> #define ___BPF_FUNC_MAPPER(FN, ctx...) \ > >> FN(unspec, 0, ##ctx) \ > >> @@ -5647,6 +5684,8 @@ union bpf_attr { > >> FN(tcp_raw_check_syncookie_ipv6, 207, ##ctx) \ > >> FN(ktime_get_tai_ns, 208, ##ctx) \ > >> FN(user_ringbuf_drain, 209, ##ctx) \ > >> + FN(cgroup_local_storage_get, 210, ##ctx) \ > >> + FN(cgroup_local_storage_delete, 211, ##ctx) \ > >> /* */ > > > >> /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER > >> that don't > >> -- > >> 2.30.2 > >