On Sun, Oct 23, 2022 at 11:05:30AM -0700, Yonghong Song wrote: > Similar to sk/inode/task storage, implement similar cgroup local storage. > > There already exists a local storage implementation for cgroup-attached > bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper > bpf_get_local_storage(). But there are use cases such that non-cgroup > attached bpf progs wants to access cgroup local storage data. For example, > tc egress prog has access to sk and cgroup. It is possible to use > sk local storage to emulate cgroup local storage by storing data in socket. > But this is a waste as it could be lots of sockets belonging to a particular > cgroup. Alternatively, a separate map can be created with cgroup id as the key. > But this will introduce additional overhead to manipulate the new map. > A cgroup local storage, similar to existing sk/inode/task storage, > should help for this use case. > > The life-cycle of storage is managed with the life-cycle of the > cgroup struct. i.e. the storage is destroyed along with the owning cgroup > with a call to bpf_cgrp_storage_free() when cgroup itself > is deleted. > > The userspace map operations can be done by using a cgroup fd as a key > passed to the lookup, update and delete operations. > > Typically, the following code is used to get the current cgroup: > struct task_struct *task = bpf_get_current_task_btf(); > ... task->cgroups->dfl_cgrp ... > and in structure task_struct definition: > struct task_struct { > .... > struct css_set __rcu *cgroups; > .... > } > With sleepable program, accessing task->cgroups is not protected by rcu_read_lock. > So the current implementation only supports non-sleepable program and supporting > sleepable program will be the next step together with adding rcu_read_lock > protection for rcu tagged structures. > > Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local > storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used > for cgroup storage available to non-cgroup-attached bpf programs. The old > cgroup storage supports bpf_get_local_storage() helper to get the cgroup data. > The new cgroup storage helper bpf_cgrp_storage_get() can provide similar > functionality. While old cgroup storage pre-allocates storage memory, the new > mechanism can also pre-allocate with a user space bpf_map_update_elem() call > to avoid potential run-time memory allocation failure. > Therefore, the new cgroup storage can provide all functionality w.r.t. > the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is alias to > BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can > be deprecated since the new one can provide the same functionality. > > Signed-off-by: Yonghong Song <yhs@xxxxxx> > --- Looks great, thanks. Only other thing I'll mention is that I think Tejun had pointed out in [0] that cg was usually more used as an abbreviation. I don't feel strongly one way or the other, so here's my ack either way. [0]: https://lore.kernel.org/all/Y1NByH+suY2s65Kh@xxxxxxxxxxxxxxx/ Acked-by: David Vernet <void@xxxxxxxxxxxxx>