On Mon, Oct 17, 2022 at 1:15 PM Yonghong Song <yhs@xxxxxxxx> wrote: > > > > On 10/17/22 11:47 AM, Yosry Ahmed wrote: > > On Mon, Oct 17, 2022 at 11:43 AM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote: > >> > >> On Mon, Oct 17, 2022 at 11:26 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote: > >>> > >>> On Mon, Oct 17, 2022 at 11:02 AM <sdf@xxxxxxxxxx> wrote: > >>>> > >>>> On 10/13, Yonghong Song wrote: > >>>>> Similar to sk/inode/task storage, implement similar cgroup local storage. > >>>> > >>>>> There already exists a local storage implementation for cgroup-attached > >>>>> bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper > >>>>> bpf_get_local_storage(). But there are use cases such that non-cgroup > >>>>> attached bpf progs wants to access cgroup local storage data. For example, > >>>>> tc egress prog has access to sk and cgroup. It is possible to use > >>>>> sk local storage to emulate cgroup local storage by storing data in > >>>>> socket. > >>>>> But this is a waste as it could be lots of sockets belonging to a > >>>>> particular > >>>>> cgroup. Alternatively, a separate map can be created with cgroup id as > >>>>> the key. > >>>>> But this will introduce additional overhead to manipulate the new map. > >>>>> A cgroup local storage, similar to existing sk/inode/task storage, > >>>>> should help for this use case. > >>>> > >>>>> The life-cycle of storage is managed with the life-cycle of the > >>>>> cgroup struct. i.e. the storage is destroyed along with the owning cgroup > >>>>> with a callback to the bpf_cgroup_storage_free when cgroup itself > >>>>> is deleted. > >>>> > >>>>> The userspace map operations can be done by using a cgroup fd as a key > >>>>> passed to the lookup, update and delete operations. > >>>> > >>>> > >>>> [..] > >>>> > >>>>> Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup > >>>>> local > >>>>> storage support, the new map name BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE is > >>>>> used > >>>>> for cgroup storage available to non-cgroup-attached bpf programs. The two > >>>>> helpers are named as bpf_cgroup_local_storage_get() and > >>>>> bpf_cgroup_local_storage_delete(). > >>>> > >>>> Have you considered doing something similar to 7d9c3427894f ("bpf: Make > >>>> cgroup storages shared between programs on the same cgroup") where > >>>> the map changes its behavior depending on the key size (see key_size checks > >>>> in cgroup_storage_map_alloc)? Looks like sizeof(int) for fd still > >>>> can be used so we can, in theory, reuse the name.. > >>>> > >>>> Pros: > >>>> - no need for a new map name > >>>> > >>>> Cons: > >>>> - existing BPF_MAP_TYPE_CGROUP_STORAGE is already messy; might be not a > >>>> good idea to add more stuff to it? > >>>> > >>>> But, for the very least, should we also extend > >>>> Documentation/bpf/map_cgroup_storage.rst to cover the new map? We've > >>>> tried to keep some of the important details in there.. > >>> > >>> This might be a long shot, but is it possible to switch completely to > >>> this new generic cgroup storage, and for programs that attach to > >>> cgroups we can still do lookups/allocations during attachment like we > >>> do today? IOW, maintain the current API for cgroup progs but switch it > >>> to use this new map type instead. > >>> > >>> It feels like this map type is more generic and can be a superset of > >>> the existing cgroup storage, but I feel like I am missing something. > >> > >> I feel like the biggest issue is that the existing > >> bpf_get_local_storage helper is guaranteed to always return non-null > >> and the verifier doesn't require the programs to do null checks on it; > >> the new helper might return NULL making all existing programs fail the > >> verifier. > > > > What I meant is, keep the old bpf_get_local_storage helper only for > > cgroup-attached programs like we have today, and add a new generic > > bpf_cgroup_local_storage_get() helper. > > > > For cgroup-attached programs, make sure a cgroup storage entry is > > allocated and hooked to the helper on program attach time, to keep > > today's behavior constant. > > > > For other programs, the bpf_cgroup_local_storage_get() will do the > > normal lookup and allocate if necessary. > > > > Does this make any sense to you? > > Right. This is what I plan to do. The map will add a flag to > distinguish the old and new behavior. > This might not make any sense, but is this doable without a flag? Basically extend the new map type so that it has some special behaviors for cgroup attached programs (allocate memory on program attach, bpf_get_local_storage() automatically gets entry for the attached cgroup, etc). > > > >> > >> There might be something else I don't remember at this point (besides > >> that weird per-prog_type that we'd have to emulate as well).. > > > > Yeah there are things that will need to be emulated, but I feel like > > we may end up with less confusing code (and less code in general). > > > >> > >>>> > >>>>> Signed-off-by: Yonghong Song <yhs@xxxxxx> > >>>>> --- > >>>>> include/linux/bpf.h | 3 + > >>>>> include/linux/bpf_types.h | 1 + > >>>>> include/linux/cgroup-defs.h | 4 + > >>>>> include/uapi/linux/bpf.h | 39 +++++ > >>>>> kernel/bpf/Makefile | 2 +- > >>>>> kernel/bpf/bpf_cgroup_storage.c | 280 ++++++++++++++++++++++++++++++++ > >>>>> kernel/bpf/helpers.c | 6 + > >>>>> kernel/bpf/syscall.c | 3 +- > >>>>> kernel/bpf/verifier.c | 14 +- > >>>>> kernel/cgroup/cgroup.c | 4 + > >>>>> kernel/trace/bpf_trace.c | 4 + > >>>>> scripts/bpf_doc.py | 2 + > >>>>> tools/include/uapi/linux/bpf.h | 39 +++++ > >>>>> 13 files changed, 398 insertions(+), 3 deletions(-) > >>>>> create mode 100644 kernel/bpf/bpf_cgroup_storage.c > [...]