On Thu, Jul 23, 2020 at 11:47:40PM -0500, YiFei Zhu wrote:
> From: YiFei Zhu <zhuyifei@xxxxxxxxxx>
>
> To access the storage in a CGROUP_STORAGE map, one uses the
> bpf_get_local_storage helper, which is extremely fast due to its
> use of per-CPU variables. However, its whole code is built on
> the assumption that one map can only be used by one program at any
> time, and this prohibits any sharing of data between multiple
> programs using these maps, eliminating a lot of use cases, such
> as some per-cgroup configuration storage, written to by a
> setsockopt program and read by a cg_sock_addr program.
>
> Why not use other map types? The great part of the CGROUP_STORAGE
> map is that it is isolated per cgroup it is attached to. When
> one program uses bpf_get_local_storage, even on the same map, it
> gets a different storage if it is run as a result of attaching
> to a different cgroup. The kernel manages the storages, simplifying
> the BPF program and userspace. In theory, one could probably use
> other maps like array or hash to do the same thing, but it would
> add major overhead and complexity: userspace would need to know
> when a cgroup is being freed in order to free up space in the
> replacement map.
>
> This patch set introduces a significant change to the semantics of
> the CGROUP_STORAGE map type. Instead of each storage being tied to
> one single attachment, it is shared across different attachments to
> the same cgroup, and persists until either the map or the cgroup
> it is attached to is freed.
>
> Users may use u64 as the key to the map; the attach type is then
> ignored during key comparison, and programs of different attach
> types will share the same storage if the cgroups they are attached
> to are the same.
>
> How could this break existing users?
> * Users that use detach & reattach / program replacement as a
>   shortcut to zeroing the storage. Since we need sharing between
>   programs, we cannot zero the storage. Users that expect this
>   behavior should either attach a program with a new map, or
>   explicitly zero the map with a syscall. This case depends on
>   undocumented implementation details, so the impact should be
>   very minimal.
>
> Patch 1 introduces a test on the old expected behavior of the map
> type.
>
> Patch 2 introduces a test showing how two programs cannot share
> one such map.
>
> Patch 3 implements the change of semantics to the map.
>
> Patch 4 amends the new test such that it yields the behavior we
> expect from the change.
>
> Patch 5 documents the map type.
>
> Changes since RFC:
> * Clarify the commit message in patch 3 such that it says the
>   lifetime of the storage ends at the freeing of the cgroup_bpf,
>   rather than the cgroup itself.
> * Restored an -ENOMEM check in __cgroup_bpf_attach.
> * Update selftests for the recent change in the network_helpers API.
>
> Changes since v1:
> * s/CHECK_FAIL/CHECK/
> * s/bpf_prog_attach/bpf_program__attach_cgroup/
> * Moved test__start_subtest to test_cg_storage_multi.
> * Removed some redundant CHECK_FAIL where they are already CHECK-ed.
>
> Changes since v2:
> * Lock cgroup_mutex during map_free.
> * Publish new storages only if attach is successful, by tracking
>   exactly which storages are reused in an array of bools.
> * Mention that bpftool map dump shows a value of zero for attach_type
>   in the patch 3 commit message.
>
> Changes since v3:
> * Use a much simpler lookup-and-allocate-if-not-exist scheme, relying
>   on the fact that cgroup_mutex is held during attach.
> * Removed an unnecessary spinlock hold.
>
> Changes since v4:
> * Changed the semantics so that if the key type is struct
>   bpf_cgroup_storage_key, the map retains isolation between
>   different attach types. Sharing between different attach types
>   only occurs when the key type is u64.
> * Adapted tests and docs for the above change.
>
> Changes since v5:
> * Removed a redundant NULL check before bpf_link__destroy.
> * Free the BPF object explicitly, after asserting that the object
>   failed to load, in the event that the object did not fail to load.
> * Renamed a variable in bpf_cgroup_storage_key_cmp for clarity.
> * Added a lot of information to Documentation, more or less copied
>   from what Martin KaFai Lau wrote.
>
> YiFei Zhu (5):
>   selftests/bpf: Add test for CGROUP_STORAGE map on multiple attaches
>   selftests/bpf: Test CGROUP_STORAGE map can't be used by multiple progs
>   bpf: Make cgroup storages shared between programs on the same cgroup
>   selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress
>   Documentation/bpf: Document CGROUP_STORAGE map type

Acked-by: Martin KaFai Lau <kafai@xxxxxx>
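
For illustration, here is a minimal BPF C sketch of the two keying
modes the cover letter describes: a map keyed by struct
bpf_cgroup_storage_key keeps per-attach-type storage, while a map
keyed by u64 (the cgroup inode id) is shared across attach types on
the same cgroup. The map definitions follow the selftests style;
struct cgroup_value, its fields, the map names, and the program and
section names are invented for this example and are not taken from
the patches.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct cgroup_value {		/* illustrative value layout */
	__u64 egress_pkts;
	__u64 ingress_pkts;
};

/* Keyed by struct bpf_cgroup_storage_key: each (cgroup, attach type)
 * pair keeps its own storage, as before this series. Shown only for
 * contrast; the programs below do not use it.
 */
struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
	__type(key, struct bpf_cgroup_storage_key);
	__type(value, struct cgroup_value);
} per_attach_storage SEC(".maps");

/* Keyed by __u64 (the cgroup inode id): with this series, programs of
 * different attach types attached to the same cgroup see one shared
 * storage.
 */
struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
	__type(key, __u64);
	__type(value, struct cgroup_value);
} shared_storage SEC(".maps");

SEC("cgroup_skb/egress")
int count_egress(struct __sk_buff *skb)
{
	/* Returns the storage for the cgroup this invocation ran for. */
	struct cgroup_value *storage =
		bpf_get_local_storage(&shared_storage, 0);

	__sync_fetch_and_add(&storage->egress_pkts, 1);
	return 1;
}

SEC("cgroup_skb/ingress")
int count_ingress(struct __sk_buff *skb)
{
	/* Same map, same cgroup: with the u64 key this is the same
	 * storage the egress program increments.
	 */
	struct cgroup_value *storage =
		bpf_get_local_storage(&shared_storage, 0);

	__sync_fetch_and_add(&storage->ingress_pkts, 1);
	return 1;
}

char _license[] SEC("license") = "GPL";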