On Thu, Jul 23, 2020 at 11:47:40PM -0500, YiFei Zhu wrote:
> From: YiFei Zhu <zhuyifei@xxxxxxxxxx>
>
> To access the storage in a CGROUP_STORAGE map, one uses the
> bpf_get_local_storage helper, which is extremely fast due to its
> use of per-CPU variables. However, its whole code is built on
> the assumption that one map can only be used by one program at any
> time, and this prohibits any sharing of data between multiple
> programs using these maps, eliminating a lot of use cases, such
> as some per-cgroup configuration storage, written to by a
> setsockopt program and read by a cg_sock_addr program.
>
> Why not use other map types? The great part of the CGROUP_STORAGE
> map is that it is isolated per cgroup it is attached to. When
> one program uses bpf_get_local_storage, even on the same map, it
> gets a different storage if it is run as a result of attaching
> to a different cgroup. The kernel manages the storages, simplifying
> the BPF program and userspace. In theory, one could probably use
> other maps like array or hash to do the same thing, but it would
> add major overhead and complexity: userspace would need to know
> when a cgroup is being freed in order to free up space in the
> replacement map.
>
> This patch set introduces a significant change to the semantics of
> the CGROUP_STORAGE map type. Instead of each storage being tied to
> one single attachment, it is shared across different attachments to
> the same cgroup, and persists until either the map or the cgroup
> it is attached to is freed.
>
> Users may use u64 as the key to the map; the attach type is then
> ignored during key comparison, and programs of different attach
> types will share the same storage if the cgroups they are attached
> to are the same.
>
> How could this break existing users?
> * Users that use detach & reattach / program replacement as a
>   shortcut to zeroing the storage. Since we need sharing between
>   programs, we cannot zero the storage. Users that expect this
>   behavior should either attach a program with a new map, or
>   explicitly zero the map with a syscall. This case depends on
>   undocumented implementation details, so the impact should be
>   very minimal.
>
> Patch 1 introduces a test on the old expected behavior of the map
> type.
>
> Patch 2 introduces a test showing how two programs cannot share
> one such map.
>
> Patch 3 implements the change of semantics to the map.
>
> Patch 4 amends the new test such that it yields the behavior we
> expect from the change.
>
> Patch 5 documents the map type.
>
> Changes since RFC:
> * Clarify the commit message in patch 3 such that it says the
>   lifetime of the storage ends at the freeing of the cgroup_bpf,
>   rather than the cgroup itself.
> * Restored an -ENOMEM check in __cgroup_bpf_attach.
> * Update selftests for the recent change in the network_helpers API.
>
> Changes since v1:
> * s/CHECK_FAIL/CHECK/
> * s/bpf_prog_attach/bpf_program__attach_cgroup/
> * Moved test__start_subtest to test_cg_storage_multi.
> * Removed some redundant CHECK_FAIL where they are already CHECK-ed.
>
> Changes since v2:
> * Lock cgroup_mutex during map_free.
> * Publish new storages only if attach is successful, by tracking
>   exactly which storages are reused in an array of bools.
> * Mention that bpftool map dump shows a value of zero for attach_type
>   in the patch 3 commit message.
>
> Changes since v3:
> * Use a much simpler lookup-and-allocate-if-not-exist scheme, relying
>   on the fact that cgroup_mutex is held during attach.
> * Removed an unnecessary spinlock hold.
>
> Changes since v4:
> * Changed the semantics so that if the key type is struct
>   bpf_cgroup_storage_key, the map retains isolation between
>   different attach types. Sharing between different attach types
>   only occurs when the key type is u64.
> * Adapted tests and docs for the above change.
>
> Changes since v5:
> * Removed a redundant NULL check before bpf_link__destroy.
> * Free the BPF object explicitly, after asserting that the object
>   failed to load, in the event that the object did not fail to load.
> * Renamed a variable in bpf_cgroup_storage_key_cmp for clarity.
> * Added a lot of information to Documentation, more or less copied
>   from what Martin KaFai Lau wrote.
>
> YiFei Zhu (5):
>   selftests/bpf: Add test for CGROUP_STORAGE map on multiple attaches
>   selftests/bpf: Test CGROUP_STORAGE map can't be used by multiple progs
>   bpf: Make cgroup storages shared between programs on the same cgroup
>   selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress
>   Documentation/bpf: Document CGROUP_STORAGE map type

Acked-by: Martin KaFai Lau <kafai@xxxxxx>
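
For illustration, here is a minimal BPF C sketch of the two keying
modes the cover letter describes: a map keyed by struct
bpf_cgroup_storage_key keeps per-attach-type storage, while a map
keyed by u64 (the cgroup inode id) is shared across attach types on
the same cgroup. The map definitions follow the selftests style;
struct cgroup_value, its fields, the map names, and the program and
section names are invented for this example and are not taken from
the patches.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct cgroup_value {		/* illustrative value layout */
	__u64 egress_pkts;
	__u64 ingress_pkts;
};

/* Keyed by struct bpf_cgroup_storage_key: each (cgroup, attach type)
 * pair keeps its own storage, as before this series. Shown only for
 * contrast; the programs below do not use it.
 */
struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
	__type(key, struct bpf_cgroup_storage_key);
	__type(value, struct cgroup_value);
} per_attach_storage SEC(".maps");

/* Keyed by __u64 (the cgroup inode id): with this series, programs of
 * different attach types attached to the same cgroup see one shared
 * storage.
 */
struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
	__type(key, __u64);
	__type(value, struct cgroup_value);
} shared_storage SEC(".maps");

SEC("cgroup_skb/egress")
int count_egress(struct __sk_buff *skb)
{
	/* Returns the storage for the cgroup this invocation ran for. */
	struct cgroup_value *storage =
		bpf_get_local_storage(&shared_storage, 0);

	__sync_fetch_and_add(&storage->egress_pkts, 1);
	return 1;
}

SEC("cgroup_skb/ingress")
int count_ingress(struct __sk_buff *skb)
{
	/* Same map, same cgroup: with the u64 key this is the same
	 * storage the egress program increments.
	 */
	struct cgroup_value *storage =
		bpf_get_local_storage(&shared_storage, 0);

	__sync_fetch_and_add(&storage->ingress_pkts, 1);
	return 1;
}

char _license[] SEC("license") = "GPL";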