On Thu, Jul 23, 2020 at 02:40:58AM -0500, YiFei Zhu wrote:
> From: YiFei Zhu <zhuyifei@xxxxxxxxxx>
>
> The mechanics and usage are not very straightforward. Given the
> changes it's better to document how it works and how to use it,
> rather than having to rely on the examples and implementation to
> infer what is going on.
>
> Signed-off-by: YiFei Zhu <zhuyifei@xxxxxxxxxx>
> ---
>  Documentation/bpf/index.rst              |  9 +++
>  Documentation/bpf/map_cgroup_storage.rst | 97 ++++++++++++++++++++++++
>  2 files changed, 106 insertions(+)
>  create mode 100644 Documentation/bpf/map_cgroup_storage.rst
>
> diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
> index 38b4db8be7a2..26f4bb3107fc 100644
> --- a/Documentation/bpf/index.rst
> +++ b/Documentation/bpf/index.rst
> @@ -48,6 +48,15 @@ Program types
>     bpf_lsm
>
>
> +Map types
> +=========
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   map_cgroup_storage
> +
> +
>  Testing and debugging BPF
>  =========================
>
> diff --git a/Documentation/bpf/map_cgroup_storage.rst b/Documentation/bpf/map_cgroup_storage.rst
> new file mode 100644
> index 000000000000..ed6256974508
> --- /dev/null
> +++ b/Documentation/bpf/map_cgroup_storage.rst
> @@ -0,0 +1,97 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +.. Copyright (C) 2020 Google LLC.
> +
> +===========================
> +BPF_MAP_TYPE_CGROUP_STORAGE
> +===========================
> +
> +The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-size
> +storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that
> +attach to cgroups; the programs are made available by the same config. The
s/config/Kconfig/

This could be more obvious. It should describe what problem this map
solves and why/when it should be used instead of a general-purpose map.
Something like: it provides local storage at the cgroup that the bpf prog
is attached to, and that storage is faster to access than a general-purpose
htab, which requires a hashtable lookup.

> +storage is identified by the cgroup the program is attached to.
> +
> +This document describes the usage and semantics of the
> +``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed in
> +Linux 5.9 and this document will describe the differences.
> +
> +Usage
> +=====
> +
> +The map uses a key of type either ``__u64 cgroup_inode_id`` or
> +``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::
> +
> +	struct bpf_cgroup_storage_key {
> +		__u64 cgroup_inode_id;
> +		__u32 attach_type;
> +	};
> +
> +``cgroup_inode_id`` is the inode id of the cgroup directory.
> +``attach_type`` is the program's attach type.
> +
> +Since Linux 5.9, if the type is ``__u64``, then all attach types of the
I think it needs to be more specific that the ``__u64 cgroup_inode_id`` key
is only supported since 5.9.

> +particular cgroup and map will share the same storage. If the type is
> +``struct bpf_cgroup_storage_key``, then programs of different attach types
> +will be isolated and see different storages.
> +
> +To access the storage in a program, use ``bpf_get_local_storage``::
> +
> +	void *bpf_get_local_storage(void *map, u64 flags)
> +
> +``flags`` is reserved for future use and must be 0.
> +
> +There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
> +can be accessed by multiple programs across different CPUs, and users should
> +take care of synchronization by themselves.
This makes it sound like the bpf prog author is on their own. The bpf infra
provides "struct bpf_spin_lock" to synchronize access to the storage, e.g.
tools/testing/selftests/bpf/progs/test_spin_lock.c

> +
> +Example usage::
> +
> +	#include <linux/bpf.h>
> +
> +	struct {
> +		__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
> +		__type(key, struct bpf_cgroup_storage_key);
> +		__type(value, __u32);
> +	} cgroup_storage SEC(".maps");
> +
> +	int program(struct __sk_buff *skb)
> +	{
> +		__u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
> +		__sync_fetch_and_add(ptr, 1);
> +
> +		return 0;
> +	}
> +
> +Semantics
> +=========
> +
> +``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This
> +per-CPU variant will have different memory regions for each CPU for each
> +storage. The non-per-CPU will have the same memory region for each storage.
> +
> +Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
> +for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
> +that uses the map. A program may be attached to multiple cgroups or have
> +multiple attach types, and each attach creates a fresh zeroed storage. The
> +storage is freed upon detach.
> +
> +Since Linux 5.9, storage can be shared by multiple programs. When a program is
> +attached to a cgroup, the kernel would create a new storage only if the map
> +does not already contain an entry for the cgroup and attach type pair, or else
> +the old storage is reused for the new attachment. If the map is attach type
> +shared, then attach type is simply ignored during comparison. Storage is freed
> +only when either the map or the cgroup attached to is being freed. Detaching
> +will not directly free the storage, but it may cause the reference to the map
> +to reach zero and indirectly free all storage in the map.
A few more things should be mentioned.

Before 5.9:
- There is a one-to-one association between the map and the bpf-prog at
  load/verification time. Thus, each map can only be used by one bpf-prog,
  and each bpf-prog can only use one storage map.
- Because a map can only be used by one bpf-prog, sharing this cgroup's
  storage between different bpf-progs was impossible.

Since 5.9:
- The map is not associated with any bpf-prog, which makes sharing possible.
- However, a bpf-prog can still only be associated with one map. Thus, a
  bpf-prog cannot use more than one BPF_MAP_TYPE_CGROUP_STORAGE (i.e. each
  bpf-prog can only use one cgroup storage map).

> +
> +In all versions, userspace may use the attach parameters of cgroup and
> +attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map
> +APIs to read or update the storage for a given attachment. For Linux 5.9
> +attach type shared storages, only the first value in the struct, cgroup inode
> +id, is used during comparison, so userspace may just specify a ``__u64``
> +directly.
A sample map definition would be useful in the "Usage" section above.

> +
> +The storage is bound at attach time. Even if the program is attached to parent
> +and triggers in child, the storage still belongs to the parent.
> +
> +Userspace cannot create a new entry in the map or delete an existing entry.
> +Program test runs always use a temporary storage.
> --
> 2.27.0
>
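
One more idea for the Semantics section: the 5.9 attach-time rule might be
easier to follow with a small model next to it. Below is a toy userspace
sketch of how I read that paragraph. It is my own illustration, not kernel
code: toy_map, toy_storage, toy_attach() and TOY_MAX_ENTRIES are made-up
names, a fixed array stands in for the real map, and freeing is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of the Linux >= 5.9 attach-time rule, NOT kernel code.
 * All names here are hypothetical and only illustrate lookup-or-create.
 */
#define TOY_MAX_ENTRIES 8

struct toy_storage {
	uint64_t cgroup_inode_id;
	uint32_t attach_type;
	uint32_t value;		/* stands in for the map value */
	bool in_use;
};

struct toy_map {
	bool attach_type_shared;	/* key declared as a plain __u64 */
	struct toy_storage entries[TOY_MAX_ENTRIES];
};

/*
 * Return the storage an attachment would get: reuse an existing entry for
 * the same cgroup (and, unless the map is attach type shared, the same
 * attach type); otherwise claim a free slot and zero it. NULL if full.
 */
static struct toy_storage *toy_attach(struct toy_map *map,
				      uint64_t cgroup_inode_id,
				      uint32_t attach_type)
{
	struct toy_storage *free_slot = NULL;

	for (size_t i = 0; i < TOY_MAX_ENTRIES; i++) {
		struct toy_storage *s = &map->entries[i];

		if (!s->in_use) {
			if (!free_slot)
				free_slot = s;	/* remember first free slot */
			continue;
		}
		if (s->cgroup_inode_id == cgroup_inode_id &&
		    (map->attach_type_shared || s->attach_type == attach_type))
			return s;	/* old storage is reused */
	}

	if (!free_slot)
		return NULL;

	/* No matching entry: create a fresh, zeroed storage. */
	*free_slot = (struct toy_storage){
		.cgroup_inode_id = cgroup_inode_id,
		.attach_type = attach_type,
		.value = 0,
		.in_use = true,
	};
	return free_slot;
}
```

With attach_type_shared set (the plain ``__u64`` key case), attaching to the
same cgroup inode id with two different attach types returns the same slot;
with it cleared (``struct bpf_cgroup_storage_key``), the two attachments get
separate zeroed slots, and re-attaching with an already-seen pair reuses the
existing one.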