Re: [PATCH v5 bpf-next 5/5] Documentation/bpf: Document CGROUP_STORAGE map type

Martin KaFai Lau <kafai@xxxxxx> · Thu, 23 Jul 2020 17:02:57 -0700

On Thu, Jul 23, 2020 at 02:40:58AM -0500, YiFei Zhu wrote:
> From: YiFei Zhu <zhuyifei@xxxxxxxxxx>
> 
> The mechanics and usage are not very straightforward. Given the
> changes it's better to document how it works and how to use it,
> rather than having to rely on the examples and implementation to
> infer what is going on.
> 
> Signed-off-by: YiFei Zhu <zhuyifei@xxxxxxxxxx>
> ---
>  Documentation/bpf/index.rst              |  9 +++
>  Documentation/bpf/map_cgroup_storage.rst | 97 ++++++++++++++++++++++++
>  2 files changed, 106 insertions(+)
>  create mode 100644 Documentation/bpf/map_cgroup_storage.rst
> 
> diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
> index 38b4db8be7a2..26f4bb3107fc 100644
> --- a/Documentation/bpf/index.rst
> +++ b/Documentation/bpf/index.rst
> @@ -48,6 +48,15 @@ Program types
>     bpf_lsm
>  
>  
> +Map types
> +=========
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   map_cgroup_storage
> +
> +
>  Testing and debugging BPF
>  =========================
>  
> diff --git a/Documentation/bpf/map_cgroup_storage.rst b/Documentation/bpf/map_cgroup_storage.rst
> new file mode 100644
> index 000000000000..ed6256974508
> --- /dev/null
> +++ b/Documentation/bpf/map_cgroup_storage.rst
> @@ -0,0 +1,97 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +.. Copyright (C) 2020 Google LLC.
> +
> +===========================
> +BPF_MAP_TYPE_CGROUP_STORAGE
> +===========================
> +
> +The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-size
> +storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that
> +attach to cgroups; the programs are made available by the same config. The
s/config/Kconfig/ could be more obvious.

It should describe what problem this map is solving and why/when it should be used
instead of the other general-purpose maps.

Something like:
It provides local storage at the cgroup that the bpf prog is attached to,
with faster access than the general-purpose htab, which requires a
hashtable lookup.

> +storage is identified by the cgroup the program is attached to.
> +
> +This document describes the usage and semantics of the
> +``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed in
> +Linux 5.9, and this document describes the differences.
> +
> +Usage
> +=====
> +
> +The map uses a key of type either ``__u64`` or
``__u64 cgroup_inode_id``

> +``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::
> +
> +    struct bpf_cgroup_storage_key {
> +            __u64 cgroup_inode_id;
> +            __u32 attach_type;
> +    };
> +
> +``cgroup_inode_id`` is the inode id of the cgroup directory.
> +``attach_type`` is the program's attach type.
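A userspace sketch of filling the struct key for the map APIs, assuming (as the
text says) that ``cgroup_inode_id`` is the inode number of the cgroup directory,
so stat(2) on the cgroup path yields it. ``make_storage_key`` is a hypothetical
helper; ``BPF_CGROUP_INET_EGRESS`` is just one example attach type from
``linux/bpf.h``.

```c
#include <linux/bpf.h>  /* struct bpf_cgroup_storage_key, enum bpf_attach_type */
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: build the key for the cgroup directory at cgroup_path
 * and the given attach type. Returns 0 on success, -1 (errno set) on error. */
static int make_storage_key(const char *cgroup_path,
                            enum bpf_attach_type type,
                            struct bpf_cgroup_storage_key *key)
{
        struct stat st;

        if (stat(cgroup_path, &st) < 0)
                return -1;

        memset(key, 0, sizeof(*key));      /* also clears the tail padding */
        key->cgroup_inode_id = st.st_ino;  /* inode number of the cgroup dir */
        key->attach_type = type;
        return 0;
}
```

The resulting key can then be passed to libbpf's
``bpf_map_lookup_elem(map_fd, &key, &value)``; the cgroup path and attach type
are up to the caller.
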
> +
> +Since Linux 5.9, if the type is ``__u64``, then all attach types of the
I think it should state explicitly that the ``__u64 cgroup_inode_id`` key
is only supported since 5.9.

> +particular cgroup and map will share the same storage. If the type is
> +``struct bpf_cgroup_storage_key``, then programs of different attach types
> +are isolated and see different storages.
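As a sample for the Usage section, map definitions for both key types, in the
libbpf BTF-defined map style the example below already uses. This assumes
``bpf/bpf_helpers.h`` from libbpf; the map names are illustrative. Since a
bpf-prog can only use one cgroup storage map, a given program should reference
only one of them.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Since 5.9: __u64 key (cgroup inode id only), so all attach types of a
 * cgroup share one storage in this map. */
struct {
        __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
        __type(key, __u64);
        __type(value, __u32);
} shared_storage SEC(".maps");

/* All versions: one storage per (cgroup, attach type) pair. */
struct {
        __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
        __type(key, struct bpf_cgroup_storage_key);
        __type(value, __u32);
} typed_storage SEC(".maps");
```
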
> +
> +To access the storage in a program, use ``bpf_get_local_storage``::
> +
> +    void *bpf_get_local_storage(void *map, u64 flags)
> +
> +``flags`` is reserved for future use and must be 0.
> +
> +There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
> +can be accessed by multiple programs across different CPUs, and users should
> +take care of synchronization themselves.
It sounds like the bpf prog author is on their own.
The bpf infra provides "struct bpf_spin_lock" to synchronize the storage.
e.g. tools/testing/selftests/bpf/progs/test_spin_lock.c
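
A minimal sketch of that approach, loosely modeled on test_spin_lock.c and
assuming libbpf's ``bpf/bpf_helpers.h`` and a kernel where cgroup_skb programs
may call bpf_spin_lock(). ``struct counter``, ``counters`` and ``count_egress``
are hypothetical names. The map value must carry BTF for bpf_spin_lock, which a
BTF-defined map like this one does. The lock keeps the two fields consistent
with each other, which a single ``__sync_fetch_and_add`` cannot do.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical value layout: two counters guarded by one spin lock. */
struct counter {
        struct bpf_spin_lock lock;
        __u64 packets;
        __u64 bytes;
};

struct {
        __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
        __type(key, struct bpf_cgroup_storage_key);
        __type(value, struct counter);
} counters SEC(".maps");

SEC("cgroup_skb/egress")
int count_egress(struct __sk_buff *skb)
{
        struct counter *c = bpf_get_local_storage(&counters, 0);

        /* Both fields change together under the lock. */
        bpf_spin_lock(&c->lock);
        c->packets++;
        c->bytes += skb->len;
        bpf_spin_unlock(&c->lock);

        return 1; /* cgroup_skb: 1 allows the packet */
}

char _license[] SEC("license") = "GPL";
```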

> +
> +Example usage::
> +
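> +    #include <linux/bpf.h>
> +    #include <bpf/bpf_helpers.h>
> +
> +    struct {
> +            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
> +            __type(key, struct bpf_cgroup_storage_key);
> +            __type(value, __u32);
> +    } cgroup_storage SEC(".maps");
> +
> +    SEC("cgroup_skb/egress")
> +    int program(struct __sk_buff *skb)
> +    {
> +            /* Never NULL: the verifier knows the storage exists. */
> +            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
> +            __sync_fetch_and_add(ptr, 1);
> +
> +            return 1; /* cgroup_skb: 1 allows the packet */
> +    }
> +
> +    char _license[] SEC("license") = "GPL";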
> +
> +Semantics
> +=========
> +
> +``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This
> +per-CPU variant has a separate memory region for each CPU for each storage,
> +while the non-per-CPU variant has a single memory region for each storage.
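When userspace reads a ``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` entry through the
map API, the value buffer holds one copy per possible CPU, each rounded up to a
multiple of 8 bytes. A sketch of summing a ``__u32`` counter from such a buffer;
``sum_percpu_u32`` and ``PERCPU_STRIDE`` are hypothetical names, and filling the
buffer (e.g. with bpf_map_lookup_elem() and a buffer sized from
libbpf_num_possible_cpus()) is left out so it runs without a kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-CPU copies are laid out back to back, each rounded up to 8 bytes. */
#define PERCPU_STRIDE(value_size) (((value_size) + 7) & ~(size_t)7)

/* Sum a __u32 counter across ncpus per-CPU slots in buf. */
static uint64_t sum_percpu_u32(const void *buf, int ncpus)
{
        const unsigned char *p = buf;
        size_t stride = PERCPU_STRIDE(sizeof(uint32_t));
        uint64_t total = 0;

        for (int cpu = 0; cpu < ncpus; cpu++) {
                uint32_t v;

                memcpy(&v, p + (size_t)cpu * stride, sizeof(v));
                total += v;
        }
        return total;
}
```
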
> +
> +Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
> +for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
> +that uses the map. A program may be attached to multiple cgroups or have
> +multiple attach types, and each attach creates a fresh zeroed storage. The
> +storage is freed upon detach.
> +
> +Since Linux 5.9, storage can be shared by multiple programs. When a program is
> +attached to a cgroup, the kernel creates a new storage only if the map
> +does not already contain an entry for the cgroup and attach type pair;
> +otherwise the old storage is reused for the new attachment. If the map is
> +attach-type-shared, the attach type is simply ignored during comparison.
> +Storage is freed only when either the map or the cgroup attached to is freed.
> +Detaching does not directly free the storage, but it may cause the reference
> +count of the map to reach zero, indirectly freeing all storage in the map.
A few more things should be mentioned:

<5.9:
- There is a one-to-one association between the map and bpf-prog during
  load/verification time.
  Thus, each map can only be used by one bpf-prog and
  each bpf-prog can only use one storage map.
- Because a map can only be used by one bpf-prog,
  sharing this cgroup's storage among different bpf-progs
  was impossible.

>= 5.9:
- The map is not associated with any bpf-prog, that makes sharing
  possible.
- However, the bpf-prog can still only associate with one
  map.  Thus, a bpf-prog cannot use more than one
  BPF_MAP_TYPE_CGROUP_STORAGE (i.e. each
  bpf-prog can only use one cgroup storage map).
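
A toy userspace model of the 5.9 attach-time rule in the patch: reuse an
existing entry for the cgroup and attach type pair, else create a fresh one,
and ignore the attach type for a shared (``__u64``-keyed) map. All names are
hypothetical, and it models only entry lookup, not refcounting or freeing.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 16

/* Toy model (not kernel code) of one map's storage entries. */
struct toy_entry {
        uint64_t cgroup_id;
        uint32_t attach_type;
        uint64_t value;            /* stands in for the storage contents */
};

struct toy_map {
        bool shared;               /* __u64 key: attach type is ignored */
        int n;
        struct toy_entry e[TOY_MAX_ENTRIES];
};

/* Model of attaching a prog that uses map m to (cgroup_id, attach_type):
 * reuse a matching entry if one exists, otherwise create a zeroed one.
 * Returns the entry index, or -1 if the toy map is full. */
static int toy_attach(struct toy_map *m, uint64_t cgroup_id,
                      uint32_t attach_type)
{
        for (int i = 0; i < m->n; i++) {
                if (m->e[i].cgroup_id == cgroup_id &&
                    (m->shared || m->e[i].attach_type == attach_type))
                        return i;  /* old storage reused */
        }
        if (m->n == TOY_MAX_ENTRIES)
                return -1;
        m->e[m->n] = (struct toy_entry){ cgroup_id, attach_type, 0 };
        return m->n++;
}
```

With a struct-keyed map, attaching to the same cgroup with a second attach type
creates a second entry; with a ``__u64``-keyed map it returns the first one.
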

> +
> +In all versions, userspace may use the attach parameters of cgroup and
> +attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map
> +APIs to read or update the storage for a given attachment. For Linux 5.9
> +attach-type-shared storages, only the first value in the struct, cgroup inode
> +id, is used during comparison, so userspace may just specify a ``__u64``
> +directly.
A sample map definition will be useful in the "Usage" section above.

> +
> +The storage is bound at attach time. Even if the program is attached to a parent
> +cgroup and triggers in a child, the storage still belongs to the parent.
> +
> +Userspace cannot create a new entry in the map or delete an existing entry.
> +Program test runs always use a temporary storage.
> -- 
> 2.27.0
>