David Vernet <void@xxxxxxxxxxxxx> writes: > On Wed, Dec 07, 2022 at 10:27:21AM +0000, Donald Hunter wrote: >> Add documentation for the BPF_MAP_TYPE_SK_STORAGE including >> kernel version introduced, usage and examples. >> >> Signed-off-by: Donald Hunter <donald.hunter@xxxxxxxxx> >> --- >> v2 -> v3: >> - Fix void * return, reported by Yonghong Song >> - Add tracing programs to API note, reported by Yonghong Song >> v1 -> v2: >> - Fix bpf_sk_storage_* function signatures, reported by Yonghong Song >> - Fix NULL return on failure, reported by Yonghong Song >> >> Documentation/bpf/map_sk_storage.rst | 142 +++++++++++++++++++++++++++ >> 1 file changed, 142 insertions(+) >> create mode 100644 Documentation/bpf/map_sk_storage.rst >> >> diff --git a/Documentation/bpf/map_sk_storage.rst b/Documentation/bpf/map_sk_storage.rst >> new file mode 100644 >> index 000000000000..955b287bb7de >> --- /dev/null >> +++ b/Documentation/bpf/map_sk_storage.rst >> @@ -0,0 +1,142 @@ >> +.. SPDX-License-Identifier: GPL-2.0-only >> +.. Copyright (C) 2022 Red Hat, Inc. >> + >> +======================= >> +BPF_MAP_TYPE_SK_STORAGE >> +======================= >> + >> +.. note:: >> + - ``BPF_MAP_TYPE_SK_STORAGE`` was introduced in kernel version 5.2 >> + >> +``BPF_MAP_TYPE_SK_STORAGE`` is used to provide socket-local storage for BPF programs. A map of >> +type ``BPF_MAP_TYPE_SK_STORAGE`` declares the type of storage to be provided and acts as the >> +handle for accessing the socket-local storage from a BPF program. The key type must be ``int`` >> +and ``max_entries`` must be set to ``0``. >> + >> +The ``BPF_F_NO_PREALLOC`` must be used when creating a map for socket-local storage. The kernel >> +is responsible for allocating storage for a socket when requested and for freeing the storage >> +when either the map or the socket is deleted. > > Hi Donald, > > Thanks for writing these excellent docs. > > Would you mind please wrapping your text to 80 columns (throughout the > document)? It's not a hard and fast rule, but it would be ideal to keep > text that can wrap without formatting issues to 80 columns. For > documentation where you're e.g. showing a function signature (such as > bpf_sk_storage_get() below), it's fine to exceed it. Yes, of course. That was an oversight on my part. >> + >> +Usage >> +===== >> + >> +Kernel BPF >> +---------- >> + >> +bpf_sk_storage_get() >> +~~~~~~~~~~~~~~~~~~~~ >> + >> +.. code-block:: c >> + >> + void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags) >> + >> +Socket-local storage can be retrieved using the ``bpf_sk_storage_get()`` helper. The helper gets >> +the storage from ``sk`` that is identified by ``map``. If the >> +``BPF_LOCAL_STORAGE_GET_F_CREATE`` flag is used then ``bpf_sk_storage_get()`` will create the >> +storage for ``sk`` if it does not already exist. ``value`` can be used together with >> +``BPF_LOCAL_STORAGE_GET_F_CREATE`` to initialize the storage value, otherwise it will be zero >> +initialized. Returns a pointer to the storage on success, or ``NULL`` in case of failure. >> + >> +.. note:: >> + - ``sk`` is a kernel ``struct sock`` pointer for LSM or tracing programs. >> + - ``sk`` is a ``struct bpf_sock`` pointer for other program types. >> + >> +bpf_sk_storage_delete() >> +~~~~~~~~~~~~~~~~~~~~~~~ >> + >> +.. code-block:: c >> + >> + long bpf_sk_storage_delete(struct bpf_map *map, void *sk) >> + >> +Socket-local storage can be deleted using the ``bpf_sk_storage_delete()`` helper. The helper >> +deletes the storage from ``sk`` that is identified by ``map``. Returns ``0`` on success, or negative >> +error in case of failure. >> + >> +User space >> +---------- >> + >> +bpf_map_update_elem() >> +~~~~~~~~~~~~~~~~~~~~~ >> + >> +.. code-block:: c >> + >> + int bpf_map_update_elem(int map_fd, const void *key, const void *value, __u64 flags) >> + >> +Socket-local storage with type identified by ``map_fd`` for the socket identified by ``key`` can > > Could you please clarify what you mean by "with type identified by"? > ``map_fd`` just corresponds to the fd of the map, correct? Not following > what's meant by "type". A clumsy attempt to say that the map definition declares the type of the storage that gets added to the socket, as opposed to behaving like a traditional map. I'll rework this to be more clear. >> +be added or updated using the ``bpf_map_update_elem()`` libbpf function. ``key`` must be a >> +pointer to a valid ``fd`` in the user space program. The ``flags`` parameter can be used to >> +control the update behaviour: >> + >> +- ``BPF_ANY`` will create storage for ``fd`` or update existing storage. >> +- ``BPF_NOEXIST`` will create storage for ``fd`` only if it did not already >> + exist > > I believe that if BPF_NOEXIST is specified, if storage already exists > then in addition to storage not being created, the call will fail with > -EEXIST. Good point, I'll add this. >> +- ``BPF_EXIST`` will update existing storage for ``fd`` > > Can we also mention that if BPF_EXIST is specified, and no element is > present with the specified ``key``, that the call will fail with > -ENOENT? I'll add this too. >> + >> +Returns ``0`` on success, or negative error in case of failure. >> + >> +bpf_map_lookup_elem() >> +~~~~~~~~~~~~~~~~~~~~~ >> + >> +.. code-block:: c >> + >> + int bpf_map_lookup_elem(int map_fd, const void *key, void *value) >> + >> +Socket-local storage for the socket identified by ``key`` belonging to ``map_fd`` can be >> +retrieved using the ``bpf_map_lookup_elem()`` libbpf function. ``key`` must be a pointer to a >> +valid ``fd`` in the user space program. Returns ``0`` on success, or negative error in case of >> +failure. >> + >> +bpf_map_delete_elem() >> +~~~~~~~~~~~~~~~~~~~~~ >> + >> +.. code-block:: c >> + >> + int bpf_map_delete_elem (int map_fd, const void *key) > > Extra space between _elem and (. Good catch, thx. >> + >> +Socket-local storage for the socket identified by ``key`` belonging to ``map_fd`` can be deleted >> +using the ``bpf_map_delete_elem()`` libbpf function. Returns ``0`` on success, or negative error >> +in case of failure. >> + >> +Examples >> +======== >> + >> +Kernel BPF >> +---------- >> + >> +This snippet shows how to declare socket-local storage in a BPF program: >> + >> +.. code-block:: c >> + >> + struct { >> + __uint(type, BPF_MAP_TYPE_SK_STORAGE); >> + __uint(map_flags, BPF_F_NO_PREALLOC); >> + __type(key, int); >> + __type(value, struct my_storage); >> + } socket_storage SEC(".maps"); >> + >> +This snippet shows how to retrieve socket-local storage in a BPF program: >> + >> +.. code-block:: c >> + >> + SEC("sockops") >> + int _sockops(struct bpf_sock_ops *ctx) >> + { >> + struct my_storage *storage; >> + struct bpf_sock *sk; >> + >> + sk = ctx->sk; >> + if (!sk) >> + return 1; > > Don't feel strongly about this one, but IMO it's nice for examples to > illustrate code that's as close to real and pristine as possible. To > that point, should this example perhaps be updated to return -ENOENT > here, and -ENOMEM below? Will do. >> + >> + storage = bpf_sk_storage_get(&socket_storage, sk, 0, >> + BPF_LOCAL_STORAGE_GET_F_CREATE); >> + if (!storage) >> + return 1; >> + >> + /* Use 'storage' here */ > > Let's return 0 at the end to make the example program technically > correct. Will do. >> + } > > Thanks, > David