Re: [PATCH bpf-next v1 03/19] bpf: add bpf_map iterator

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 28 Apr 2020 23:04:21 -0700

On Mon, Apr 27, 2020 at 1:18 PM Yonghong Song <yhs@xxxxxx> wrote:
>
> The bpf_map iterator is implemented.
> The bpf program is called from the seq_ops show() and stop() functions.
> bpf_iter_get_prog() will retrieve bpf program and other
> parameters during seq_file object traversal. In show() function,
> bpf program will traverse every valid object, and in stop()
> function, bpf program will be called one more time after all
> objects are traversed.
>
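
Here is a minimal userspace model of one traversal that makes this calling convention concrete. It assumes a plain array of ints in place of the kernel objects, and a C callback (the hypothetical iter_prog_t and sum_prog) in place of the BPF program. It deliberately ignores that seq_file may stop and restart a traversal across several read() calls. It only shows the per-object calls from show(), followed by the single extra call from stop() with no object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the BPF program: obj is the current
 * object, or NULL for the extra call made from stop(). */
typedef int (*iter_prog_t)(const int *obj, void *ctx);

/* Hypothetical per-session state for the example program. */
struct sum_state {
	long sum;
	int saw_end;
};

/* Example program: sums object values and prints a trailer on the
 * final (NULL) call. */
static int sum_prog(const int *obj, void *ctx)
{
	struct sum_state *st = ctx;

	if (!obj) {
		st->saw_end = 1;
		printf("total=%ld\n", st->sum);
		return 0;
	}
	st->sum += *obj;
	return 0;
}

/* One traversal: the "show()" phase calls prog once per object, then
 * the "stop()" phase calls it once more with NULL. Returns the number
 * of program invocations. Simplified: a real seq_file session may be
 * split across several start()/stop() pairs. */
static unsigned int run_iter_session(const int *objs, size_t n,
				     iter_prog_t prog, void *ctx)
{
	unsigned int calls = 0;

	assert(prog != NULL);
	for (size_t i = 0; i < n; i++) {	/* show() phase */
		prog(&objs[i], ctx);
		calls++;
	}
	prog(NULL, ctx);			/* stop() phase */
	calls++;

	return calls;
}
```

With objs = {1, 2, 3}, run_iter_session() makes four calls, and sum_prog() prints total=6 on the final NULL call. That final call is the kind of trailer output the extra stop() invocation enables.
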
> The first member of the bpf context contains the meta data, namely,
> the seq_file, session_id and seq_num. Here, the session_id is
> a unique id for one specific seq_file session. The seq_num is
> the number of bpf prog invocations in the current session.
> The bpf_iter_get_prog(), which will be implemented in subsequent
> patches, will have more information on how meta data are computed.
>
> The second member of the bpf context is a struct bpf_map pointer,
> which bpf program can examine.
>
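
The following sketch covers the context layout and metadata bookkeeping described above. It uses hypothetical names (struct iter_meta, struct iter_ctx_bpf_map, iter_session_begin(), iter_invoke()) rather than the patch's real definitions. It also assumes seq_num is 0-based, that is, the number of invocations made before the current one. The patch leaves the exact computation to bpf_iter_get_prog(). A plain global counter for session_id is only adequate in this single-threaded model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seq_file;	/* opaque in this model */
struct bpf_map;		/* opaque in this model */

/* Hypothetical metadata member: first field of the context. */
struct iter_meta {
	struct seq_file *seq;
	uint64_t session_id;	/* unique per seq_file session */
	uint64_t seq_num;	/* invocations made so far in this session */
};

/* Hypothetical context for the bpf_map target: metadata, then the map. */
struct iter_ctx_bpf_map {
	struct iter_meta *meta;
	struct bpf_map *map;
};

/* Single-threaded counter; concurrent sessions would need an atomic one. */
static uint64_t next_session_id;

static void iter_session_begin(struct iter_meta *meta, struct seq_file *seq)
{
	meta->seq = seq;
	meta->session_id = ++next_session_id;
	meta->seq_num = 0;
}

/* Fill the context for one invocation. Returns the seq_num the program
 * would observe (0 on the first call), then advances the counter. */
static uint64_t iter_invoke(struct iter_meta *meta,
			    struct iter_ctx_bpf_map *ctx, struct bpf_map *map)
{
	assert(meta != NULL && ctx != NULL);
	ctx->meta = meta;
	ctx->map = map;
	/* a real implementation would run the BPF program on ctx here */
	return meta->seq_num++;
}
```
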
> The target implementation also provided the structure definition
> for bpf program and the function definition for verifier to
> verify the bpf program. Specifically for bpf_map iterator,
> the structure is "bpf_iter__bpf_map" and the function is
> "__bpf_iter__bpf_map".
>
> More targets will be implemented later, all of which will include
> the following, similar to bpf_map iterator:
>   - seq_ops() implementation
>   - function definition for verifier to verify the bpf program
>   - seq_file private data size
>   - additional target feature
>
> Signed-off-by: Yonghong Song <yhs@xxxxxx>
> ---
>  include/linux/bpf.h   |  10 ++++
>  kernel/bpf/Makefile   |   2 +-
>  kernel/bpf/bpf_iter.c |  19 ++++++++
>  kernel/bpf/map_iter.c | 107 ++++++++++++++++++++++++++++++++++++++++++
>  kernel/bpf/syscall.c  |  13 +++++
>  5 files changed, 150 insertions(+), 1 deletion(-)
>  create mode 100644 kernel/bpf/map_iter.c
>

[...]

> +static int __init bpf_map_iter_init(void)
> +{
> +       struct bpf_iter_reg reg_info = {
> +               .target                 = "bpf_map",
> +               .target_func_name       = "__bpf_iter__bpf_map",

I wonder if it would be better to use a pointer to a function here
instead of a string. It would preserve the __bpf_iter__bpf_map function
without __init, plus it's hard to mistype the name accidentally. In
bpf_iter_reg_target() one would just need to find the function in
kallsyms by its address and extract its name.

Or that would be too much trouble?
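
Here is a userspace model of that suggestion. It assumes a hypothetical one-entry symbol table (symtab) standing in for kallsyms, plus hypothetical names (struct iter_reg, lookup_func_name()). In the kernel, the lookup would presumably go through kallsyms, for example sprint_symbol_no_offset() or kallsyms_lookup(). Those only resolve names with CONFIG_KALLSYMS enabled.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*target_func_t)(void);

/* Stand-in for __bpf_iter__bpf_map: only its address matters. */
static void iter_bpf_map_func(void)
{
}

/* Hypothetical symbol table standing in for kallsyms. */
struct sym {
	target_func_t addr;
	const char *name;
};

static const struct sym symtab[] = {
	{ iter_bpf_map_func, "__bpf_iter__bpf_map" },
};

/* Hypothetical registration info carrying a function pointer
 * instead of a target_func_name string. */
struct iter_reg {
	const char *target;
	target_func_t target_func;
};

/* Resolve a function's name from its address, as bpf_iter_reg_target()
 * might do via kallsyms. Returns NULL if the address is unknown. */
static const char *lookup_func_name(target_func_t fn)
{
	assert(fn != NULL);
	for (size_t i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++)
		if (symtab[i].addr == fn)
			return symtab[i].name;
	return NULL;
}
```

Taking the function's address keeps an otherwise-unreferenced function alive. It also turns a misspelled function name into a compile error instead of a failed lookup at registration time.
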

> +               .seq_ops                = &bpf_map_seq_ops,
> +               .seq_priv_size          = sizeof(struct bpf_iter_seq_map_info),
> +               .target_feature         = 0,
> +       };
> +
> +       return bpf_iter_reg_target(&reg_info);
> +}
> +
> +late_initcall(bpf_map_iter_init);
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 7626b8024471..022187640943 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2800,6 +2800,19 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr,
>         return err;
>  }
>
> +struct bpf_map *bpf_map_get_curr_or_next(u32 *id)
> +{
> +       struct bpf_map *map;
> +
> +       spin_lock_bh(&map_idr_lock);
> +       map = idr_get_next(&map_idr, id);
> +       if (map)
> +               map = __bpf_map_inc_not_zero(map, false);

When __bpf_map_inc_not_zero() returns ERR_PTR(-ENOENT), it doesn't mean
there are no more BPF maps; it just means that the one we got was
already released (or is in the process of being released). I think you
need to retry with id+1 in such a case, otherwise your iteration might
end prematurely.
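
Here is a userspace model of the retry. It assumes a small array (model_idr) in place of map_idr, plus hypothetical helpers that mirror idr_get_next() and __bpf_map_inc_not_zero(). Locking is omitted from the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NR_IDS 8

/* Hypothetical model of a map: refcnt == 0 means it is being released. */
struct model_map {
	uint32_t id;
	int refcnt;
};

/* Hypothetical stand-in for map_idr: slot i holds the map with id i. */
static struct model_map *model_idr[NR_IDS];

/* Like idr_get_next(): first entry with id >= *id, updating *id. */
static struct model_map *model_idr_get_next(uint32_t *id)
{
	for (uint32_t i = *id; i < NR_IDS; i++) {
		if (model_idr[i]) {
			*id = i;
			return model_idr[i];
		}
	}
	return NULL;
}

/* Like __bpf_map_inc_not_zero(): fail if the map is already dying. */
static int model_inc_not_zero(struct model_map *map)
{
	assert(map->refcnt >= 0);
	if (map->refcnt == 0)
		return -ENOENT;
	map->refcnt++;
	return 0;
}

/* Return the map at *id or the next live one. A dying map is skipped
 * instead of ending the walk. */
static struct model_map *model_get_curr_or_next(uint32_t *id)
{
	struct model_map *map;

	while ((map = model_idr_get_next(id)) != NULL) {
		if (model_inc_not_zero(map) == 0)
			return map;
		(*id)++;	/* dying map: retry from the next id */
	}
	return NULL;
}
```

In the patch itself, the equivalent change would be to loop back to idr_get_next() while __bpf_map_inc_not_zero() returns an IS_ERR() pointer. *id would be incremented before each retry, all while holding map_idr_lock.
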

> +       spin_unlock_bh(&map_idr_lock);
> +
> +       return map;
> +}
> +
>  #define BPF_PROG_GET_FD_BY_ID_LAST_FIELD prog_id
>
>  struct bpf_prog *bpf_prog_by_id(u32 id)
> --
> 2.24.1
>