Re: [PATCH bpf-next v4 05/12] bpf: add bpf_for_each_map_elem() helper

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 26 Feb 2021 11:22:51 -0800

On Thu, Feb 25, 2021 at 9:13 PM Yonghong Song <yhs@xxxxxx> wrote:
>
> This patch introduces the bpf_for_each_map_elem() helper, which
> iterates over all map elements with a callback function. The
> helper signature looks like
>   long bpf_for_each_map_elem(map, callback_fn, callback_ctx, flags)
> and the callback_fn is called for each map element. For example,
> for a hashmap, the callback signature may look like
>   long callback_fn(map, key, val, callback_ctx)
>
> There are two known use cases for this. One is from upstream ([1]) where
> a for_each_map_elem helper may help implement a timeout mechanism
> in a more generic way. Another is from our internal discussion
> for a firewall use case where a map contains all the rules. The packet
> data can be compared to all these rules to decide whether to allow
> or deny the packet.
>
> For array maps, users can already use a bounded loop to traverse
> elements. Using this helper avoids the bounded loop. For other
> types of maps (e.g., hash maps) where a bounded loop is hard or
> impossible to use, this helper provides a convenient way to
> operate on all elements.
>
> For callback_fn, besides map and map element, a callback_ctx,
> allocated on the caller's stack, is also passed to the callback
> function. This callback_ctx argument can provide additional
> input and lets the callback write output to the caller's stack.
>
> If the callback_fn returns 0, the helper continues with the next
> element, if one is available. If the callback_fn returns 1, the
> helper stops iterating and returns to the bpf program. Other return
> values are not used for now.
>
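
For anyone skimming the thread, the calling convention described above
can be modeled in plain userspace C. This is only a rough sketch, not
the kernel implementation: toy_map, toy_for_each_elem(), find_ctx and
find_val_cb() are hypothetical names made up for illustration, and
returning the number of visited elements is an assumption of the sketch
rather than something stated in the text above.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for a map; not a kernel type. */
struct toy_map {
	const int *keys;
	const long *vals;
	size_t n;
};

/* Callback shape modeled on the quoted text: (map, key, val, callback_ctx). */
typedef long (*toy_cb_t)(struct toy_map *map, const int *key,
			 const long *val, void *ctx);

/*
 * Model of the iteration contract: call cb for each element, continue
 * when it returns 0, stop when it returns 1. The return value (number of
 * elements visited) is an assumption of this sketch. flags is ignored.
 */
long toy_for_each_elem(struct toy_map *map, toy_cb_t cb, void *ctx,
		       unsigned long long flags)
{
	long visited = 0;
	size_t i;

	(void)flags;
	for (i = 0; i < map->n; i++) {
		visited++;
		if (cb(map, &map->keys[i], &map->vals[i], ctx) == 1)
			break;
	}
	return visited;
}

/* Lives on the caller's stack: needle is input, found/found_key are output. */
struct find_ctx {
	long needle;
	int found_key;
	int found;
};

/* Stops the walk at the first element whose value equals ctx->needle. */
long find_val_cb(struct toy_map *map, const int *key, const long *val,
		 void *ctx)
{
	struct find_ctx *c = ctx;

	(void)map;
	if (*val != c->needle)
		return 0;	/* keep iterating */
	c->found_key = *key;
	c->found = 1;
	return 1;		/* stop iterating */
}
```

The caller declares a struct find_ctx on its own stack, fills in needle,
passes its address as callback_ctx, and reads found/found_key after the
call returns.
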
> Currently, this helper is only available with the JIT. It is possible
> to make it work with the interpreter with some effort, but I leave
> that as future work.
>
> [1]: https://lore.kernel.org/bpf/20210122205415.113822-1-xiyou.wangcong@xxxxxxxxx/
>
> Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> Signed-off-by: Yonghong Song <yhs@xxxxxx>
> ---
>  include/linux/bpf.h            |  13 +++
>  include/linux/bpf_verifier.h   |   3 +
>  include/uapi/linux/bpf.h       |  39 ++++++-
>  kernel/bpf/bpf_iter.c          |  16 +++
>  kernel/bpf/helpers.c           |   2 +
>  kernel/bpf/verifier.c          | 208 ++++++++++++++++++++++++++++++---
>  kernel/trace/bpf_trace.c       |   2 +
>  tools/include/uapi/linux/bpf.h |  39 ++++++-
>  8 files changed, 307 insertions(+), 15 deletions(-)
>

[...]

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4c24daa43bac..354aaaee8bd9 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -393,6 +393,15 @@ enum bpf_link_type {
>   *                   is struct/union.
>   */
>  #define BPF_PSEUDO_BTF_ID      3
> +/* insn[0].src_reg:  BPF_PSEUDO_FUNC
> + * insn[0].imm:      insn offset to the func
> + * insn[1].imm:      0
> + * insn[0].off:      0
> + * insn[1].off:      0
> + * ldimm64 rewrite:  address of the function
> + * verifier type:    PTR_TO_FUNC.
> + */
> +#define BPF_PSEUDO_FUNC                4
>
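
To make the comment above concrete, here is a small userspace sketch of
the two-slot ldimm64 pair it describes. The struct is a local mirror of
the instruction layout so the snippet builds without kernel headers;
toy_bpf_insn, the TOY_* macros and toy_emit_pseudo_func() are
hypothetical names for illustration only, and the sketch shows just the
pre-rewrite encoding, not what the verifier does with it.

```c
#include <stdint.h>

/* Local mirror of the 8-byte BPF instruction layout (illustration only). */
struct toy_bpf_insn {
	uint8_t code;		/* opcode */
	uint8_t dst_reg:4;	/* destination register */
	uint8_t src_reg:4;	/* source register / pseudo marker */
	int16_t off;		/* signed offset */
	int32_t imm;		/* signed immediate */
};

#define TOY_BPF_LD_IMM64	0x18	/* BPF_LD | BPF_IMM | BPF_DW */
#define TOY_BPF_PSEUDO_FUNC	4	/* value added by the hunk above */

/*
 * Fill the two instruction slots as the comment lists them:
 * insn[0].src_reg = BPF_PSEUDO_FUNC, insn[0].imm = insn offset to the
 * func, and every other off/imm field zero.
 */
void toy_emit_pseudo_func(struct toy_bpf_insn pair[2], uint8_t dst_reg,
			  int32_t insn_off)
{
	pair[0] = (struct toy_bpf_insn){
		.code = TOY_BPF_LD_IMM64,
		.dst_reg = dst_reg & 0xf,
		.src_reg = TOY_BPF_PSEUDO_FUNC,
		.off = 0,
		.imm = insn_off,
	};
	/* Second slot of the 16-byte ldimm64 is all zero here. */
	pair[1] = (struct toy_bpf_insn){ 0 };
}
```
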
>  /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
>   * offset to another bpf function
> @@ -3850,7 +3859,6 @@ union bpf_attr {
>   *
>   * long bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff, u64 flags)
>   *     Description
> -

BTW, this was fixed in a7c9c25a99bb ("bpf: Remove blank line in bpf
helper description comment") and applied to the bpf tree. Not sure if
it will cause a merge conflict later. Maybe Alexei or Daniel can just
add this line back while applying?

>   *             Check ctx packet size against exceeding MTU of net device (based
>   *             on *ifindex*).  This helper will likely be used in combination
>   *             with helpers that adjust/change the packet size.
> @@ -3910,6 +3918,34 @@ union bpf_attr {
>   *             * **BPF_MTU_CHK_RET_FRAG_NEEDED**
>   *             * **BPF_MTU_CHK_RET_SEGS_TOOBIG**
>   *

[...]