On Thu, Feb 25, 2021 at 9:13 PM Yonghong Song <yhs@xxxxxx> wrote:
>
> The bpf_for_each_map_elem() helper is introduced which
> iterates all map elements with a callback function. The
> helper signature looks like
>   long bpf_for_each_map_elem(map, callback_fn, callback_ctx, flags)
> and for each map element, the callback_fn will be called. For example,
> like hashmap, the callback signature may look like
>   long callback_fn(map, key, val, callback_ctx)
>
> There are two known use cases for this. One is from upstream ([1]) where
> a for_each_map_elem helper may help implement a timeout mechanism
> in a more generic way. Another is from our internal discussion
> for a firewall use case where a map contains all the rules. The packet
> data can be compared to all these rules to decide allow or deny
> the packet.
>
> For array maps, users can already use a bounded loop to traverse
> elements. Using this helper can avoid using bounded loop. For other
> type of maps (e.g., hash maps) where bounded loop is hard or
> impossible to use, this helper provides a convenient way to
> operate on all elements.
>
> For callback_fn, besides map and map element, a callback_ctx,
> allocated on caller stack, is also passed to the callback
> function. This callback_ctx argument can provide additional
> input and allow to write to caller stack for output.
>
> If the callback_fn returns 0, the helper will iterate through next
> element if available. If the callback_fn returns 1, the helper
> will stop iterating and returns to the bpf program. Other return
> values are not used for now.
>
> Currently, this helper is only available with jit. It is possible
> to make it work with interpreter with some effort but I leave it
> as the future work.
>
> [1]: https://lore.kernel.org/bpf/20210122205415.113822-1-xiyou.wangcong@xxxxxxxxx/
>
> Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> Signed-off-by: Yonghong Song <yhs@xxxxxx>
> ---
>  include/linux/bpf.h            |  13 +++
>  include/linux/bpf_verifier.h   |   3 +
>  include/uapi/linux/bpf.h       |  39 ++++++-
>  kernel/bpf/bpf_iter.c          |  16 +++
>  kernel/bpf/helpers.c           |   2 +
>  kernel/bpf/verifier.c          | 208 ++++++++++++++++++++++++++++++---
>  kernel/trace/bpf_trace.c       |   2 +
>  tools/include/uapi/linux/bpf.h |  39 ++++++-
>  8 files changed, 307 insertions(+), 15 deletions(-)
>

[...]

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4c24daa43bac..354aaaee8bd9 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -393,6 +393,15 @@ enum bpf_link_type {
>   *                      is struct/union.
>   */
>  #define BPF_PSEUDO_BTF_ID      3
> +/* insn[0].src_reg:  BPF_PSEUDO_FUNC
> + * insn[0].imm:      insn offset to the func
> + * insn[1].imm:      0
> + * insn[0].off:      0
> + * insn[1].off:      0
> + * ldimm64 rewrite:  address of the function
> + * verifier type:    PTR_TO_FUNC.
> + */
> +#define BPF_PSEUDO_FUNC        4
>
>  /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
>   * offset to another bpf function
> @@ -3850,7 +3859,6 @@ union bpf_attr {
>   *
>   * long bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff, u64 flags)
>   *     Description
> -

BTW, this was fixed in a7c9c25a99bb ("bpf: Remove blank line in bpf
helper description comment") and applied to the bpf tree. Not sure if
it will cause a merge conflict later. Maybe Alexei or Daniel can just
add this line back while applying?

>   *             Check ctx packet size against exceeding MTU of net device (based
>   *             on *ifindex*). This helper will likely be used in combination
>   *             with helpers that adjust/change the packet size.
> @@ -3910,6 +3918,34 @@ union bpf_attr {
>   *             * **BPF_MTU_CHK_RET_FRAG_NEEDED**
>   *             * **BPF_MTU_CHK_RET_SEGS_TOOBIG**
>   *

[...]
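For readers who want the callback contract from the quoted commit message in concrete form, here is a minimal userspace sketch in plain C. It is not BPF code and not taken from the patch. The names toy_map, toy_for_each_elem, match_ctx and find_cb are hypothetical, and returning the number of visited elements is an assumption of this sketch, since the commit message does not state the helper's return value. It only models the two documented rules: the callback returns 0 to continue and 1 to stop, and callback_ctx carries input in and output back to the caller.

```c
#include <stddef.h>

/* Hypothetical userspace model of the iteration contract; not kernel code. */
struct toy_map {
	int *keys;
	long *vals;
	size_t n;
};

/* Same shape as the callback in the commit message: map, key, val, ctx. */
typedef long (*elem_cb)(void *map, const void *key, void *val, void *ctx);

/*
 * Call cb once per element. Keep going while cb returns 0, stop as soon
 * as it returns 1. Returns how many elements were visited (an assumption
 * made for this sketch).
 */
static long toy_for_each_elem(struct toy_map *map, elem_cb cb, void *ctx)
{
	long visited = 0;
	size_t i;

	for (i = 0; i < map->n; i++) {
		visited++;
		if (cb(map, &map->keys[i], &map->vals[i], ctx) == 1)
			break;
	}
	return visited;
}

/* Lives on the caller's stack: wanted_key is input, the rest is output. */
struct match_ctx {
	int wanted_key;
	long found_val;
	int found;
};

/* Stop at the first element whose key matches, in the spirit of the
 * firewall rule lookup mentioned in the commit message.
 */
static long find_cb(void *map, const void *key, void *val, void *ctx)
{
	struct match_ctx *m = ctx;

	(void)map;
	if (*(const int *)key == m->wanted_key) {
		m->found_val = *(long *)val;
		m->found = 1;
		return 1;	/* stop iterating */
	}
	return 0;		/* continue with the next element */
}
```

A caller would fill in a match_ctx on its stack, pass its address as the last argument, and read found and found_val afterwards. That mirrors how callback_ctx is described as providing both additional input and a place to write output.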