Changes since the original RFC: - 'struct bpf_patch' to more easily manage insn arrays While working on longer unrolled functions I've bumped into verifier's insn_buf[16]. Instead of increasing it, I've added a simple abstraction that handles resizing. *insn++ = BPF_XXX_XXX(); *insn++ = BPF_YYY_YYY(); vs bpf_patch_append(patch, BPF_XXX_XXX(), BPF_YYY_YYY(), ); There are also some tricks where we resolve BPF_JMP_IMM(op, dst, imm, S16_MAX) to the real end of the program; mostly to make it easy to generate the following: if (some_condition) { if (some_other_condition) { // insns } } - Drop xdp_buff->priv in favor of container_of (Jakub) The drivers might need to maintain more data, so instead of constraining them to a single ->priv pointer in xdp_buff, use container_of and require the users to define the outer struct. xdp_buff should be the first member. Each driver now has two patches: the first one introduces new struct wrapper; the second one does the actual work. - bpf_xdp_metadata_export_to_skb (Toke) To solve the case where we're constructing skb from a redirected frame. bpf_xdp_metadata_export_to_skb is an unrolled kfunc that prepares 'struct xdp_to_skb_metadata' in the metadata. The layout is randomized and it has a randomized magic number to make sure userspace doesn't depend on it. I'm not sure how strict we should be here? Toke/Jesper seem to be ok with userspace depending on this but as long as they read the layout via BTF, so maybe having a fixed magic is ok/better her? Later, at skb_metadata_set time, we call into skb_metadata_import_from_xdp that tries to parse this fixed format and extract relevant metadata. Note that because we are unrolling bpf_xdp_metadata_export_to_skb, we have to constrain other kfuncs to R2-R5 only; I've added the reasoning to the comments. It might be better idea to do a real kfunc call (but we have to generate this kfunc anyway)? - helper to make it easier to call kernel code from unrolled kfuncs Since we're unrolling, we're running in a somewhat constrained environment. R6-R9 belong to the real callee, so we can't use them to stash our R1-R5. We also can't use stack in any way. Another constraint comes from bpf_xdp_metadata_export_to_skb which might call several metadata kfuncs and wants its R1 to be preserved. Thus, we add xdp_kfunc_call_preserving_r1, which generates the bytecode to preserve r1 somewhere in the user-supplied context. Again, we have to do this because we unroll bpf_xdp_metadata_export_to_skb. - mlx4/bnxt/ice examples (Jesper) ice is the only one where it seems feasible to unroll. The code is untested, but at least it shows that it's somewhat possible to get to the timestamp in our R2-R5-constrained environment. mlx4/bnxt do spinlocks/seqlocks, so I'm just calling into the kernel code. - Unroll default implementation to return false/0/NULL Original RFC left default kfuncs calls when the driver doesn't do the unrolling. Here, instead, we unroll to single 'return 0' instruction. - Drop prog_ifindex libbpf patch, use bpf_program__set_ifindex instead (Andrii) - Keep returning metadata by value instead of using a pointer I've tried using the pointer, it works currently, but it requires extra argument per commit eb1f7f71c126 ("bpf/verifier: allow kfunc to return an allocated mem"). The requirement can probably be lifted for our case, but not sure it's necessary for now. While adding rx_timestamp support for the drivers, it turns out we never really return the raw pointer to the descriptor field. We read the descriptor field, do shifts/multiplies, convert to kernel clock, etc/etc. So returning a value instead of a pointer seems more logical, at least for the rx timestamp case. For the other types of metadata, we might reconsider. - rename bpf_xdp_metadata_have_<xxx> to bpf_xdp_metadata_<xxx>_supported Spotted in one of Toke's emails. Seems like it better conveys the intent that it actually tests that the device supports the metadata, not that the particular packet has the metadata. The following is unchanged since the original RFC: This is an RFC for the alternative approach suggested by Martin and Jakub. I've tried to CC most of the people from the original discussion, feel free to add more if you think I've missed somebody. Summary: - add new BPF_F_XDP_HAS_METADATA program flag and abuse attr->prog_ifindex to pass target device ifindex at load time - at load time, find appropriate ndo_unroll_kfunc and call it to unroll/inline kfuncs; kfuncs have the default "safe" implementation if unrolling is not supported by a particular device - rewrite xskxceiver test to use C bpf program and extend it to export rx_timestamp (plus add rx timestamp to veth driver) I've intentionally kept it small and hacky to see whether the approach is workable or not. Pros: - we avoid BTF complexity; the BPF programs themselves are now responsible for agreeing on the metadata layout with the AF_XDP consumer - the metadata is free if not used - the metadata should, in theory, be cheap if used; kfuncs should be unrolled to the same code as if the metadata was pre-populated and passed with a BTF id - it's not all or nothing; users can use small subset of metadata which is more efficient than the BTF id approach where all metadata has to be exposed for every frame (and selectively consumed by the users) Cons: - forwarding has to be handled explicitly; the BPF programs have to agree on the metadata layout (IOW, the forwarding program has to be aware of the final AF_XDP consumer metadata layout) - TX picture is not clear; but it's not clear with BTF ids as well; I think we've agreed that just reusing whatever we have at RX won't fly at TX; seems like TX XDP program might be the answer here? (with a set of another tx kfuncs to "expose" bpf/af_xdp metatata back into the kernel) Cc: John Fastabend <john.fastabend@xxxxxxxxx> Cc: David Ahern <dsahern@xxxxxxxxx> Cc: Martin KaFai Lau <martin.lau@xxxxxxxxx> Cc: Jakub Kicinski <kuba@xxxxxxxxxx> Cc: Willem de Bruijn <willemb@xxxxxxxxxx> Cc: Jesper Dangaard Brouer <brouer@xxxxxxxxxx> Cc: Anatoly Burakov <anatoly.burakov@xxxxxxxxx> Cc: Alexander Lobakin <alexandr.lobakin@xxxxxxxxx> Cc: Magnus Karlsson <magnus.karlsson@xxxxxxxxx> Cc: Maryam Tahhan <mtahhan@xxxxxxxxxx> Cc: xdp-hints@xxxxxxxxxxxxxxx Cc: netdev@xxxxxxxxxxxxxxx Stanislav Fomichev (14): bpf: Introduce bpf_patch bpf: Support inlined/unrolled kfuncs for xdp metadata veth: Introduce veth_xdp_buff wrapper for xdp_buff veth: Support rx timestamp metadata for xdp selftests/bpf: Verify xdp_metadata xdp->af_xdp path xdp: Carry over xdp metadata into skb context selftests/bpf: Verify xdp_metadata xdp->skb path bpf: Helper to simplify calling kernel routines from unrolled kfuncs ice: Introduce ice_xdp_buff wrapper for xdp_buff ice: Support rx timestamp metadata for xdp mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff mxl4: Support rx timestamp metadata for xdp bnxt: Introduce bnxt_xdp_buff wrapper for xdp_buff bnxt: Support rx timestamp metadata for xdp drivers/net/ethernet/broadcom/bnxt/bnxt.c | 73 +++- drivers/net/ethernet/intel/ice/ice.h | 5 + drivers/net/ethernet/intel/ice/ice_main.c | 1 + drivers/net/ethernet/intel/ice/ice_txrx.c | 105 ++++- .../net/ethernet/mellanox/mlx4/en_netdev.c | 1 + drivers/net/ethernet/mellanox/mlx4/en_rx.c | 66 ++- drivers/net/veth.c | 89 ++-- include/linux/bpf.h | 1 + include/linux/bpf_patch.h | 29 ++ include/linux/btf.h | 1 + include/linux/btf_ids.h | 4 + include/linux/mlx4/device.h | 7 + include/linux/netdevice.h | 5 + include/linux/skbuff.h | 4 + include/net/xdp.h | 41 ++ include/uapi/linux/bpf.h | 5 + kernel/bpf/Makefile | 2 +- kernel/bpf/bpf_patch.c | 81 ++++ kernel/bpf/syscall.c | 28 +- kernel/bpf/verifier.c | 85 ++++ net/core/dev.c | 7 + net/core/skbuff.c | 25 ++ net/core/xdp.c | 165 +++++++- tools/include/uapi/linux/bpf.h | 5 + tools/testing/selftests/bpf/Makefile | 2 +- .../selftests/bpf/prog_tests/xdp_metadata.c | 394 ++++++++++++++++++ .../selftests/bpf/progs/xdp_metadata.c | 78 ++++ 27 files changed, 1244 insertions(+), 65 deletions(-) create mode 100644 include/linux/bpf_patch.h create mode 100644 kernel/bpf/bpf_patch.c create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata.c -- 2.38.1.431.g37b22c650d-goog