On Tue, Jul 11, 2023 at 3:57 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Fri, Jul 07, 2023 at 12:30:01PM -0700, Stanislav Fomichev wrote: > > + > > +static int mlx5e_devtx_request_l4_checksum(const struct devtx_ctx *_ctx, > > + u16 csum_start, u16 csum_offset) > > +{ > > + const struct mlx5e_devtx_ctx *ctx = (void *)_ctx; > > + struct mlx5_wqe_eth_seg *eseg; > > + > > + if (unlikely(!ctx->wqe)) > > + return -ENODATA; > > + > > + eseg = &ctx->wqe->eth; > > + > > + switch (csum_offset) { > > + case sizeof(struct ethhdr) + sizeof(struct iphdr) + offsetof(struct udphdr, check): > > + case sizeof(struct ethhdr) + sizeof(struct ipv6hdr) + offsetof(struct udphdr, check): > > + /* Looks like HW/FW is doing parsing, so offsets are largely ignored. */ > > + eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM; > > + break; > > + default: > > + return -EINVAL; > > + } > > I think this proves my point: csum is not generalizable even across veth and mlx5. > Above is a square peg that tries to fit csum_start/offset api (that makes sense from SW pov) > into HW that has different ideas about csum-ing. > > Here is what mlx5 does: > mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb, > struct mlx5e_accel_tx_state *accel, > struct mlx5_wqe_eth_seg *eseg) > { > if (unlikely(mlx5e_ipsec_txwqe_build_eseg_csum(sq, skb, eseg))) > return; > > if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) { > eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM; > if (skb->encapsulation) { > eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM | > MLX5_ETH_WQE_L4_INNER_CSUM; > sq->stats->csum_partial_inner++; > } else { > eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM; > sq->stats->csum_partial++; > } > > How would you generalize that into csum api that will work across NICs ? > > My answer stands: you cannot. > > My proposal again: > add driver specifc kfuncs and hooks for things like csum. I do see your point, but to also give you my perspective: I have no clue what those _CSUM tx bits do (as a non-mlx employee). And what kind of packets they support (initial patch doesn't give any info). We can definitely expose mlx5 specific request_l4_checksum(bool encap) which does things similar to mlx5e_txwqe_build_eseg_csum, but then, what does it _actually_ do? It obviously can't checksum arbitrary packet formats (because it has this inner/outer selection bit), so there is really no way for me to provide a per-driver kfunc api. Maybe the vendors can? So having csum_start/csum_offset abstraction which works with fixed offsets seems like at least it correctly sets the expectation for BPF program writers. The vendors are already supposed to conform to this start/offset API for skb. But back to your point: should we maybe try to meet somewhere in the middle? 1. We try to provide "generic" offload kfuncs; for mlx5, we'll have this mlx5e_devtx_request_l4_checksum which works for fixed offsets 2. We also let vendors do device-specific "extensions" where devices deviate too much: bpf_request_RAW_mlx5e_l4_checksum(bool encap) This can give BPF authors opportunity to write somewhat portable programs and also use vendor specific apis when/if needed. I think we had a similar idea for rx side: have generic kfuncs, but also let vendors experiment with custom kfuncs if they want to differentiate. WDYT? Can it give us the best things from both sides? > Kuba, > since you nacked driver specific stuff please suggest a way to unblock this stalemate.