Re: [RFC bpf-next v3 09/14] net/mlx5e: Implement devtx kfuncs

Stanislav Fomichev <sdf@xxxxxxxxxx> · Tue, 11 Jul 2023 16:24:54 -0700

On Tue, Jul 11, 2023 at 3:57 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Jul 07, 2023 at 12:30:01PM -0700, Stanislav Fomichev wrote:
> > +
> > +static int mlx5e_devtx_request_l4_checksum(const struct devtx_ctx *_ctx,
> > +                                        u16 csum_start, u16 csum_offset)
> > +{
> > +     const struct mlx5e_devtx_ctx *ctx = (void *)_ctx;
> > +     struct mlx5_wqe_eth_seg *eseg;
> > +
> > +     if (unlikely(!ctx->wqe))
> > +             return -ENODATA;
> > +
> > +     eseg = &ctx->wqe->eth;
> > +
> > +     switch (csum_offset) {
> > +     case sizeof(struct ethhdr) + sizeof(struct iphdr) + offsetof(struct udphdr, check):
> > +     case sizeof(struct ethhdr) + sizeof(struct ipv6hdr) + offsetof(struct udphdr, check):
> > +             /* Looks like HW/FW is doing parsing, so offsets are largely ignored. */
> > +             eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM;
> > +             break;
> > +     default:
> > +             return -EINVAL;
> > +     }
>
> I think this proves my point: csum is not generalizable even across veth and mlx5.
> Above is a square peg that tries to fit csum_start/offset api (that makes sense from SW pov)
> into HW that has different ideas about csum-ing.
>
> Here is what mlx5 does:
> mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
>                             struct mlx5e_accel_tx_state *accel,
>                             struct mlx5_wqe_eth_seg *eseg)
> {
>         if (unlikely(mlx5e_ipsec_txwqe_build_eseg_csum(sq, skb, eseg)))
>                 return;
>
>         if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
>                 eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
>                 if (skb->encapsulation) {
>                         eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM |
>                                           MLX5_ETH_WQE_L4_INNER_CSUM;
>                         sq->stats->csum_partial_inner++;
>                 } else {
>                         eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
>                         sq->stats->csum_partial++;
>                 }
>
> How would you generalize that into csum api that will work across NICs ?
>
> My answer stands: you cannot.
>
> My proposal again:
> add driver specifc kfuncs and hooks for things like csum.

I do see your point, but to also give you my perspective: I have no
clue what those _CSUM tx bits do (as a non-mlx employee). And what
kind of packets they support (initial patch doesn't give any info).
We can definitely expose mlx5 specific request_l4_checksum(bool encap)
which does things similar to mlx5e_txwqe_build_eseg_csum, but then,
what does it _actually_ do? It obviously can't checksum arbitrary
packet formats (because it has this inner/outer selection bit), so
there is really no way for me to provide a per-driver kfunc api. Maybe
the vendors can?

So having csum_start/csum_offset abstraction which works with fixed
offsets seems like at least it correctly sets the expectation for BPF
program writers.
The vendors are already supposed to conform to this start/offset API for skb.

But back to your point: should we maybe try to meet somewhere in the middle?
1. We try to provide "generic" offload kfuncs; for mlx5, we'll have
this mlx5e_devtx_request_l4_checksum which works for fixed offsets
2. We also let vendors do device-specific "extensions" where devices
deviate too much: bpf_request_RAW_mlx5e_l4_checksum(bool encap)
This can give BPF authors opportunity to write somewhat portable
programs and also use vendor specific apis when/if needed.

I think we had a similar idea for rx side: have generic kfuncs, but
also let vendors experiment with custom kfuncs if they want to
differentiate.
WDYT? Can it give us the best things from both sides?

> Kuba,
> since you nacked driver specific stuff please suggest a way to unblock this stalemate.