On Thu, Nov 28, 2024 at 1:38 PM Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
>
> Softirq can interrupt packet from process context which walks over the
> percpu area.
>
> Add routines to disable bh while restoring and saving the tunnel parser
> context from percpu area to stack. Add a skbuff owner for this percpu
> area to catch softirq interference to exercise the packet tunnel parser
> again in such case.
>
> Reported-by: syzbot+84d0441b9860f0d63285@xxxxxxxxxxxxxxxxxxxxxxxxx
> Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
> Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> ---
>  include/net/netfilter/nf_tables_core.h |  1 +
>  net/netfilter/nft_inner.c              | 56 ++++++++++++++++++++------
>  2 files changed, 45 insertions(+), 12 deletions(-)
>
> diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
> index ff27cb2e1662..dae0e7592934 100644
> --- a/include/net/netfilter/nf_tables_core.h
> +++ b/include/net/netfilter/nf_tables_core.h
> @@ -161,6 +161,7 @@ enum {
>  };
>
>  struct nft_inner_tun_ctx {
> +	struct sk_buff *skb; /* percpu area owner */
>  	u16	type;
>  	u16	inner_tunoff;
>  	u16	inner_lloff;
> diff --git a/net/netfilter/nft_inner.c b/net/netfilter/nft_inner.c
> index 928312d01eb1..fcaa126ac8da 100644
> --- a/net/netfilter/nft_inner.c
> +++ b/net/netfilter/nft_inner.c
> @@ -210,35 +210,65 @@ static int nft_inner_parse(const struct nft_inner *priv,
>  			   struct nft_pktinfo *pkt,
>  			   struct nft_inner_tun_ctx *tun_ctx)
>  {
> -	struct nft_inner_tun_ctx ctx = {};
>  	u32 off = pkt->inneroff;
>
>  	if (priv->flags & NFT_INNER_HDRSIZE &&
> -	    nft_inner_parse_tunhdr(priv, pkt, &ctx, &off) < 0)
> +	    nft_inner_parse_tunhdr(priv, pkt, tun_ctx, &off) < 0)
>  		return -1;
>
>  	if (priv->flags & (NFT_INNER_LL | NFT_INNER_NH)) {
> -		if (nft_inner_parse_l2l3(priv, pkt, &ctx, off) < 0)
> +		if (nft_inner_parse_l2l3(priv, pkt, tun_ctx, off) < 0)
>  			return -1;
>  	} else if (priv->flags & NFT_INNER_TH) {
> -		ctx.inner_thoff = off;
> -		ctx.flags |= NFT_PAYLOAD_CTX_INNER_TH;
> +		tun_ctx->inner_thoff = off;
> +		tun_ctx->flags |= NFT_PAYLOAD_CTX_INNER_TH;
>  	}
>
> -	*tun_ctx = ctx;
>  	tun_ctx->type = priv->type;
> +	tun_ctx->skb = pkt->skb;
>  	pkt->flags |= NFT_PKTINFO_INNER_FULL;
>
>  	return 0;
>  }
>
> +static bool nft_inner_restore_tun_ctx(const struct nft_pktinfo *pkt,
> +				      struct nft_inner_tun_ctx *tun_ctx)
> +{
> +	struct nft_inner_tun_ctx *this_cpu_tun_ctx;
> +
> +	local_bh_disable();
> +	this_cpu_tun_ctx = this_cpu_ptr(&nft_pcpu_tun_ctx);
> +	if (this_cpu_tun_ctx->skb != pkt->skb) {

I must say I do not understand this patch.

If a context is used by a save/restore more than one time per packet
traversal, then this means we can not use per-cpu storage, or risk
flakes.

Also, the skb could be freed and re-allocated?

Perhaps describe a bit more what is going on in the changelog.