On Mon, Feb 28, 2022 at 8:29 AM Florian Westphal <fw@xxxxxxxxx> wrote: > > Eric Dumazet says: > The sock_hold() side seems suspect, because there is no guarantee > that sk_refcnt is not already 0. > > Also, there is a way to wire up skb->sk in a way that skb doesn't hold > a reference on the socket, so we need to check for that as well. > > For refcount-less skb->sk case, try to increment the reference count > and then override the destructor. > > On failure, we cannot queue the packet and need to indicate an > error. THe packet will be dropped by the caller. > > Cc: Joe Stringer <joe@xxxxxxxxx> > Fixes: 271b72c7fa82c ("udp: RCU handling for Unicast packets.") Hi Florian, thanks for the fix. skb_sk_is_prefetched() was introduced in commit cf7fbe660f2d ("bpf: Add socket assign support"). You may want to split the hunk below into a dedicated patch for this reason. > Reported-by: Eric Dumazet <eric.dumazet@xxxxxxxxx> > Signed-off-by: Florian Westphal <fw@xxxxxxxxx> > --- > Eric, does that look sane to you? > Joe, it would be nice if you could check the 'skb_sk_is_prefetched' part. <snip> > @@ -178,6 +180,18 @@ static int __nf_queue(struct sk_buff *skb, const struct nf_hook_state *state, > break; > } > > + if (skb_sk_is_prefetched(skb)) { > + struct sock *sk = skb->sk; > + > + if (!sk_is_refcounted(sk)) { > + if (!refcount_inc_not_zero(&sk->sk_refcnt)) > + return -ENOTCONN; > + > + /* drop refcount on skb_orphan */ > + skb->destructor = sock_edemux; > + } > + } > + > entry = kmalloc(sizeof(*entry) + route_key_size, GFP_ATOMIC); > if (!entry) > return -ENOMEM; I've never heard of someone trying to use socket prefetch / bpf_sk_assign in conjunction with nf_queue, bit of an unusual case. Given that `skb_sk_is_prefetched()` relies on the skb->destructor pointing towards sock_pfree, and this code would change that to sock_edemux, the difference the patch would make is this: if the packet is queued and then accepted, the socket prefetch selection could be ignored. That said, it seems potentially dangerous to fail to hold the socket reference during the nfqueue operation, so this proposal is preferable to the alternative. There's possibly ways for someone to still get such a use case to work even with this patch (eg via iptables TPROXY target rules), or worse case they could iterate on this code a little further to ensure this case works (eg by coming up with a new free func that takes this case into account). More likely, they would find an alternative solution to their problem that doesn't involve combining these particular features in the same setup. I looked closely at this hunk, I didn't look closely at the rest of the patch. Assuming you split just this hunk into a dedicated patch, you can add my Ack: Acked-by: Joe Stringer <joe@xxxxxxxxx>