Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> > AFAICS so far this would be enough:
> >
> > 1. remove the BUG_ON() in skb_orphan, letting it clear skb->sk instead
> > 2. in nf_queue_entry_get_refs(), if skb->sk and no destructor:
> >    call nf_tproxy_assign_sock() so a reference gets taken.
> > 3. change skb_steal_sock:
> >    static inline struct sock *skb_steal_sock(struct sk_buff *skb, bool *refcounted)
> >      [..]
> >      *refcounted = skb->destructor != NULL;
> > 4. make tproxy sk assign elide the destructor assignment in case of
> >    a listening sk.
> >
>
> Okay, but how do we make sure the skb->sk association does not leak from rcu section ?

From netfilter pov the only escape point is nfqueue (and kfree_skb),
so for tcp/udp it will end up in their respective rx path eventually.

But you are right in that we need to also audit all NF_STOLEN users
that can be invoked from PRE_ROUTING and INPUT hooks.

OUTPUT/FORWARD/POSTROUTING are not relevant: in case the skb enters IP
forwarding, it will be dropped there (we have a check to toss skbs with
a socket attached in forward).

In recent hallway discussion Eric suggested adding an empty destructor
stub. It would allow us to do the needed annotation, i.e. no need to
change skb_orphan(); *refcounted would be set via a
skb->destructor != noref_listen_skb_destructor check.

> Note we have the noref/refcounted magic for skb_dst(), we might try to use something similar
> for skb->sk

Yes, but that would be more code churn because we have to replace
skb->sk accesses by a helper to mask off the NOREF bit (or we need to
add a "noref" bit in sk_buff itself).
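
To make the destructor-stub idea concrete, here is a rough user-space
sketch, not kernel code. struct sock and struct sk_buff are cut-down
stand-ins, and mock_assign_sock(), mock_queue_get_refs(),
mock_steal_sock() and mock_sock_put_destructor() are hypothetical names
used only for illustration. Only noref_listen_skb_destructor comes from
the discussion above. Taking a real reference at the nfqueue escape
point is an assumption about how the stub would combine with step 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins, NOT the kernel definitions. */
struct sock {
	int refcnt;
	bool listener;
};

struct sk_buff {
	struct sock *sk;
	void (*destructor)(struct sk_buff *skb);
};

/* Empty stub: its address only marks "skb->sk set, no reference held". */
static void noref_listen_skb_destructor(struct sk_buff *skb)
{
	(void)skb;
}

/* Stand-in for a destructor that drops the socket reference. */
static void mock_sock_put_destructor(struct sk_buff *skb)
{
	skb->sk->refcnt--;
}

/* Hypothetical assign helper: listeners are attached without a reference. */
static void mock_assign_sock(struct sk_buff *skb, struct sock *sk)
{
	assert(skb->sk == NULL);
	skb->sk = sk;
	if (sk->listener) {
		skb->destructor = noref_listen_skb_destructor;
	} else {
		sk->refcnt++;
		skb->destructor = mock_sock_put_destructor;
	}
}

/* Assumed escape-point handling: upgrade a noref skb to a real reference. */
static void mock_queue_get_refs(struct sk_buff *skb)
{
	if (skb->sk && skb->destructor == noref_listen_skb_destructor) {
		skb->sk->refcnt++;
		skb->destructor = mock_sock_put_destructor;
	}
}

/* The refcounted annotation is derived from the destructor alone. */
static struct sock *mock_steal_sock(struct sk_buff *skb, bool *refcounted)
{
	struct sock *sk = skb->sk;

	*refcounted = false;
	if (sk) {
		*refcounted = skb->destructor != noref_listen_skb_destructor;
		skb->destructor = NULL;
		skb->sk = NULL;
	}
	return sk;
}
```

With this layout skb_orphan() would not need to change, since a
destructor is always set whenever skb->sk is. The receive path then
learns from the stub comparison whether it has to drop a reference.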