Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
> If netfilter changes the packet mark when mangling, the packet is
> rerouted using the route_me_harder set of functions. Prior to this
> commit, there's one big difference between route_me_harder and the
> ordinary initial routing functions, described in the comment above
> __ip_queue_xmit():
>
>    /* Note: skb->sk can be different from sk, in case of tunnels */
>    int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
>
> That function goes on to correctly make use of sk->sk_bound_dev_if,
> rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a
> tunnel will receive a packet in ndo_start_xmit with an initial skb->sk.
> It will make some transformations to that packet, and then it will send
> the encapsulated packet out of a *new* socket. That new socket will
> basically always have a different sk_bound_dev_if (otherwise there'd be
> a routing loop). So for the purposes of routing the encapsulated packet,
> the routing information as it pertains to the socket should come from
> that socket's sk, rather than the packet's original skb->sk. For that
> reason __ip_queue_xmit() and related functions all do the right thing.
>
> One might argue that all tunnels should just call skb_orphan(skb) before
> transmitting the encapsulated packet into the new socket. But tunnels do
> *not* do this -- and this is wisely avoided in skb_scrub_packet() too --
> because features like TSQ rely on skb->destructor() being called when
> that buffer space is truly available again. Calling skb_orphan(skb) too
> early would result in buffers filling up unnecessarily and accounting
> info being all wrong. Instead, additional routing must take into account
> the new sk, just as __ip_queue_xmit() notes.
>
> So, this commit addresses the problem by fishing the correct sk out of
> state->sk -- it's already set properly in the call to nf_hook() in
> __ip_local_out(), which receives the sk as part of its normal
> functionality. So we make sure to plumb state->sk through the various
> route_me_harder functions, and then make correct use of it following the
> example of __ip_queue_xmit().

Reviewed-by: Florian Westphal <fw@xxxxxxxxx>
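
For readers following along, the difference between the two lookups can be sketched in plain userspace C. This is not the patch itself: the struct definitions are cut-down stand-ins for the kernel types, and the helper names oif_from_skb() and oif_from_sk() are hypothetical, made up for this illustration. It only shows why the bound device should be read from the socket handed in explicitly (state->sk in the hook) rather than from skb->sk.

```c
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumption: only the
 * fields needed for this illustration are modelled). */
struct sock {
	int sk_bound_dev_if;	/* ifindex the socket is bound to, 0 if none */
};

struct sk_buff {
	struct sock *sk;	/* socket that originally owned the packet */
};

/* Hypothetical helper: the behaviour described as problematic, where the
 * output interface is taken from the packet's original socket. */
static int oif_from_skb(const struct sk_buff *skb)
{
	return skb->sk ? skb->sk->sk_bound_dev_if : 0;
}

/* Hypothetical helper: the behaviour the commit message asks for, where
 * the output interface is taken from the socket passed in explicitly
 * (the sending socket), following the __ip_queue_xmit() convention.
 * skb is kept in the signature only to mirror that convention. */
static int oif_from_sk(const struct sock *sk, const struct sk_buff *skb)
{
	(void)skb;
	return sk ? sk->sk_bound_dev_if : 0;
}
```

With an application socket bound to the tunnel device and a separate tunnel socket bound to the underlying device, oif_from_skb() returns the tunnel device's ifindex for the encapsulated packet, which is the routing loop the commit message warns about, while oif_from_sk() returns the ifindex of the tunnel's own socket.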