Re: [PATCH 4.19] netfilter: use actual socket sk rather than skb sk when routing harder

On Fri, Nov 13, 2020 at 11:49:36PM +0100, Jason A. Donenfeld wrote:
> [ Upstream commit 46d6c5ae953cc0be38efd0e469284df7c4328cf8 ]
> 
> If netfilter changes the packet mark when mangling, the packet is
> rerouted using the route_me_harder set of functions. Prior to this
> commit, there's one big difference between route_me_harder and the
> ordinary initial routing functions, described in the comment above
> __ip_queue_xmit():
> 
>    /* Note: skb->sk can be different from sk, in case of tunnels */
>    int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
> 
> That function goes on to correctly make use of sk->sk_bound_dev_if,
> rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a
> tunnel will receive a packet in ndo_start_xmit with an initial skb->sk.
> It will make some transformations to that packet, and then it will send
> the encapsulated packet out of a *new* socket. That new socket will
> basically always have a different sk_bound_dev_if (otherwise there'd be
> a routing loop). So for the purposes of routing the encapsulated packet,
> the routing information as it pertains to the socket should come from
> that socket's sk, rather than the packet's original skb->sk. For that
> reason __ip_queue_xmit() and related functions all do the right thing.
> 
> One might argue that all tunnels should just call skb_orphan(skb) before
> transmitting the encapsulated packet into the new socket. But tunnels do
> *not* do this -- and this is wisely avoided in skb_scrub_packet() too --
> because features like TSQ rely on skb->destructor() being called when
> that buffer space is truly available again. Calling skb_orphan(skb) too
> early would result in buffers filling up unnecessarily and accounting
> info being all wrong. Instead, additional routing must take into account
> the new sk, just as __ip_queue_xmit() notes.
> 
> So, this commit addresses the problem by fishing the correct sk out of
> state->sk -- it's already set properly in the call to nf_hook() in
> __ip_local_out(), which receives the sk as part of its normal
> functionality. So we make sure to plumb state->sk through the various
> route_me_harder functions, and then make correct use of it following the
> example of __ip_queue_xmit().
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
> Reviewed-by: Florian Westphal <fw@xxxxxxxxx>
> Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
> [Jason: backported to 4.19 from Sasha's 5.4 backport]
> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>

Now queued up, thanks!

greg k-h
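
For readers following the thread, the difference the quoted commit message
describes can be sketched in plain userspace C. This is only an illustration
under stated assumptions, not the kernel code: mock_sock, mock_skb,
oif_from_skb_sk() and oif_from_sending_sk() are hypothetical names, and the
structs carry only the one field the argument needs.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's
 * struct sock and struct sk_buff; only the fields needed here. */
struct mock_sock {
	int sk_bound_dev_if;	/* 0 means "not bound to a device" */
};

struct mock_skb {
	struct mock_sock *sk;	/* socket that originally owned the packet */
};

/* Behaviour the commit message argues against: take the output
 * interface from the packet's original owner, skb->sk. */
int oif_from_skb_sk(const struct mock_skb *skb)
{
	return skb->sk ? skb->sk->sk_bound_dev_if : 0;
}

/* Behaviour the commit message argues for: take the output interface
 * from the socket that is actually transmitting, passed in explicitly
 * (state->sk in the description above). */
int oif_from_sending_sk(const struct mock_sock *sk)
{
	return sk ? sk->sk_bound_dev_if : 0;
}
```

In the tunnel case, the inner packet's skb->sk and the tunnel's own socket are
bound to different devices, so the two helpers return different interface
indexes for the same packet. Rerouting with the first value is what the commit
message warns can send the encapsulated packet back toward the tunnel device.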


