Hello Pablo, On Fri, Mar 22, 2024 at 5:06 AM Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > > On Mon, Mar 11, 2024 at 03:05:50PM +0800, Jason Xing wrote: > > From: Jason Xing <kernelxing@xxxxxxxxxxx> > > > > Supposing we set DNAT policy converting a_port to b_port on the > > server at the beginning, the socket is set up by using 4-tuple: > > > > client_ip:client_port <--> server_ip:b_port > > > > Then, some strange skbs from client or gateway, say, out-of-window > > skbs are eventually sent to the server_ip:a_port (not b_port) > > in TCP layer due to netfilter clearing skb->_nfct value in > > nf_conntrack_in() function. Why? Because the tcp_in_window() > > considers the incoming skb as an invalid skb by returning > > NFCT_TCP_INVALID. > > > > At last, the TCP layer process the out-of-window > > skb (client_ip,client_port,server_ip,a_port) and try to look up > > such an socket in tcp_v4_rcv(), as we can see, it will fail for sure > > because the port is a_port not our expected b_port and then send > > back an RST to the client. > > > > The detailed call graphs go like this: > > 1) > > nf_conntrack_in() > > -> nf_conntrack_handle_packet() > > -> nf_conntrack_tcp_packet() > > -> tcp_in_window() // tests if the skb is out-of-window > > -> return -NF_ACCEPT; > > -> skb->_nfct = 0; // if the above line returns a negative value > > 2) > > tcp_v4_rcv() > > -> __inet_lookup_skb() // fails, then jump to no_tcp_socket > > -> tcp_v4_send_reset() > > > > The moment the client receives the RST, it will drop. So the RST > > skb doesn't hurt the client (maybe hurt some gateway which cancels > > the session when filtering the RST without validating > > the sequence because of performance reason). Well, it doesn't > > matter. However, we can see many strange RST in flight. > > > > The key reason why I wrote this patch is that I don't think > > the behaviour is expected because the RFC 793 defines this > > case: > > > > "If the connection is in a synchronized state (ESTABLISHED, > > FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), > > any unacceptable segment (out of window sequence number or > > unacceptible acknowledgment number) must elicit only an empty > > acknowledgment segment containing the current send-sequence number > > and an acknowledgment..." > > > > I think, even we have set DNAT policy, it would be better if the > > whole process/behaviour adheres to the original TCP behaviour as > > default. > > > > Suggested-by: Florian Westphal <fw@xxxxxxxxx> > > Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx> > > --- > > v2 > > Link: https://lore.kernel.org/netdev/20240307090732.56708-1-kerneljasonxing@xxxxxxxxx/ > > 1. add one more test about NAT and then drop the skb (Florian) > > --- > > net/netfilter/nf_conntrack_proto_tcp.c | 15 +++++++++++++-- > > 1 file changed, 13 insertions(+), 2 deletions(-) > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > > index ae493599a3ef..19ddac526ea0 100644 > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > @@ -1256,10 +1256,21 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > case NFCT_TCP_IGNORE: > > spin_unlock_bh(&ct->lock); > > return NF_ACCEPT; > > - case NFCT_TCP_INVALID: > > + case NFCT_TCP_INVALID: { > > + int verdict = -NF_ACCEPT; > > + > > + if (ct->status & IPS_NAT_MASK) > > + /* If DNAT is enabled and netfilter receives > > + * out-of-window skbs, we should drop it directly, > > Yes, if _be_liberal toggle is disabled this can happen. > > > + * or else skb would miss NAT transformation and > > + * trigger corresponding RST sending to the flow > > + * in TCP layer, which is not supposed to happen. > > + */ > > + verdict = NF_DROP; > > One comment for the SNAT case. Thanks for the comment :) > > nf_conntrack_in() calls this function from the prerouting hook. For > the very first packet, IPS_NAT_MASK might not be yet fully set on > (masquerade/snat happens in postrouting), then still one packet can be > leaked without NAT mangling in the SNAT case. It's possible if the flag is not set and out-of-window skb comes first... > > Rulesets should really need to set default policy to drop in NAT > chains to address this. > > And after this update, user has no chance anymore to bump counters at > the end of the policy, to debug issues. You mean 'set default policy' is using iptables command to set, right? If that's the case, I suspect the word "address" because it just hides the issue and not lets people see it. I think many users don't know this case. If I tell them about this "just set one more sysctl knob and you'll be fine", they will definitely question me... Actually I was questioned many times last week. We have a _be_liberal sysctl knob to "address" this, yes, but what I'm thinking is : the less we resort to sysctl knob, the easier life we have. It's very normal to drop an out-of-window skb without S/DNAT enabled. Naturally, we're supposed to drop it finally with S/DNAT enabled. It can be the default behaviour. Why would we use a knob to do it instead? :/ > > We have relied on the rule that "conntrack should not drop packets" > since the very beginning, instead signal rulesets that something is > invalid, so user decides what to do. Yes, I know that rule, but we already have some exceptions for this: we dropped the unexpected skb in the netfilter unless there are no other better alternatives. My logic in the V1 patch is not setting invalid (in order to not clear skb->_nfct field) and letting it go until it is passed to the TCP layer which will drop it finally. > > I'm ambivalent about this, Jozsef? Hope to see more comments and suggestions from you two maintainers :) Thanks, Jason > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > spin_unlock_bh(&ct->lock); > > - return -NF_ACCEPT; > > + return verdict; > > + } > > case NFCT_TCP_ACCEPT: > > break; > > } > > -- > > 2.37.3 > >