On Thu, 7 Mar 2024, Jason Xing wrote: > On Thu, Mar 7, 2024 at 10:10 PM Florian Westphal <fw@xxxxxxxxx> wrote: > > > > Jason Xing <kerneljasonxing@xxxxxxxxx> wrote: > > > On Thu, Mar 7, 2024 at 8:00 PM Florian Westphal <fw@xxxxxxxxx> wrote: > > > > > > > > Jason Xing <kerneljasonxing@xxxxxxxxx> wrote: > > > > > > This change disables most of the tcp_in_window() test, this will > > > > > > pretend everything is fine even though tcp_in_window says otherwise. > > > > > > > > > > Thanks for the information. It does make sense. > > > > > > > > > > What I've done is quite similar to nf_conntrack_tcp_be_liberal sysctl > > > > > knob which you also pointed out. It also pretends to ignore those > > > > > out-of-window skbs. > > > > > > > > > > > > > > > > > You could: > > > > > > - drop invalid tcp packets in input hook > > > > > > > > > > How about changing the return value only as below? Only two cases will > > > > > be handled: > > > > > > > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c > > > > > b/net/netfilter/nf_conntrack_proto_tcp.c > > > > > index ae493599a3ef..c88ce4cd041e 100644 > > > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > > > > @@ -1259,7 +1259,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > > > > case NFCT_TCP_INVALID: > > > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > > > > spin_unlock_bh(&ct->lock); > > > > > - return -NF_ACCEPT; > > > > > + return -NF_DROP; > > > > > > > > Lets not do this. conntrack should never drop packets and defer to ruleset > > > > whereever possible. > > > > > > Hmm, sorry, it is against my understanding. > > > > > > If we cannot return -NF_DROP, why have we already added some 'return > > > NF_DROP' in the nf_conntrack_handle_packet() function? And why does > > > this test statement exist? > > > > Sure we can drop. But we should only do it if there is no better > > alternative. > > > > > nf_conntrack_in() > > > -> nf_conntrack_handle_packet() > > > -> if (ret <= 0) { > > > if (ret == -NF_DROP) NF_CT_STAT_INC_ATOMIC(state->net, drop); > > > > AFAICS this only happens when we receive syn for an existing conntrack > > that is being removed already so we'd expect next syn to create a new > > Sorry, I've double-checked this part and found out there is no chance > to return '-NF_DROP' for nf_conntrack_handle_packet(). It might return > 'NF_DROP' (see link [1]) instead. The if-else statements seem like > dead code. > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/netfilter/nf_conntrack_proto_tcp.c#:~:text=%2DNF_REPEAT%3B-,return%20NF_DROP%3B,-%7D%0A%09%09fallthrough%3B > > > connection. Feel free to send patches that replace drop with -accept > > where possible/where it makes sense, but I don't think the > > TCP_CONNTRACK_SYN_SENT one can reasonably be avoided. > > Oh, are you suggesting replacing NF_DROP with -NF_ACCEPT in > nf_conntrack_dccp_packet()? > > There are three points where nf_conntrack_handle_packet() returns NF_DROP: > 1) one (syn_sent case) exists in nf_conntrack_tcp_packet(). As you > said, it's not necessary to change. > 2) another two exist in nf_conntrack_dccp_packet() which should be the > same as nf_conntrack_tcp_packet() handles. > > The patch goes like this: > diff --git a/net/netfilter/nf_conntrack_proto_dccp.c > b/net/netfilter/nf_conntrack_proto_dccp.c > index e2db1f4ec2df..ebc4f733bb2e 100644 > --- a/net/netfilter/nf_conntrack_proto_dccp.c > +++ b/net/netfilter/nf_conntrack_proto_dccp.c > @@ -525,7 +525,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct, > struct sk_buff *skb, > > dh = skb_header_pointer(skb, dataoff, sizeof(*dh), &_dh.dh); > if (!dh) > - return NF_DROP; > + return -NF_ACCEPT; > > if (dccp_error(dh, skb, dataoff, state)) > return -NF_ACCEPT; > @@ -533,7 +533,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct, > struct sk_buff *skb, > /* pull again, including possible 48 bit sequences and subtype header */ > dh = dccp_header_pointer(skb, dataoff, dh, &_dh); > if (!dh) > - return NF_DROP; > + return -NF_ACCEPT; > > type = dh->dccph_type; > if (!nf_ct_is_confirmed(ct) && !dccp_new(ct, skb, dh, state)) > > > > > > My only purpose is not to let the TCP layer sending strange RST to the > > > right flow. > > > > AFAIU tcp layer is correct, no? Out of the blue packet to some listener > > socket? > > Allow me to finish the full sentence: my only purpose is not to let > the TCP layer send strange RST to the _established_ socket due to > receiving strange out-of-window skbs. I don't understand why do you want to modify conntrack at all: conntrack itself does not send RST packets. And the TCP layer don't send RST packets to out of window ones either. The only possibility I see for such packets is an iptables/nftables rule which rejects packets classified as INVALID by conntrack. As Florian suggested, why don't you change that rule? The conntrack states are not fine-grained to express different TCP states which covered with INVALID. It was never a good idea to reject INVALID packets or let them through (leaking internal addresses). Best regards, Jozsef > > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl > > > knob seems odd to me though it can workaround :S > > > > I don't see a better alternative, other than -p tcp -m conntrack > > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see > > such packets. > > > > > I would like to prevent sending such an RST as default behaviour. > > > > I don't see a way to make this work out of the box, without possible > > unwanted side effects. > > > > MAYBE we could drop IFF we check that the conntrack entry candidate > > that fails sequence validation has NAT translation applied to it, and > > thus the '-NF_ACCEPT' packet won't be translated. > > > > Not even compile tested: > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > case NFCT_TCP_IGNORE: > > spin_unlock_bh(&ct->lock); > > return NF_ACCEPT; > > - case NFCT_TCP_INVALID: > > + case NFCT_TCP_INVALID: { > > + verdict = -NF_ACCEPT; > > + if (ct->status & IPS_NAT_MASK) > > + res = NF_DROP; /* skb would miss nat transformation */ > > Above line, I guess, should be 'verdict = NF_DROP'? Then this skb > would be dropped in nf_hook_slow() eventually and would not be passed > to the TCP layer. > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > spin_unlock_bh(&ct->lock); > > - return -NF_ACCEPT; > > + return verdict; > > + } > > case NFCT_TCP_ACCEPT: > > break; > > } > > Great! I think your draft patch makes sense really, which takes NAT > into consideration. > > > > > But I don't really see the advantage compared to doing drop decision in > > iptables/nftables ruleset. > > From our views, especially to kernel developers, you're right: we > could easily turn on that knob or add a drop policy to prevent it > happening. Actually I did this in production to prevent such a case. > It surely works. > > But from the views of normal users and those who do not understand how > it works in the kernel, it looks strange: people may ask why we get > some unknown RSTs in flight? > > > I also have a hunch that someone will eventually complain about this > > change in behavior. > > Well, I still think the patch you suggested is proper and don't know > why people could complain about it. > > Thanks for your patience :) > > Thanks, > Jason > -- E-mail : kadlec@xxxxxxxxxxxxxxxxx, kadlecsik.jozsef@xxxxxxxxx PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt Address : Wigner Research Centre for Physics H-1525 Budapest 114, POB. 49, Hungary