> -----Original Message-----
> From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> Sent: Thursday, 12 January 2023 12:50
> To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx>
> Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal <fw@xxxxxxxxx>;
> Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>; Long Xin
> <lxin@xxxxxxxxxx>; Claudio Porfiri <claudio.porfiri@xxxxxxxxxxxx>
> Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp state machine
>
> On Wed, Jan 11, 2023 at 09:36:38AM +0000, Sriram Yagnaraman wrote:
> > > -----Original Message-----
> > > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> > > Sent: Friday, 6 January 2023 01:50
> > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx>
> > > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal
> > > <fw@xxxxxxxxx>; Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>; Long
> > > Xin <lxin@xxxxxxxxxx>; Claudio Porfiri
> > > <claudio.porfiri@xxxxxxxxxxxx>
> > > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp state
> > > machine
> > >
> > > On Thu, Jan 05, 2023 at 12:11:44PM +0000, Sriram Yagnaraman wrote:
> > > > > -----Original Message-----
> > > > > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> > > > > Sent: Thursday, 5 January 2023 12:54
> > > > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx>
> > > > > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal
> > > > > <fw@xxxxxxxxx>; Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>;
> > > > > Long Xin <lxin@xxxxxxxxxx>; Claudio Porfiri
> > > > > <claudio.porfiri@xxxxxxxxxxxx>
> > > > > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp
> > > > > state machine
> > > > >
> > > > > On Thu, Jan 05, 2023 at 11:41:13AM +0000, Sriram Yagnaraman wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> > > > > > > Sent: Wednesday, 4 January 2023 16:02
> > > > > > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx>
> > > > > > > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal
> > > > > > > <fw@xxxxxxxxx>; Marcelo Ricardo Leitner
> > > > > > > <mleitner@xxxxxxxxxx>; Long Xin <lxin@xxxxxxxxxx>
> > > > > > > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp
> > > > > > > state machine
> > > > > > >
> > > > > > > On Wed, Jan 04, 2023 at 12:31:43PM +0100, Sriram Yagnaraman wrote:
> > > > > > > > All the paths in an SCTP connection are kept alive either
> > > > > > > > by actual DATA/SACK running through the connection or by
> > > > > > > > HEARTBEAT. This patch proposes a simple state machine with
> > > > > > > > only two states, OPEN_WAIT and ESTABLISHED (similar to UDP).
> > > > > > > > The reason for this change is that a fully stateful approach
> > > > > > > > to SCTP is difficult when the association is multihomed,
> > > > > > > > since the endpoints could use different paths in the network
> > > > > > > > during the lifetime of an association.
> > > > > > >
> > > > > > > Do you mean the router/firewall might not see all packets
> > > > > > > when the association is multihomed?
> > > > > > >
> > > > > > > Could you please provide an example?
> > > > > >
> > > > > > Let's say the primary and alternate/secondary paths between
> > > > > > the SCTP endpoints traverse different middleboxes. If an SCTP
> > > > > > endpoint detects network failure on the primary path, it will
> > > > > > switch to using the secondary path and all subsequent packets
> > > > > > will not be seen by the middlebox on the primary path,
> > > > > > including SHUTDOWN sequences if they happen at that time.
> > > > >
> > > > > OK, then on the primary middlebox the SCTP flow will just time out?
> > > > > (because no more packets are seen).
> > > >
> > > > Yes, they will time out unless the primary path comes up before the
> > > > SHUTDOWN sequence. And the default timeout for an ESTABLISHED SCTP
> > > > connection is 5 days, which is a "long" time to clean up this entry.
> > >
> > > Does the middlebox have a chance to see any packet that provides a
> > > hint to shorten this timeout? Are no HEARTBEAT packets seen in this
> > > case on the former primary path?
> >
> > There will be HEARTBEAT as soon as a path becomes unreachable from the
> > SCTP endpoints. But depending on the location of the network failure,
> > the middlebox may or may not see the HEARTBEAT.
>
> Conntrack assumes you have seen all traffic that belongs to the flow for
> other protocols too.
>
> > Also, HEARTBEAT is sent when there is no data to be transmitted or if
> > the path is unreachable/unconfirmed, so I think there is no
> > deterministic way of finding out when to shorten the timeout. Of
> > course, a user has the option of setting the ESTABLISHED state timeout
> > to a more reasonable value, e.g., the same as the HEARTBEAT_ACKED
> > state timeout (210 sec), OR we could reduce the default timeout of
> > ESTABLISHED to 210 sec.
>
> Then just set up a short ESTABLISHED timeout when multihoming is in place
> from the beginning.
>
> > > What I am missing is a more detailed list of issues with the
> > > existing approach. Your patch description says "SCTP tracking with
> > > multihoming is difficult", probably a list of scenarios would help
> > > to understand the motivation to simplify the state machine.
> >
> > Thank you for reviewing and asking these questions, it made me step
> > back and think. I list some background below:
> > - I want to simplify the state machine, because it is possible to
> >   track an SCTP connection with fewer states, e.g., SCTP with UDP
> >   encapsulation uses UDP conntrack with just UNREPLIED/REPLIED states
> >   and it works perfectly fine
>
> I think it would be preferable to add some configuration via ruleset to
> track SCTP over UDP, rather than deranking SCTP to become almost stateless.

Okay 😊

>
> > - My stakeholders, at whose behest I am proposing these changes,
> >   hit some problems running SCTP client endpoints behind NAT (inside
> >   Kubernetes pods) towards multihomed SCTP server endpoints; see
> >   (1a-g) and (2a-c) below
> > - Some upcoming SCTP protocol changes in IETF (if
> >   approved/implemented) will make it hard to read beyond the SCTP common
> >   header, e.g., DTLS over SCTP
> >   https://datatracker.ietf.org/doc/draft-ietf-tsvwg-dtls-over-sctp-bis/,
> >   which proposes to encrypt all SCTP chunks; conntrack will then only be
> >   able to see the SCTP common header. These changes will hopefully make
> >   it easier to adapt to such changes in the SCTP protocol
> > - While at it, I also made some other "improvements"
>
> For this DTLS case it should be possible to fall back to the SCTP "stateless"
> approach.
>
> >   a) Avoid multiple walk-throughs of SCTP chunks in sctp_new(),
> >      sctp_basic_checks() and nf_conntrack_sctp_packet(), and parse
> >      them only once
> >   b) SCTP conntrack has the same state regardless of whether it is a
> >      primary or a secondary path
> >
> > Let's say there are two SCTP endpoints A and B with addresses A' and
> > B', B'' respectively.
> > Primary path is A' <----> B' that traverses middlebox C, and secondary
> > path is A' <----> B'' that traverses middlebox D.
> >
> > 1) SHUTDOWN sent on secondary path
> > 1a) SCTP endpoint A sets up an association towards SCTP endpoint B
> > 1b) Middlebox C sees INIT sequence and creates "primary" conntrack
> > entry (5 days)
> > 1c) Middlebox D sees HEARTBEAT sequence and creates "secondary"
> > conntrack entry (210 seconds)
> > 1d) Path failure between A and C, and SCTP endpoint A switches to
> > secondary path and continues sending data on the association
> > 1e) SCTP endpoint A decides to SHUTDOWN the connection
> > 1f) Middlebox C is in ESTABLISHED state, doesn't see any SHUTDOWN
> > sequence or HEARTBEAT, waits for timeout (5 days)
> > 1g) Middlebox D is in HEARTBEAT_ACKED state, doesn't care about
> > SHUTDOWN sequence, waits for timeout (210 seconds)
>
> I guess a similar problem will occur with MP-TCP, and I am not sure making
> TCP more stateless is the way to address this.

Ok, I am a newbie to this area and am most probably mistaken, so forgive my naive question below.
Shouldn't conntrack understand as little as possible about the protocol, and parse the bare minimum from the packet to detect an active connection? For packet filtering/firewall, I understand we will need deep packet inspection, but is conntrack the place to do that?

>
> > 2) Recently fixed by bff3d0534804 ("netfilter: conntrack: add sctp
> > DATA_SENT state")
> > 2a) SCTP endpoint A sets up an association towards SCTP endpoint B
> > 2b) Middlebox C sees INIT sequence and creates "primary" conntrack
> > entry (5 days)
> > 2c) Middlebox D sees DATA/SACK, and DROPS packets until HEARTBEAT is
> > seen to set up "secondary" conntrack entry (210 seconds)
>
> I assume this is already fixed.
>
> Another possibility would be to introduce this alternative state-machine and
> use it for multihoming?

Or I could unify the established states for both the connection that saw an INIT/INIT_ACK sequence and the one that saw a HEARTBEAT/HEARTBEAT_ACK sequence, and use the HEARTBEAT_ACKED state timeout for both.
That way, there is no difference from a conntrack perspective between "primary" and "secondary" connections. I can send another patch if the group here thinks this is a good idea.
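
To make the effect of unifying the timeouts concrete, here is a rough sketch in Python. It is purely illustrative and is not kernel code: the names `Entry`, `timeout_for` and `expires_at` are made up for this example, and the timestamps (t=0 for the path failure, t=100 for the last packet on the secondary path) are assumptions. Only the state names and the two default timeouts (5 days and 210 seconds) come from the discussion above.

```python
from dataclasses import dataclass

# Default timeouts quoted in this thread, in seconds.
ESTABLISHED_TIMEOUT = 5 * 24 * 60 * 60  # 5 days
HEARTBEAT_ACKED_TIMEOUT = 210


@dataclass
class Entry:
    """Hypothetical stand-in for one middlebox's conntrack entry."""
    state: str       # "ESTABLISHED" or "HEARTBEAT_ACKED"
    last_seen: int   # time of the last packet seen, in seconds


def timeout_for(state: str, unified: bool) -> int:
    # With the unified proposal, both states share the shorter timeout.
    if unified or state == "HEARTBEAT_ACKED":
        return HEARTBEAT_ACKED_TIMEOUT
    return ESTABLISHED_TIMEOUT


def expires_at(entry: Entry, unified: bool = False) -> int:
    # An entry expires one timeout period after the last packet it saw.
    return entry.last_seen + timeout_for(entry.state, unified)


# Scenario 1 (assumed timestamps): middlebox C stops seeing packets at
# t=0 because of the path failure; middlebox D sees its last packet of
# the association at t=100.
c = Entry("ESTABLISHED", last_seen=0)
d = Entry("HEARTBEAT_ACKED", last_seen=100)

for unified in (False, True):
    print(unified, expires_at(c, unified), expires_at(d, unified))
```

With the current defaults, the stale entry on middlebox C lingers for 5 days after the path failure, while with a unified timeout it would expire after 210 seconds, the same way the entry on middlebox D already does.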