On Wed, Jan 11, 2023 at 09:36:38AM +0000, Sriram Yagnaraman wrote:
> > -----Original Message-----
> > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> > Sent: Friday, 6 January 2023 01:50
> > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx>
> > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal <fw@xxxxxxxxx>;
> > Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>; Long Xin
> > <lxin@xxxxxxxxxx>; Claudio Porfiri <claudio.porfiri@xxxxxxxxxxxx>
> > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp state machine
> >
> > On Thu, Jan 05, 2023 at 12:11:44PM +0000, Sriram Yagnaraman wrote:
> > > > -----Original Message-----
> > > > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> > > > Sent: Thursday, 5 January 2023 12:54
> > > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx>
> > > > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal
> > > > <fw@xxxxxxxxx>; Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>; Long
> > > > Xin <lxin@xxxxxxxxxx>; Claudio Porfiri
> > > > <claudio.porfiri@xxxxxxxxxxxx>
> > > > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp state
> > > > machine
> > > >
> > > > On Thu, Jan 05, 2023 at 11:41:13AM +0000, Sriram Yagnaraman wrote:
> > > > > > -----Original Message-----
> > > > > > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> > > > > > Sent: Wednesday, 4 January 2023 16:02
> > > > > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx>
> > > > > > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal
> > > > > > <fw@xxxxxxxxx>; Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>;
> > > > > > Long Xin <lxin@xxxxxxxxxx>
> > > > > > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp
> > > > > > state machine
> > > > > >
> > > > > > On Wed, Jan 04, 2023 at 12:31:43PM +0100, Sriram Yagnaraman wrote:
> > > > > > > All the paths in an SCTP connection are kept alive either by
> > > > > > > actual DATA/SACK running through the connection or by HEARTBEAT.
> > > > > > > This patch proposes a simple state machine with only two
> > > > > > > states OPEN_WAIT and ESTABLISHED (similar to UDP). The reason
> > > > > > > for this change is a full stateful approach to SCTP is
> > > > > > > difficult when the association is multihomed since the
> > > > > > > endpoints could use different paths in the network during the lifetime
> > > > > > > of an association.
> > > > > >
> > > > > > Do you mean the router/firewall might not see all packets for
> > > > > > association is multihomed?
> > > > > >
> > > > > > Could you please provide an example?
> > > > >
> > > > > Let's say the primary and alternate/secondary paths between the
> > > > > SCTP endpoints traverse different middle boxes. If an SCTP
> > > > > endpoint detects network failure on the primary path, it will
> > > > > switch to using the secondary path and all subsequent packets will
> > > > > not be seen by the middlebox on the primary path, including
> > > > > SHUTDOWN sequences if they happen at that time.
> > > >
> > > > OK, then on the primary middle box the SCTP flow will just timeout?
> > > > (because no more packets are seen).
> > >
> > > Yes, they will timeout unless the primary path comes up before the
> > > SHUTDOWN sequence. And the default timeout for an ESTABLISHED SCTP
> > > connection is 5 days, which is a "long" time to clean-up this entry.
> >
> > Does the middle box have a chance to see any packet that provides a hint to
> > shorten this timeout? no HEARTBEAT packets are seen in this case on the
> > former primary path?
>
> There will be HEARTBEAT as soon as a path becomes unreachable from
> the SCTP endpoints. But depending on the location of the network
> failure, the middlebox may or may not see the HEARTBEAT.

Conntrack assumes that it sees all traffic that belongs to the flow;
this holds for other protocols too.

> Also, HEARTBEAT is sent when there are no data to be transmitted or
> if the path is unreachable/unconfirmed, so I think there is no
> deterministic way of finding out when to shorten the timeout. Of
> course, a user has the option of setting the ESTABLISHED state
> timeout to a more reasonable value, for e.g., same as the
> HEARTBEAT_ACKED state timeout (210 sec), OR we could reduce the
> default timeout of ESTABLISHED to 210 sec.

Then just set up a short ESTABLISHED timeout when multihoming is in
place from the beginning.

> > What I am missing are a more detailed list of issues with the existing
> > approach. Your patch description says "SCTP tracking with multihoming is
> > difficult", probably a list of scenarios would help to understand the motivation
> > to simplify the state machine.
>
> Thank you for reviewing and asking these questions, it made me step back and think. I list below some background
> - I want to simplify the state machine, because it is possible to
> track an SCTP connection with fewer states, for e.g., SCTP with UDP
> encapsulation uses UDP conntrack with just UNREPLIED/REPLIED states
> and it works perfectly fine

I think it would be preferable to add some configuration via the
ruleset to track SCTP over UDP, rather than downgrading SCTP to become
almost stateless.

> - My stakeholders, at the behest of whom I am proposing these
> changes hit some problems running SCTP client endpoints behind NAT
> (inside Kubernetes pods) towards multihomed SCTP server endpoints
> (1a-g) and (2a-c) below
> - Some upcoming SCTP protocol changes in IETF (if
> approved/implemented) will make it hard to read beyond the SCTP
> common header, for e.g., DTLS over SCTP
> https://datatracker.ietf.org/doc/draft-ietf-tsvwg-dtls-over-sctp-bis/,
> proposes to encrypt all SCTP chunks, conntrack will only be able to
> see SCTP common header, these changes hopefully will make it easier
> to adapt to such changes in SCTP protocol
> - While at it, I also made some other "improvements"

For this DTLS case it should be possible to fall back to the SCTP
"stateless" approach.

> a) Avoid multiple walk-throughs of SCTP chunks in sctp_new(), sctp_basic_checks() and nf_conntrack_sctp_packet(), and parse it only once
> b) SCTP conntrack has the same state regardless of it is a primary or a secondary path
>
> Let's say there are two SCTP endpoints A and B with addresses A' and B', B'' correspondingly.
> Primary path is A' <----> B' that traverses middlebox C, and secondary path is A' <----> B'' that traverses middlebox D.
> 1) SHUTDOWN sent on secondary path
> 1a) SCTP endpoint A sets up an association towards SCTP endpoint B
> 1b) Middlebox C sees INIT sequence and creates "primary" conntrack entry (5 days)
> 1c) Middlebox D sees HEARTBEAT sequence and creates "secondary" conntrack entry (210 seconds)
> 1d) Path failure between A and C, and SCTP endpoint A switches to secondary path and continues sending data on the association
> 1e) SCTP endpoint A decides to SHUTDOWN the connection
> 1f) Middlebox C is in ESTABLISHED state, doesn't see any SHUTDOWN sequence or HEARTBEAT, waits for timeout (5 days)
> 1g) Middlebox D is in HEARTBEAT_ACKED state, doesn't care about SHUTDOWN sequence, waits for timeout (210 seconds)

I guess a similar problem will occur with MP-TCP, and I am not sure
that making TCP more stateless is the way to address this.

> 2) Recently fixed by bff3d0534804 ("netfilter: conntrack: add sctp DATA_SENT state")
> 2a) SCTP endpoint A sets up an association towards SCTP endpoint B
> 2b) Middlebox C sees INIT sequence and creates "primary" conntrack entry (5 days)
> 2c) Middlebox D sees DATA/SACK, and DROPS packets until HEARTBEAT is seen to setup "secondary" conntrack entry (210 seconds)

I assume this is already fixed.

Another possibility would be to introduce this alternative state
machine and use it only for multihoming?
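
For illustration, scenario 1 can be sketched as a small Python model
(this is not kernel code). The Entry class and the scenario_1() function
are hypothetical names made up for this sketch; the only values taken
from the thread are the 5-day ESTABLISHED and 210-second HEARTBEAT_ACKED
timeouts. The sketch assumes that any packet a middlebox sees restarts
its entry's timer, and that D keeps seeing traffic right up to the
SHUTDOWN.

```python
from dataclasses import dataclass

# Timeouts quoted in the thread, in seconds.
ESTABLISHED_TIMEOUT = 5 * 24 * 60 * 60  # 5 days
HEARTBEAT_ACKED_TIMEOUT = 210


@dataclass
class Entry:
    """Hypothetical stand-in for one middlebox's conntrack entry."""
    timeout: int
    last_seen: int = 0

    def see_packet(self, now: int) -> None:
        # Assumption: any packet seen on this path restarts the timer.
        self.last_seen = now

    def expires_at(self) -> int:
        return self.last_seen + self.timeout


def scenario_1(failover_at: int, shutdown_at: int,
               established_timeout: int = ESTABLISHED_TIMEOUT) -> dict:
    """Seconds each entry lingers after the SHUTDOWN (0 if already gone)."""
    c = Entry(established_timeout)      # primary path, ESTABLISHED
    d = Entry(HEARTBEAT_ACKED_TIMEOUT)  # secondary path, HEARTBEAT_ACKED
    c.see_packet(failover_at)  # 1d: last packet C sees before the failure
    d.see_packet(shutdown_at)  # 1e: D sees traffic until the SHUTDOWN
    return {
        "C": max(0, c.expires_at() - shutdown_at),
        "D": max(0, d.expires_at() - shutdown_at),
    }


if __name__ == "__main__":
    # Default 5-day ESTABLISHED timeout on C.
    print(scenario_1(failover_at=1000, shutdown_at=1600))
    # Short ESTABLISHED timeout on C, as suggested for multihoming.
    print(scenario_1(failover_at=1000, shutdown_at=1600,
                     established_timeout=210))
```

Under these assumptions, C holds a stale entry for almost 5 days after
the association is gone when the default timeout is used, while with a
210-second ESTABLISHED timeout the entry on C has already expired by
the time of the SHUTDOWN in this example.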