On Wed, Jan 11, 2023 at 09:36:38AM +0000, Sriram Yagnaraman wrote: > > -----Original Message----- > > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> > > Sent: Friday, 6 January 2023 01:50 > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx> > > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal <fw@xxxxxxxxx>; > > Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>; Long Xin > > <lxin@xxxxxxxxxx>; Claudio Porfiri <claudio.porfiri@xxxxxxxxxxxx> > > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp state machine > > > > On Thu, Jan 05, 2023 at 12:11:44PM +0000, Sriram Yagnaraman wrote: > > > > -----Original Message----- > > > > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> > > > > Sent: Thursday, 5 January 2023 12:54 > > > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx> > > > > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal > > > > <fw@xxxxxxxxx>; Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>; Long > > > > Xin <lxin@xxxxxxxxxx>; Claudio Porfiri > > > > <claudio.porfiri@xxxxxxxxxxxx> > > > > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp state > > > > machine > > > > > > > > On Thu, Jan 05, 2023 at 11:41:13AM +0000, Sriram Yagnaraman wrote: > > > > > > -----Original Message----- > > > > > > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> > > > > > > Sent: Wednesday, 4 January 2023 16:02 > > > > > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx> > > > > > > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal > > > > > > <fw@xxxxxxxxx>; Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>; > > > > > > Long Xin <lxin@xxxxxxxxxx> > > > > > > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp > > > > > > state machine > > > > > > > > > > > > On Wed, Jan 04, 2023 at 12:31:43PM +0100, Sriram Yagnaraman > > wrote: > > > > > > > All the paths in an SCTP connection are kept alive either by > > > > > > > actual DATA/SACK running through the connection or by HEARTBEAT. > > > > > > > This patch proposes a simple state machine with only two > > > > > > > states OPEN_WAIT and ESTABLISHED (similar to UDP). The reason > > > > > > > for this change is a full stateful approach to SCTP is > > > > > > > difficult when the association is multihomed since the > > > > > > > endpoints could use different paths in the network during the lifetime > > of an association. > > > > > > > > > > > > Do you mean the router/firewall might not see all packets for > > > > > > association is multihomed? > > > > > > > > > > > > Could you please provide an example? > > > > > > > > > > Let's say the primary and alternate/secondary paths between the > > > > > SCTP endpoints traverse different middle boxes. If an SCTP > > > > > endpoint detects network failure on the primary path, it will > > > > > switch to using the secondary path and all subsequent packets will > > > > > not be seen by the middlebox on the primary path, including > > > > > SHUTDOWN sequences if they happen at that time. > > > > > > > > OK, then on the primary middle box the SCTP flow will just timeout? > > > > (because no more packets are seen). > > > > > > Yes, they will timeout unless the primary path comes up before the > > > SHUTDOWN sequence. And the default timeout for an ESTABLISHED SCTP > > > connection is 5 days, which is a "long" time to clean-up this entry. > > > > Does the middle box have a chance to see any packet that provides a hint to > > shorten this timeout? no HEARTBEAT packets are seen in this case on the > > former primary path? > > There will be HEARTBEAT as soon as a path becomes unreachable from the SCTP endpoints. But depending on the location of the network failure, the middlebox may or may not see the HEARTBEAT. Also, HEARTBEAT is sent when there are no data to be transmitted or if the path is unreachable/unconfirmed, so I think there is no deterministic way of finding out when to shorten the timeout. Of course, a user has the option of setting the ESTABLISHED state timeout to a more reasonable value, for e.g., same as the HEARTBEAT_ACKED state timeout (210 sec), OR we could reduce the default timeout of ESTABLISHED to 210 sec. > > > > > What I am missing are a more detailed list of issues with the existing > > approach. Your patch description says "SCTP tracking with multihoming is > > difficult", probably a list of scenarios would help to understand the motivation > > to simplify the state machine. > > Thank you for reviewing and asking these questions, it made me step back and think. I list below some background > - I want to simplify the state machine, because it is possible to track an SCTP connection with fewer states, for e.g., SCTP with UDP encapsulation uses UDP conntrack with just UNREPLIED/REPLIED states and it works perfectly fine [*] (more below) > - My stakeholders, at the behest of whom I am proposing these changes hit some problems running SCTP client endpoints behind NAT (inside Kubernetes pods) towards multihomed SCTP server endpoints (1a-g) and (2a-c) below > - Some upcoming SCTP protocol changes in IETF (if approved/implemented) will make it hard to read beyond the SCTP common header, for e.g., DTLS over SCTP https://datatracker.ietf.org/doc/draft-ietf-tsvwg-dtls-over-sctp-bis/, proposes to encrypt all SCTP chunks, conntrack will only be able to see SCTP common header, these changes hopefully will make it easier to adapt to such changes in SCTP protocol If this is really the case, then I'm confused by this new RFC (be warned I don't know the gory details of DTLS). I don't see it specifying that other chunk headers will be encrypted, but instead I see texts like: 3.7. Message Sizes ... The sequence of DTLS records is then fragmented into DATA or I-DATA Chunks to fit the path MTU by SCTP. These changes ensure that DTLS/ SCTP has the same capability as SCTP to support user messages of any size. ... this is just packing encrypted payload. Sections 4.1 and 4.5 also lead to that, and: 6. DTLS over SCTP Service The adoption of DTLS over SCTP according to the current specification is meant to add to SCTP the option for transferring encrypted data. When DTLS over SCTP is used, all data being transferred MUST be protected by chunk authentication and DTLS encrypted. and finally: 9.6. Privacy Considerations For each SCTP user message, the user also provides a stream identifier, a flag to indicate whether the message is sent ordered or unordered, and a payload protocol identifier. Although DTLS/SCTP provides privacy for the actual user message, the other three information fields are not confidentiality protected. They are sent as clear text because they are part of the SCTP DATA chunk header. So AFAICU the claim here is that a middle box wouldn't be able to inspect SCTP payload, but the protocol itself is still possible. Did I miss something maybe? > - While at it, I also made some other "improvements" > a) Avoid multiple walk-throughs of SCTP chunks in sctp_new(), sctp_basic_checks() and nf_conntrack_sctp_packet(), and parse it only once > b) SCTP conntrack has the same state regardless of it is a primary or a secondary path This a good change and may, actually, be the core reasoning behind the change here. 'Primary' is more in the sense of 'active' than anything else (and it gets more confusing with CMT-SCTP heh). It's not because a path saw the INIT handshake that it is more reliable and so from conntrack side. With that, conntrack handling all paths similarly makes it more consistent and closer with what the actual SCTP sockets are seeing for such paths. One easy example here of a bad situation due to the current difference between paths is that an application can start an association through a path and tear it down over another one. There's no shutdown seen by the first path, and the conntrack entry will be left there af it is was in Established state for 5 days. (this is also illustrated in Sriram's example below, just easier to read) Item I marked with [*] above is vague. You need to consider what it would be loosing by such simplification and argument that it is okay. We can't just cut the fat out and pray for the best later on, right? But with this description here, it not only provides a solid justification on why it (specificly the new solution) is needed but also why it is currently already broken/not reliable in multi-homing setups. Yay! :-) Then, next question is: would it make sense to keep the current processing for single path associations? That's a lot of code/maintenance and I fail to see a reasoning that justifies it. > > Let's say there are two SCTP endpoints A and B with addresses A' and B, B'' correspondingly. > Primary path is A' <----> B' that traverses middlebox C, and secondary path is A' <----> B'' that traverses middlebox D. > 1) SHUTDOWN sent on secondary path > 1a) SCTP endpoint A sets up an association towards SCTP endpoint B > 1b) Middlebox C sees INIT sequence and creates "primary" conntrack entry (5 days) > 1c) Middlebox D sees HEARTBEAT sequence and creates "secondary" conntrack entry (210 seconds) > 1d) Path failure between A and C, and SCTP endpoint A switches to secondary path and continues sending data on the association > 1e) SCTP endpoint A decides to SHUTDOWN the connection > 1f) Middlebox C is in ESTABLISHED state, doesn't see any SHUTDOWN sequence or HEARTBEAT, waits for timeout (5 days) > 1g) Middlebox D is in HEARTBEAT_ACKED state, doesn't care about SHUTDOWN sequence, waits for timeout (210 seconds) > > 2) Recently fixed by bff3d0534804 ("netfilter: conntrack: add sctp DATA_SENT state ") > 2a) SCTP endpoint A sets up an association towards SCTP endpoint B > 2b) Middlebox C sees INIT sequence and creates "primary" conntrack entry (5 days) > 2c) Middlebox D sees DATA/SACK, and DROPS packets until HEARTBEAT is seen to setup "secondary" conntrack entry (210 seconds) > > > > > Thanks.