> -----Original Message----- > From: Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx> > Sent: Wednesday, 11 January 2023 14:54 > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx> > Cc: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>; netfilter-devel@xxxxxxxxxxxxxxx; > Florian Westphal <fw@xxxxxxxxx>; Long Xin <lxin@xxxxxxxxxx>; Claudio > Porfiri <claudio.porfiri@xxxxxxxxxxxx> > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp state machine > > On Wed, Jan 11, 2023 at 09:36:38AM +0000, Sriram Yagnaraman wrote: > > > -----Original Message----- > > > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> > > > Sent: Friday, 6 January 2023 01:50 > > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx> > > > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal > > > <fw@xxxxxxxxx>; Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>; Long > > > Xin <lxin@xxxxxxxxxx>; Claudio Porfiri > > > <claudio.porfiri@xxxxxxxxxxxx> > > > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp state > > > machine > > > > > > On Thu, Jan 05, 2023 at 12:11:44PM +0000, Sriram Yagnaraman wrote: > > > > > -----Original Message----- > > > > > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> > > > > > Sent: Thursday, 5 January 2023 12:54 > > > > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx> > > > > > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal > > > > > <fw@xxxxxxxxx>; Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>; > > > > > Long Xin <lxin@xxxxxxxxxx>; Claudio Porfiri > > > > > <claudio.porfiri@xxxxxxxxxxxx> > > > > > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp > > > > > state machine > > > > > > > > > > On Thu, Jan 05, 2023 at 11:41:13AM +0000, Sriram Yagnaraman wrote: > > > > > > > -----Original Message----- > > > > > > > From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> > > > > > > > Sent: Wednesday, 4 January 2023 16:02 > > > > > > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx> > > > > > > > Cc: netfilter-devel@xxxxxxxxxxxxxxx; Florian Westphal > > > > > > > <fw@xxxxxxxxx>; Marcelo Ricardo Leitner > > > > > > > <mleitner@xxxxxxxxxx>; Long Xin <lxin@xxxxxxxxxx> > > > > > > > Subject: Re: [RFC PATCH] netfilter: conntrack: simplify sctp > > > > > > > state machine > > > > > > > > > > > > > > On Wed, Jan 04, 2023 at 12:31:43PM +0100, Sriram Yagnaraman > > > wrote: > > > > > > > > All the paths in an SCTP connection are kept alive either > > > > > > > > by actual DATA/SACK running through the connection or by > HEARTBEAT. > > > > > > > > This patch proposes a simple state machine with only two > > > > > > > > states OPEN_WAIT and ESTABLISHED (similar to UDP). The > > > > > > > > reason for this change is a full stateful approach to SCTP > > > > > > > > is difficult when the association is multihomed since the > > > > > > > > endpoints could use different paths in the network during > > > > > > > > the lifetime > > > of an association. > > > > > > > > > > > > > > Do you mean the router/firewall might not see all packets > > > > > > > for association is multihomed? > > > > > > > > > > > > > > Could you please provide an example? > > > > > > > > > > > > Let's say the primary and alternate/secondary paths between > > > > > > the SCTP endpoints traverse different middle boxes. If an SCTP > > > > > > endpoint detects network failure on the primary path, it will > > > > > > switch to using the secondary path and all subsequent packets > > > > > > will not be seen by the middlebox on the primary path, > > > > > > including SHUTDOWN sequences if they happen at that time. > > > > > > > > > > OK, then on the primary middle box the SCTP flow will just timeout? > > > > > (because no more packets are seen). > > > > > > > > Yes, they will timeout unless the primary path comes up before the > > > > SHUTDOWN sequence. And the default timeout for an ESTABLISHED > SCTP > > > > connection is 5 days, which is a "long" time to clean-up this entry. > > > > > > Does the middle box have a chance to see any packet that provides a > > > hint to shorten this timeout? no HEARTBEAT packets are seen in this > > > case on the former primary path? > > > > There will be HEARTBEAT as soon as a path becomes unreachable from the > SCTP endpoints. But depending on the location of the network failure, the > middlebox may or may not see the HEARTBEAT. Also, HEARTBEAT is sent > when there are no data to be transmitted or if the path is > unreachable/unconfirmed, so I think there is no deterministic way of finding > out when to shorten the timeout. Of course, a user has the option of setting > the ESTABLISHED state timeout to a more reasonable value, for e.g., same as > the HEARTBEAT_ACKED state timeout (210 sec), OR we could reduce the > default timeout of ESTABLISHED to 210 sec. > > > > > > > > What I am missing are a more detailed list of issues with the > > > existing approach. Your patch description says "SCTP tracking with > > > multihoming is difficult", probably a list of scenarios would help > > > to understand the motivation to simplify the state machine. > > > > Thank you for reviewing and asking these questions, it made me step > > back and think. I list below some background > > - I want to simplify the state machine, because it is possible to > > track an SCTP connection with fewer states, for e.g., SCTP with UDP > > encapsulation uses UDP conntrack with just UNREPLIED/REPLIED states > > and it works perfectly fine > > [*] (more below) > > > - My stakeholders, at the behest of whom I am proposing these changes > > hit some problems running SCTP client endpoints behind NAT (inside > > Kubernetes pods) towards multihomed SCTP server endpoints (1a-g) and > > (2a-c) below > > - Some upcoming SCTP protocol changes in IETF (if > > approved/implemented) will make it hard to read beyond the SCTP common > > header, for e.g., DTLS over SCTP > > https://datatracker.ietf.org/doc/draft-ietf-tsvwg-dtls-over-sctp-bis/, > > proposes to encrypt all SCTP chunks, conntrack will only be able to > > see SCTP common header, these changes hopefully will make it easier to > > adapt to such changes in SCTP protocol > > If this is really the case, then I'm confused by this new RFC (be warned I don't > know the gory details of DTLS). I don't see it specifying that other chunk > headers will be encrypted, but instead I see texts like: > > 3.7. Message Sizes > > ... > The sequence of DTLS records is then fragmented into DATA or I-DATA > Chunks to fit the path MTU by SCTP. These changes ensure that DTLS/ > SCTP has the same capability as SCTP to support user messages of any > size. ... > > this is just packing encrypted payload. > > Sections 4.1 and 4.5 also lead to that, and: > > 6. DTLS over SCTP Service > > The adoption of DTLS over SCTP according to the current specification > is meant to add to SCTP the option for transferring encrypted data. > When DTLS over SCTP is used, all data being transferred MUST be > protected by chunk authentication and DTLS encrypted. > > and finally: > > 9.6. Privacy Considerations > > For each SCTP user message, the user also provides a stream > identifier, a flag to indicate whether the message is sent ordered or > unordered, and a payload protocol identifier. Although DTLS/SCTP > provides privacy for the actual user message, the other three > information fields are not confidentiality protected. They are sent > as clear text because they are part of the SCTP DATA chunk header. > > > So AFAICU the claim here is that a middle box wouldn't be able to inspect > SCTP payload, but the protocol itself is still possible. > Did I miss something maybe? Okay, my bad for using somebody else's words instead of reading the draft myself. You are right, the proposal is only to apply SCTP-AUTH to all the chunks, it doesn’t propose encryption/confidentiality. Anyhow that was just an example of why making a simplified state machine will avoid changing conntrack when the protocol changes are made, especially with the protocol still being actively improved. > > > - While at it, I also made some other "improvements" > > a) Avoid multiple walk-throughs of SCTP chunks in sctp_new(), > sctp_basic_checks() and nf_conntrack_sctp_packet(), and parse it only once > > b) SCTP conntrack has the same state regardless of it is a primary or > > a secondary path > > This a good change and may, actually, be the core reasoning behind the > change here. 'Primary' is more in the sense of 'active' than anything else (and > it gets more confusing with CMT-SCTP heh). It's not because a path saw the > INIT handshake that it is more reliable and so from conntrack side. > > With that, conntrack handling all paths similarly makes it more consistent and > closer with what the actual SCTP sockets are seeing for such paths. > Yes, I agree. > One easy example here of a bad situation due to the current difference > between paths is that an application can start an association through a path > and tear it down over another one. There's no shutdown seen by the first > path, and the conntrack entry will be left there af it is was in Established state > for 5 days. (this is also illustrated in Sriram's example below, just easier to > read) > > Item I marked with [*] above is vague. You need to consider what it would be > loosing by such simplification and argument that it is okay. > We can't just cut the fat out and pray for the best later on, right? > But with this description here, it not only provides a solid justification on why > it (specificly the new solution) is needed but also why it is currently already > broken/not reliable in multi-homing setups. Yay! :-) Agreed, I will update the change log with this description. > > Then, next question is: would it make sense to keep the current processing for > single path associations? That's a lot of code/maintenance and I fail to see a > reasoning that justifies it. I tried to simplify as much as possible, but of course, we can keep some states. For e.g., keeping SHUTDOWN_* states can reduce the timeout of an ESTABLISHED connection even further. Of course, we will see that only on one of the paths. > > > > > Let's say there are two SCTP endpoints A and B with addresses A' and B, B'' > correspondingly. > > Primary path is A' <----> B' that traverses middlebox C, and secondary path is > A' <----> B'' that traverses middlebox D. > > 1) SHUTDOWN sent on secondary path > > 1a) SCTP endpoint A sets up an association towards SCTP endpoint B > > 1b) Middlebox C sees INIT sequence and creates "primary" conntrack > > entry (5 days) > > 1c) Middlebox D sees HEARTBEAT sequence and creates "secondary" > > conntrack entry (210 seconds) > > 1d) Path failure between A and C, and SCTP endpoint A switches to > > secondary path and continues sending data on the association > > 1e) SCTP endpoint A decides to SHUTDOWN the connection > > 1f) Middlebox C is in ESTABLISHED state, doesn't see any SHUTDOWN > > sequence or HEARTBEAT, waits for timeout (5 days) > > 1g) Middlebox D is in HEARTBEAT_ACKED state, doesn't care about > > SHUTDOWN sequence, waits for timeout (210 seconds) > > > > 2) Recently fixed by bff3d0534804 ("netfilter: conntrack: add sctp > > DATA_SENT state ") > > 2a) SCTP endpoint A sets up an association towards SCTP endpoint B > > 2b) Middlebox C sees INIT sequence and creates "primary" conntrack > > entry (5 days) > > 2c) Middlebox D sees DATA/SACK, and DROPS packets until HEARTBEAT is > > seen to setup "secondary" conntrack entry (210 seconds) > > > > > > > > Thanks.