On 2022-11-02 15:00, Florian Westphal wrote:
> Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx> wrote:
>> On 2022-10-31 09:38, Florian Westphal wrote:
>>
>>> sriram.yagnaraman@xxxxxxxx <sriram.yagnaraman@xxxxxxxx> wrote:
>>>> From: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx>
>>>>
>>>> This patch introduces a new proc entry to disable source port
>>>> randomization for SCTP connections.
>>> Hmm.  Can you elaborate?  The sport is never randomized, unless either
>>> 1. User explicitly requested it via "random" flag passed to snat rule, or
>>> 2. there is an existing connection, using the *same* sport:saddr -> daddr:dport
>>>    quadruple as the new request.
>>>
>>> In 2), this new toggle prevents communication.  So I wonder why ...
>> Thank you so much for the detailed review comments.
>>
>> My use case for this flag originates from a deployment of SCTP client
>> endpoints in docker/kubernetes environments, where there typically exist
>> SNAT rules for the endpoints on egress. The *users* in this case are the
>> CNI plugins that configure the SNAT rules, and some of the most common
>> plugins use --random-fully regardless of the protocol.
>>
>> Consider an SCTP association A -> B, which has two paths via NAT A and NAT B
>> A: 1.2.3.4:12345
>> B: 5.6.7.8/9:42
>> NAT A: 1.2.31.4 (used for path towards 5.6.7.8)
>> NAT B: 1.2.32.4 (used for path towards 5.6.7.9)
>>
>>              ┌───────┐   ┌───┐
>>           ┌──► NAT A ├───►   │
>> ┌─────┐   │  └───────┘   │   │
>> │  A  ├───┤              │ B │
>> └─────┘   │  ┌───────┐   │   │
>>           └──► NAT B ├───►   │
>>              └───────┘   └───┘
>>
>> Let us assume in NAT A (1.2.31.4), the connection is set up as
>>   ORIGINAL TUPLE                 REPLY TUPLE
>>   1.2.3.4:12345 -> 5.6.7.8:42    5.6.7.8:42 -> 1.2.31.4:33333
>>
>> Let us assume in NAT B (1.2.32.4), the connection is set up as
>>   ORIGINAL TUPLE                 REPLY TUPLE
>>   1.2.3.4:12345 -> 5.6.7.9:42    5.6.7.9:42 -> 1.2.32.4:44444
>>
>> Since the port numbers are different when viewed from B, the association
>> will not become multihomed, with only the primary path being active.
>> Moreover, on a NAT/middlebox restart, we will end up getting new ports.
>>
>> I understand this is a problem in the way SNAT rules are configured; my
>> proposal was to have this flag as a means of preventing such a problem
>> even if the user requested randomization.
> Ugh, sorry, but that sounds just wrong.

Ok, I hear that. :)

>
>>>> As specified in RFC9260 all transport addresses used by an SCTP endpoint
>>>> MUST use the same port number but can use multiple IP addresses. That
>>>> means that all paths taken within an SCTP association should have the
>>>> same port even if they pass through different NAT/middleboxes in the
>>>> network.
> Hmm, I don't understand WHY this requirement exists, since endpoints
> cannot control source port (or source address) seen by the peer;
> NAT won't go away.
>
> I read that snippet several times and it's not clear to me if
> "port number" refers to sport or dport.  Dport would make sense to me,
> but sport...?  No, not really.

I am just an interpreter of the standard, but AFAIU "port" means both
source and destination port. See Section 1.3 of RFC 9260, which defines
an SCTP endpoint.
In any case, running SCTP over UDP is probably the best way to work
around the SCTP NAT problem.

>
> Won't the endpoints notice that the path is down and re-create the flow?
>
> AFAIU the root cause of your problem is that:
> 1. NAT middleboxes remap source port AND
> 2. NAT middleboxes restart frequently
>
> ... so fixing either 1 or 2 would avoid the problem.
>
> I don't think adding sysctls to override 1) is a sane option.

Yeah, the endpoints do try to re-create the flows, but if we have
multiple middleboxes remapping the source port, there is no guarantee
that they will remap to the same source port. 1) is the main problem
that I was trying to address with this patch.

>> Since the flag is optional, the idea is to enable it only on hosts that
>> are part of docker/kubernetes environments and use NAT in their datapath.
> We can't fix the ruleset but we can somehow cure it via sysctl in each netns?
> I don't like this.
>
> NAT middlebox restart with --random is a problem in any case, not just
> for SCTP, because the chosen "random port" is lost.
>
> I don't see a way to fix this, unless NOT using --random mode.
> If connection is subject to sequence number rewrite (for tcp)
> the connection won't survive either as the seqadj state is lost.

Ok, I understand your point. I agree it doesn't make sense to have an
alternative configuration option to avoid this problem. I will try to
convince the "users" not to use --random-fully for SCTP.
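
For anyone following the thread, the two-NAT example quoted above can be
sketched as a toy model. This is only an illustration under stated
assumptions, not how netfilter actually selects ports: the function
names, the port pool and the seeds are all hypothetical. It only encodes
the behaviour described in the thread, namely that the source port is
kept unless randomization was requested or the tuple is already in use.

```python
import random

# Assumed port pool for the sketch; real NAT port ranges are configurable.
PORT_RANGE = range(1024, 65536)


def pick_source_port(sport, tuple_in_use, randomize, rng):
    """Toy SNAT port choice (hypothetical helper, not the kernel algorithm).

    The original source port is kept unless randomization was requested
    or the same tuple is already taken by another connection.
    """
    if not randomize and not tuple_in_use:
        return sport
    while True:
        candidate = rng.choice(PORT_RANGE)
        # On a clash the original port cannot be reused.
        if not (tuple_in_use and candidate == sport):
            return candidate


def ports_seen_by_peer(sport, randomize, seed_a, seed_b):
    """Source ports that B would see on the paths via NAT A and NAT B."""
    nat_a = random.Random(seed_a)  # each NAT box picks independently
    nat_b = random.Random(seed_b)
    return (
        pick_source_port(sport, False, randomize, nat_a),
        pick_source_port(sport, False, randomize, nat_b),
    )


if __name__ == "__main__":
    print("no randomization:", ports_seen_by_peer(12345, False, 1, 2))
    print("randomized      :", ports_seen_by_peer(12345, True, 1, 2))
```

Without randomization both paths keep port 12345, so B sees one port
across both source addresses. With randomization each NAT draws its own
port independently, so nothing makes the two paths agree.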