On Thu, Nov 03, 2022 at 08:02:08PM +0000, Sriram Yagnaraman wrote:
> On 2022-11-02 15:00, Florian Westphal wrote:
>
> > Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx> wrote:
> >> On 2022-10-31 09:38, Florian Westphal wrote:
> >>
> >>> sriram.yagnaraman@xxxxxxxx <sriram.yagnaraman@xxxxxxxx> wrote:
> >>>> From: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx>
> >>>>
> >>>> This patch introduces a new proc entry to disable source port
> >>>> randomization for SCTP connections.
> >>> Hmm. Can you elaborate? The sport is never randomized, unless either
> >>> 1. User explicitly requested it via "random" flag passed to snat rule, or
> >>> 2. there is an existing connection, using the *same* sport:saddr -> daddr:dport
> >>>    quadruple as the new request.
> >>>
> >>> In 2), this new toggle prevents communication. So I wonder why ...
> >> Thank you so much for the detailed review comments.
> >>
> >> My use case for this flag originates from a deployment of SCTP client
> >> endpoints on docker/kubernetes environments, where typically there exists
> >> SNAT rules for the endpoints on egress. The *user* in this case are the
> >> CNI plugins that configure the SNAT rules, and some of the most common
> >> plugins use --random-fully regardless of the protocol.
> >>
> >> Consider an SCTP association A -> B, which has two paths via NAT A and B
> >> A: 1.2.3.4:12345
> >> B: 5.6.7.8/9:42
> >> NAT A: 1.2.31.4 (used for path towards 5.6.7.8)
> >> NAT B: 1.2.32.4 (used for path towards 5.6.7.9)
> >>
> >>               ┌───────┐   ┌───┐
> >>            ┌──► NAT A ├───►   │
> >>  ┌─────┐   │  └───────┘   │   │
> >>  │  A  ├───┤              │ B │
> >>  └─────┘   │  ┌───────┐   │   │
> >>            └──► NAT B ├───►   │
> >>               └───────┘   └───┘
> >>
> >> Let us assume in NAT A (1.2.31.4), the connection is set up as
> >>       ORIGINAL TUPLE                REPLY TUPLE
> >> 1.2.3.4:12345 -> 5.6.7.8:42,  5.6.7.8:42 -> 1.2.31.4:33333
> >>
> >> Let us assume in NAT B (1.2.32.4), the connection is set up as
> >>       ORIGINAL TUPLE                REPLY TUPLE
> >> 1.2.3.4:12345 -> 5.6.7.9:42,  5.6.7.9:42 -> 1.2.32.4:44444
> >>
> >> Since the port numbers are different when viewed from B, the association
> >> will not become multihomed, with only the primary path being active.
> >> Moreover, on a NAT/middlebox restart, we will end up getting new ports.
> >>
> >> I understand this is a problem in the way SNAT rules are configured, my
> >> proposal was to have this flag as a means of preventing such a problem
> >> even if the user wanted to.
> > Ugh, sorry, but that sounds just wrong.
>
> Ok, I hear that. :)
>
> >
> >>>> As specified in RFC9260 all transport addresses used by an SCTP endpoint
> >>>> MUST use the same port number but can use multiple IP addresses. That
> >>>> means that all paths taken within an SCTP association should have the
> >>>> same port even if they pass through different NAT/middleboxes in the
> >>>> network.
> > Hmm, I don't understand WHY this requirement exists, since endpoints
> > cannot control source port (or source address) seen by the peer;
> > NAT won't go away.
> >
> > I read that snippet several times and it's not clear to me if
> > "port number" refers to sport or dport. Dport would make sense to me,
> > but sport...? No, not really.
>
> I am just an interpreter of the standard but AFAIU, port means both source
> and destination port. See section 1.3 of RFC 9260, which defines an SCTP
> endpoint. In any case, running SCTP on UDP is probably the best way to
> work around the SCTP NAT problem.
>
> >
> > Won't the endpoints notice that the path is down and re-create the flow?
> >
> > AFAIU the root cause of your problem is that:
> > 1. NAT middleboxes remap source port AND
> > 2. NAT middleboxes restart frequently
> >
> > ... so fixing either 1 or 2 would avoid the problem.
> >
> > I don't think adding sysctls to override 1) is a sane option.
>
> Yeah, the endpoints do try to re-create the flows, but if we have
> multiple middleboxes remapping the source port, there is no guarantee
> that they will remap to the same source port.
> 1) is the main problem that I was trying to address with this patch.
>
> >> Since the flag is optional, the idea is to enable it only on hosts that
> >> are part of docker/kubernetes environments and use NAT in their datapath.
> > We can't fix the ruleset but we can somehow cure it via sysctl in each netns?
> > I don't like this.
> >
> > NAT middlebox restart with --random is a problem in any case, not just
> > for SCTP, because the chosen "random port" is lost.
> >
> > I don't see a way to fix this, unless NOT using --random mode.
> > If connection is subject to sequence number rewrite (for tcp)
> > the connection won't survive either as the seqadj state is lost.
>
> Ok, I understand your point. I agree it doesn't make sense to have an
> alternative configuration option to avoid this problem. I will try to
> convince the "users" not to use --random-fully for SCTP.

FWIW I share Florian's opinion here. With the explanations above, it
doesn't make sense to have an override in kernel for an option that
userspace is supplying at will.

  Marcelo
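
For readers following the two-NAT example quoted above, here is a rough
Python sketch of why independent source port randomization on each
middlebox keeps the association from becoming multihomed. It is a toy
model, not kernel code: the function names, the port range and the
"keep the original port when not randomizing" rule are assumptions made
for illustration only (the last one follows Florian's remark that the
sport is not randomized unless requested or unless the tuple clashes).

```python
import random

PORT_RANGE = (1024, 65535)  # assumed range, for this sketch only


def snat_source_port(orig_port, random_fully, rng):
    """Return the source port a NAT box assigns to a new flow.

    Simplified model: without randomization the original port is kept
    (no clashing flow is assumed); with it, a port is drawn at random.
    """
    if random_fully:
        return rng.randint(*PORT_RANGE)
    return orig_port


def ports_seen_by_peer(orig_port, nat_rngs, random_fully):
    """Map each NAT name to the source port the peer sees on that path."""
    return {name: snat_source_port(orig_port, random_fully, rng)
            for name, rng in nat_rngs.items()}


def can_multihome(ports):
    """True if every path presents the same source port to the peer."""
    return len(set(ports.values())) == 1


if __name__ == "__main__":
    # Each middlebox has its own independent random state.
    nats = {"NAT A": random.Random(), "NAT B": random.Random()}
    for mode in (False, True):
        ports = ports_seen_by_peer(12345, nats, mode)
        print(f"random_fully={mode}: {ports} "
              f"multihome={can_multihome(ports)}")
```

With the ports from the example (33333 via NAT A, 44444 via NAT B),
can_multihome() returns False, which matches the observation that only
the primary path stays active. In this model, leaving randomization off
on both boxes keeps port 12345 on both paths.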