On Wed, Jul 7, 2021 at 8:36 AM Timo Völker <timo.voelker@xxxxxxxxxxxxxx> wrote:
>
> > On 6. Jul 2021, at 18:01, Xin Long <lucien.xin@xxxxxxxxx> wrote:
> >
> > On Tue, Jul 6, 2021 at 5:13 AM Timo Völker <timo.voelker@xxxxxxxxxxxxxx> wrote:
> >>
> >>
> >> Hi Xin,
> >>
> >> I implemented RFC8899 for an SCTP simulation model.
> > great, can I know what that one is?
>
> I used the SCTP implementation in INET. INET is a simulation model suite for OMNeT++.
Thanks.

>
> >
> >>
> >> Comments follow inline.
> >>
> >>> Begin forwarded message:
> >>>
> >>> From: Xin Long <lucien.xin@xxxxxxxxx>
> >>> Subject: Re: The new sysctl and socket option added for PLPMTUD (RFC8899)
> >>> Date: 12. June 2021 at 19:32:02 CEST
> >>> To: Michael Tuexen <tuexen@xxxxxxxxxxx>
> >>> Cc: "linux-sctp @ vger . kernel . org" <linux-sctp@xxxxxxxxxxxxxxx>, Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx>
> >>>
> >>> On Fri, Jun 11, 2021 at 4:42 PM <tuexen@xxxxxxxxxxx> wrote:
> >>>>
> >>>>> On 11. Jun 2021, at 22:20, Xin Long <lucien.xin@xxxxxxxxx> wrote:
> >>>>>
> >>>>> Hi, Michael,
> >>>>>
> >>>>> In the linux implementation of RFC8899, we decided to introduce one
> >>>>> sysctl and one socket option for users to set up the PLPMTUD probe:
> >>>>>
> >>>>> 1. sysctl -w net.sctp.plpmtud_probe_interval=1
> >>>>>
> >>>>> plpmtud_probe_interval - INTEGER
> >>>>>        The interval (in milliseconds) between PLPMTUD probe chunks. These
> >>>>>        chunks are sent at the specified interval with a variable size to
> >>>>>        probe the mtu of a given path between 2 associations. PLPMTUD will
> >>>> I guess you mean "between 2 end points" instead of "between 2 associations".
> >>>>
> >>>> I'm not sure what it means:
> >>>>
> >>>> I assume, you have candidate 1400, 1420, 1460, 1480, and 1500.
> >>>>
> >>>> Assume you sent a probe packet for 1400. Aren't you sending the
> >>>> probe packet for 1420 as soon as you get an ACK for the probe packet
> >>>> of size 1400? Or are you waiting for plpmtud_probe_interval ms?
> >>> It will wait for "plpmtud_probe_interval" ms in searching state, but in
> >>> searching complete it will be "plpmtud_probe_interval * 30" ms.
> >>
> >> Does this mean you always wait for plpmtud_probe_interval ms? Even if you receive an ack for a probe packet or a PTB?
> >>
> >> In my implementation, I start with the next probe immediately when receiving an ack or PTB.
> > yeah, we should do it immediately to make this more efficient, and I
> > already fixed it in linux for ACK.
> >
> > For PTB, I currently only set probe_size as the pmtu from ICMP packet
> > when pmtu > 'current pmtu' && pmtu < probe_size, and wait until next
> > probe_timer. But probably better to send it immediately too, I need to
> > confirm.
>
> I think so. At least I don't know what to wait for.
I'm not sure about this, as it says:

  PLPMTU < PL_PTB_SIZE < PROBED_SIZE
  ...
  * The PL can use the reported PL_PTB_SIZE from the PTB message as
    the next search point when it resumes the search algorithm.

it doesn't seem to mean that.

>
> >
> >>
> >>>
> >>> The step we are using is 32, when it fails, we turn the step to 4. For example:
> >>> 1400, 1432, 1464, 1496, 1528 (failed), 1500(1496 + 4), 1504(failed,
> >>> 1500 is the PMTU).
> >>
> >> What does failed mean? Does it mean that you have sent MAX_PROBES (=3?) probe packets and waited for each plpmtud_probe_interval ms without receiving a response?
> > yes
> >
> >>
> >> If so, it might make sense to continue with smaller candidates earlier. For example, after one unanswered probe packet.
> > Sounds a good way to go, and it would save 2 intervals to get the
> > optimal value in the normal case.
> > But if the failure is false (like the link is unstable), it may also
> > take some time to catch up to the bigger candidate.
>
> Right, it's a trade off. What is better depends on the probability of a probe packet loss due to another reason than its size.
>
> I chose to do something like this, when searching for a PMTU of 1472:
>
> 1400 ack
> 1432 ack
> 1464 timeout (false negative)
> 1436 ack
> 1440 ack
> 1444 ack
> 1448 ack
> 1452 ack
> 1456 ack
> 1460 ack
> 1464 ack
> 1496 timeout
> 1468 ack
> 1472 ack
> 1476 timeout
> 1476 timeout
> 1476 timeout
> done with PMTU=1472
Looks good to me. :-)

>
> >
> >>
> >>>
> >>> Sorry, "sysctl -w net.sctp.plpmtud_probe_interval=1" won't work.
> >>> As plpmtud_probe_interval is the probe interval TIME for the timer.
> >>> Apart from 0, the minimal value is 5000ms.
> >>>
> >>> So it should be:
> >>>
> >>> plpmtud_probe_interval - INTEGER
> >>>        The time interval (in milliseconds) for sending PLPMTUD probe chunks.
> >>>        These chunks are sent at the specified interval with a variable size
> >>>        to probe the mtu of a given path between 2 endpoints. PLPMTUD will
> >>>        be disabled when 0 is set.
> >>>
> >>>        Default: 0
> >>
> >> What do you mean with probe chunks? You are sending probe *packets* containing a HEARTBEAT and a PAD chunk, right?
> > yes.
> >
> >>
> >> RFC8899 contains:
> >> The PROBE_TIMER is configured to expire after a period longer than the maximum time to receive an acknowledgment to a probe packet.
> >>
> >> So, how about plpmtud_probe_max_ack_time?
> > "plpmtud_probe_interval" I got the name from tcp's sysctl plpmtud in
> > linux. I was hoping to keep this consistent in sysctl and sockopt
> > between Linux and BSD. Note this parameter is also the interval to
> > send a probe for the current pmtu in Search Complete status.
>
> Do you send probe packets in Search Complete to confirm the current PMTU estimation?
>
> RFC8899 suggests to do this only for non-reliable PLs. For a reliable PL like SCTP, it suggests to use the loss of (data) packets as indication instead.
Can you point out the place in RFC8899 saying so?
What I saw is:

  Search Complete: The Search Complete Phase is entered when the
  PLPMTU is supported across the network path.
  A PL can use a CONFIRMATION_TIMER to periodically repeat a probe
  packet for the current PLPMTU size. If the sender is unable to
  confirm reachability (e.g., if the CONFIRMATION_TIMER expires) or the
  PL signals a lack of reachability, a black hole has been detected and
  DPLPMTUD enters the Base Phase.

it doesn't matter if it's a reliable or non-reliable PL, no?

>
> >
> >>
> >> Also, I think more parameters would be helpful. For example,
> >>
> >> plpmtud_enable - boolean to control whether to use PLPMTUD (it is more explicit than plpmtud_probe_interval=0 or plpmtud_probe_max_ack_time=0)
> >> plpmtud_max_probes - controls the number of probe packets sent for one candidate.
> >> plpmtud_raise_time - time to wait before probing for a larger PMTU in search complete (0 to disable it).
> >> plpmtud_use_ptb - boolean to control whether to process an ICMP PTB.
> > With these, the control will be more detailed for sure.
> > But I didn't want to introduce too many parameters for this feature,
> > as you know, these parameters could also be per socket/asoc/transport,
> > and doing set/get with sockopt.
> >
> > instead, we keep most fixed:
> >
> > plpmtud_use_ptb = 1
> > plpmtud_raise_time = 30 * plpmtud_probe_max_ack_time(plpmtud_probe_interval)
> > plpmtud_max_probes = 3
> > plpmtud_enable = !! plpmtud_probe_interval
> >
> > Only one variable:
> > plpmtud_probe_interval >= 5000ms
>
> OK
>
> >
> > So I think this is up to the implementation, if you want more things
> > to tune, you can go ahead with all these parameters exposed to users.
>
> Agree. It is probably a good idea not to add too many parameters.
>
> >
> >>
> >> Timo
> >>
> >>>
> >>> Thanks.
> >>>>>        be disabled when 0 is set.
> >>>>>
> >>>>>        Default: 0
> >>>>>
> >>>>> 2. a socket option that can be used per socket, assoc or transport
> >>>>>
> >>>>> /* PLPMTUD Probe Interval socket option */
> >>>>> struct sctp_probeinterval {
> >>>>>        sctp_assoc_t spi_assoc_id;
> >>>>>        struct sockaddr_storage spi_address;
> >>>>>        __u32 spi_interval;
> >>>>> };
> >>>>>
> >>>>> #define SCTP_PLPMTUD_PROBE_INTERVAL 133
> >>>>>
> >>>>>
> >>>>> The value above will enable/disable the PLPMTUD probe by setting up the probe
> >>>>> interval for the timer. When it's 0, the timer will also stop and
> >>>>> PLPMTUD is disabled.
> >>>>> By this way, we don't need to introduce more options.
> >>>> OK.
> >>>>>
> >>>>> We're expecting to keep consistent with BSD on this, pls check and
> >>>>> share your thoughts.
> >>>> Looks good to me.
> >>>>
> >>>> Best regards
> >>>> Michael
> >>>>>
> >>>>> Thanks.
> >>>>
> >>
> >>
>
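
For reference, the "step 32, then step 4" search described earlier in
the thread can be sketched roughly as below. This is only an
illustration under simplifying assumptions, not the Linux code: a probe
is treated as acked exactly when it fits the path MTU, so there are no
timers, no MAX_PROBES retries, no PTB handling and no false negatives.
The names BIG_STEP, SMALL_STEP, probe_acked() and plpmtud_search() are
made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define BIG_STEP   32  /* coarse step used until the first failed probe */
#define SMALL_STEP 4   /* fine step used after the first failed probe */

/* Hypothetical path model: a probe is acked iff it fits the path MTU. */
static int probe_acked(int size, int path_mtu)
{
    return size <= path_mtu;
}

/*
 * Simulate the search starting at 'base'. Each probed size is stored in
 * trace[] (at most 'cap' entries); *n receives the total number of
 * probes sent, which may exceed 'cap'. Returns the discovered PMTU.
 */
static int plpmtud_search(int base, int path_mtu,
                          int *trace, size_t cap, size_t *n)
{
    int pmtu = base;      /* largest size confirmed so far */
    int step = BIG_STEP;
    int probe = base;
    size_t count = 0;

    assert(base <= path_mtu); /* assumption: the base size always works */

    for (;;) {
        if (count < cap)
            trace[count] = probe;
        count++;

        if (probe_acked(probe, path_mtu)) {
            pmtu = probe;            /* confirmed, keep going up */
        } else if (step == BIG_STEP) {
            step = SMALL_STEP;       /* first failure: refine the step */
        } else {
            break;                   /* failure at fine step: done */
        }
        probe = pmtu + step;         /* next candidate above last success */
    }

    *n = count;
    return pmtu;
}
```

With base 1400 and a path MTU of 1500, the sketch probes 1400, 1432,
1464, 1496, 1528, 1500, 1504 and settles on 1500, matching the example
given in the thread. Falling back to the small step after a single
unanswered probe, as Timo suggests, would need a loss model and a way to
return to the big step, which this sketch deliberately leaves out.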