Re: The new sysctl and socket option added for PLPMTUD (RFC8899)

> On 7. Jul 2021, at 18:30, Xin Long <lucien.xin@xxxxxxxxx> wrote:
> 
> On Wed, Jul 7, 2021 at 8:36 AM Timo Völker <timo.voelker@xxxxxxxxxxxxxx> wrote:
>> 
>>> On 6. Jul 2021, at 18:01, Xin Long <lucien.xin@xxxxxxxxx> wrote:
>>> 
>>> On Tue, Jul 6, 2021 at 5:13 AM Timo Völker <timo.voelker@xxxxxxxxxxxxxx> wrote:
>>>> 
>>>> 
>>>> Hi Xin,
>>>> 
>>>> I implemented RFC8899 for an SCTP simulation model.
>>> great, can I know what that one is?
>> 
>> I used the SCTP implementation in INET. INET is a simulation model suite for OMNeT++.
> Thanks.
> 
>> 
>>> 
>>>> 
>>>> Comments follow inline.
>>>> 
>>>>> Begin forwarded message:
>>>>> 
>>>>> From: Xin Long <lucien.xin@xxxxxxxxx>
>>>>> Subject: Re: The new sysctl and socket option added for PLPMTUD (RFC8899)
>>>>> Date: 12. June 2021 at 19:32:02 CEST
>>>>> To: Michael Tuexen <tuexen@xxxxxxxxxxx>
>>>>> Cc: "linux-sctp @ vger . kernel . org" <linux-sctp@xxxxxxxxxxxxxxx>, Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx>
>>>>> 
>>>>> On Fri, Jun 11, 2021 at 4:42 PM <tuexen@xxxxxxxxxxx> wrote:
>>>>>> 
>>>>>>> On 11. Jun 2021, at 22:20, Xin Long <lucien.xin@xxxxxxxxx> wrote:
>>>>>>> 
>>>>>>> Hi, Michael,
>>>>>>> 
>>>>>>> In the linux implementation of RFC8899, we decided to introduce one
>>>>>>> sysctl and one socket option for users to set up the PLPMTUD probe:
>>>>>>> 
>>>>>>> 1. sysctl -w net.sctp.plpmtud_probe_interval=1
>>>>>>> 
>>>>>>> plpmtud_probe_interval - INTEGER
>>>>>>>     The interval (in milliseconds) between PLPMTUD probe chunks. These
>>>>>>>     chunks are sent at the specified interval with a variable size to
>>>>>>>     probe the mtu of a given path between 2 associations. PLPMTUD will
>>>>>> I guess you mean "between 2 end points" instead of "between 2 associations".
>>>>>> 
>>>>>> I'm not sure what it means:
>>>>>> 
>>>>>> I assume, you have candidate 1400, 1420, 1460, 1480, and 1500.
>>>>>> 
>>>>>> Assume you sent a probe packet for 1400. Aren't you sending the
>>>>>> probe packet for 1420 as soon as you get an ACK for the probe packet
>>>>>> of size 1400? Or are you waiting for plpmtud_probe_interval ms?
>>>>> It will wait for "plpmtud_probe_interval" ms in searching state, but in
>>>>> searching complete it will be "plpmtud_probe_interval * 30" ms.
>>>> 
>>>> Does this mean you always wait for plpmtud_probe_interval ms? Even if you receive an ack for a probe packet or a PTB?
>>>> 
>>>> In my implementation, I start with the next probe immediately when receiving an ack or PTB.
>>> yeah, we should do it immediately to make this more efficient, and I
>>> already fixed it in linux for ACK.
>>> 
>>> For PTB, I currently only set probe_size as the pmtu from ICMP packet
>>> when pmtu > 'current pmtu' && pmtu < probe_size, and wait until next
>>> probe_timer. But probably better to send it immediately too, I need to
>>> confirm.
>> 
>> I think so. At least I don't know what to wait for.
> I'm not sure about this, as it says:
> 
>   PLPMTU < PL_PTB_SIZE < PROBED_SIZE
>   ...
>      *  The PL can use the reported PL_PTB_SIZE from the PTB message as
>         the next search point when it resumes the search algorithm.
> 
> it doesn't seem to mean that.

The phrase "when it resumes the search algorithm" is a little abstract, but I don't read it as requiring the PL to wait for a timeout before resuming the search algorithm.
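
To make my reading concrete, here is a minimal sketch in C. It covers only the PLPMTU < PL_PTB_SIZE < PROBED_SIZE case quoted above. The struct pl_path, its fields, and ptb_resume_search() are hypothetical names for illustration, not the Linux or INET code, and the other PTB cases from RFC8899 (for example PL_PTB_SIZE below the current PLPMTU) are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-path probing state; names are illustrative only. */
struct pl_path {
    uint32_t plpmtu;      /* currently confirmed PLPMTU */
    uint32_t probed_size; /* size of the probe packet in flight */
};

/*
 * Handle an already validated PTB message.
 * If PLPMTU < PL_PTB_SIZE < PROBED_SIZE, take PL_PTB_SIZE as the next
 * search point and return true so the caller can send the next probe
 * right away instead of waiting for the probe timer.
 * Any other PTB size is ignored here and returns false.
 */
bool ptb_resume_search(struct pl_path *p, uint32_t ptb_size)
{
    if (ptb_size > p->plpmtu && ptb_size < p->probed_size) {
        p->probed_size = ptb_size;
        return true;
    }
    return false;
}
```

With PLPMTU 1400 and a probe of 1464 in flight, a PTB reporting 1432 would make 1432 the next probe size, and the caller would send it without waiting.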

> 
> 
>> 
>>> 
>>>> 
>>>>> 
>>>>> The step we are using is 32, when it fails, we turn the step to 4. For example:
>>>>> 1400, 1432, 1464, 1496, 1528 (failed), 1500(1496 + 4), 1504(failed,
>>>>> 1500 is the PMTU).
>>>> 
>>>> What does failed mean? Does it mean that you have sent MAX_PROBES (=3?) probe packets and waited for each plpmtud_probe_interval ms without receiving a response?
>>> yes
>>> 
>>>> 
>>>> If so, it might make sense to continue with smaller candidates earlier. For example, after one unanswered probe packet.
>>> Sounds like a good way to go, and it would save 2 intervals to get the
>>> optimal value in the normal case.
>>> But if the failure is false (like the link is unstable), it may also
>>> take some time to catch up to the bigger candidate.
>> 
>> Right, it's a trade-off. Which is better depends on the probability of a probe packet being lost for a reason other than its size.
>> 
>> I chose to do something like this, when searching for a PMTU of 1472:
>> 
>> 1400 ack
>> 1432 ack
>> 1464 timeout (false negative)
>> 1436 ack
>> 1440 ack
>> 1444 ack
>> 1448 ack
>> 1452 ack
>> 1456 ack
>> 1460 ack
>> 1464 ack
>> 1496 timeout
>> 1468 ack
>> 1472 ack
>> 1476 timeout
>> 1476 timeout
>> 1476 timeout
>> done with PMTU=1472
> Looks good to me. :-)
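
For reference, the trace above can be reproduced with a small sketch in C. This is a simplified illustration, not the INET or Linux code: all names (struct search, search_on_ack(), run_search(), and so on) are hypothetical, the step sizes 32 and 4 and MAX_PROBES = 3 are taken from this thread, and the sketch assumes the probe for the base size is acknowledged. A timeout with the big step switches to the small step after a single unanswered probe. Once a small-step probe reaches the size that failed earlier and is acknowledged, the big step is used again.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIG_STEP   32u
#define SMALL_STEP  4u
#define MAX_PROBES  3u

struct search {
    uint32_t plpmtu;     /* largest acknowledged probe size, 0 if none yet */
    uint32_t probe_size; /* size of the next probe packet */
    uint32_t step;       /* BIG_STEP or SMALL_STEP */
    uint32_t big_failed; /* size that timed out while using BIG_STEP */
    uint32_t timeouts;   /* consecutive timeouts while using SMALL_STEP */
    bool done;
};

void search_init(struct search *s, uint32_t base)
{
    s->plpmtu = 0;
    s->probe_size = base;
    s->step = BIG_STEP;
    s->big_failed = 0;
    s->timeouts = 0;
    s->done = false;
}

void search_on_ack(struct search *s)
{
    s->plpmtu = s->probe_size;
    s->timeouts = 0;
    /* The earlier big-step failure was a false negative: speed up again. */
    if (s->step == SMALL_STEP && s->plpmtu >= s->big_failed)
        s->step = BIG_STEP;
    s->probe_size = s->plpmtu + s->step;
}

void search_on_timeout(struct search *s)
{
    if (s->step == BIG_STEP) {
        /* One unanswered big-step probe: continue with smaller candidates. */
        s->big_failed = s->probe_size;
        s->step = SMALL_STEP;
        s->probe_size = s->plpmtu + SMALL_STEP;
    } else if (++s->timeouts >= MAX_PROBES) {
        s->done = true; /* keep plpmtu as the result */
    }
}

/* Simulate a path; a probe of size lose_once is dropped exactly once. */
uint32_t run_search(uint32_t base, uint32_t path_mtu, uint32_t lose_once,
                    unsigned *probes)
{
    struct search s;

    search_init(&s, base);
    *probes = 0;
    while (!s.done) {
        bool lost = (s.probe_size == lose_once);

        if (lost)
            lose_once = 0;
        (*probes)++;
        if (!lost && s.probe_size <= path_mtu)
            search_on_ack(&s);
        else
            search_on_timeout(&s);
    }
    return s.plpmtu;
}
```

Calling run_search(1400, 1472, 1464, &n) walks through the same candidates as the trace above and ends with 1472.
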
> 
>> 
>>> 
>>>> 
>>>>> 
>>>>> Sorry, "sysctl -w net.sctp.plpmtud_probe_interval=1" won't work.
>>>>> As plpmtud_probe_interval is the probe interval TIME for the timer.
>>>>> Apart from 0, the minimal value is 5000ms.
>>>>> 
>>>>> So it should be:
>>>>> 
>>>>> plpmtud_probe_interval - INTEGER
>>>>>      The time interval (in milliseconds) for sending PLPMTUD probe chunks.
>>>>>      These chunks are sent at the specified interval with a variable size
>>>>>      to probe the mtu of a given path between 2 endpoints. PLPMTUD will
>>>>>      be disabled when 0 is set.
>>>>> 
>>>>>      Default: 0
>>>> 
>>>> What do you mean with probe chunks? You are sending probe *packets* containing a HEARTBEAT and a PAD chunk, right?
>>> yes.
>>> 
>>>> 
>>>> RFC8899 contains:
>>>> The PROBE_TIMER is configured to expire after a period longer than the maximum time to receive an acknowledgment to a probe packet.
>>>> 
>>>> So, how about plpmtud_probe_max_ack_time?
>>> "plpmtud_probe_interval" I got the name from tcp's sysctl plpmtud in
>>> linux. I was hoping to keep this consistent in sysctl and sockopt
>>> between Linux and BSD.  Note this parameter is also the interval to
>>> send a probe for the current pmtu in Search Complete status.
>> 
>> Do you send probe packets in Search Complete to confirm the current PMTU estimation?
>> 
>> RFC8899 suggests to do this only for non-reliable PLs. For a reliable PL like SCTP, it suggests to use the loss of (data) packets as indication instead.
> Can you point out the place in RFC8899 saying so?
> 
> What I saw is:
> 
>   Search Complete:  The Search Complete Phase is entered when the
>      PLPMTU is supported across the network path.  A PL can use a
>      CONFIRMATION_TIMER to periodically repeat a probe packet for the
>      current PLPMTU size.  If the sender is unable to confirm
>      reachability (e.g., if the CONFIRMATION_TIMER expires) or the PL
>      signals a lack of reachability, a black hole has been detected and
>      DPLPMTUD enters the Base Phase.
> 
> it doesn't matter if it's a reliable or non-reliable PL, no?

The descriptions of the phases only give a high-level overview of the mechanism. The state diagram is more detailed. There you find this sentence: "When used with an acknowledged PL (e.g., SCTP), DPLPMTUD SHOULD NOT continue to generate PLPMTU probes in this state". However, this refers only to probes that confirm the current PMTU estimate. SCTP should still send probe packets in Search Complete to probe for a larger PMTU.
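
As a rough sketch of that distinction in C (the enum and function names are hypothetical, and only the two timers that RFC8899 names PMTU_RAISE_TIMER and CONFIRMATION_TIMER are modelled):

```c
#include <stdbool.h>

enum sc_action {
    SC_NONE,           /* stay in Search Complete, send nothing */
    SC_CONFIRM_PLPMTU, /* re-probe the current PLPMTU */
    SC_SEARCH_LARGER   /* probe for a larger PLPMTU */
};

/*
 * Decide what to do on a timer event in Search Complete.
 * An acknowledged PL such as SCTP never sends confirmation probes here,
 * because it can detect a black hole from the loss of data packets.
 * Probing for a larger PMTU after the raise timer applies to both kinds of PL.
 */
enum sc_action search_complete_action(bool acknowledged_pl,
                                      bool raise_timer_expired,
                                      bool confirmation_timer_expired)
{
    if (raise_timer_expired)
        return SC_SEARCH_LARGER;
    if (confirmation_timer_expired && !acknowledged_pl)
        return SC_CONFIRM_PLPMTU;
    return SC_NONE;
}
```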

> 
>> 
>>> 
>>>> 
>>>> Also, I think more parameters would be helpful. For example,
>>>> 
>>>> plpmtud_enable - boolean to control whether to use PLPMTUD (it is more explicit than plpmtud_probe_interval=0 or plpmtud_probe_max_ack_time=0)
>>>> plpmtud_max_probes - controls the number of probe packets sent for one candidate.
>>>> plpmtud_raise_time - time to wait before probing for a larger PMTU in search complete (0 to disable it).
>>>> plpmtud_use_ptb - boolean to control whether to process an ICMP PTB.
>>> With these, the control will be more detailed for sure.
>>> But I didn't want to introduce too many parameters for this feature,
>>> as you know, these parameters could also be per socket/asoc/transport,
>>> and doing set/get with sockopt.
>>> 
>>> instead, we keep most fixed:
>>> 
>>> plpmtud_use_ptb = 1
>>> plpmtud_raise_time = 30 * plpmtud_probe_max_ack_time(plpmtud_probe_interval)
>>> plpmtud_max_probes = 3
>>> plpmtud_enable = !! plpmtud_probe_interval
>>> 
>>> Only one variable:
>>> plpmtud_probe_interval >= 5000ms
>> 
>> OK
>> 
>>> 
>>> So I think this is up to the implementation, if you want more things
>>> to tune, you can go ahead with these all parameters exposed to users.
>> 
>> Agreed. It is probably a good idea not to add too many parameters.
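
Just to summarize the fixed values listed above, here is a small sketch in C that derives them from the single variable. The struct and function are hypothetical and the field names follow the parameter names I proposed, which are not existing sysctls. The 5000 ms minimum, the factor 30, and the 3 probes are the values given in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define PLPMTUD_MIN_INTERVAL_MS 5000u

/* Hypothetical container for the derived settings. */
struct plpmtud_params {
    bool     enable;            /* !!plpmtud_probe_interval */
    bool     use_ptb;           /* always process ICMP PTB */
    uint32_t max_probes;        /* probe packets per candidate */
    uint32_t probe_interval_ms; /* the one user-visible variable */
    uint64_t raise_time_ms;     /* 30 * probe interval */
};

/* Returns 0 on success, -1 if the interval is neither 0 nor >= 5000 ms. */
int plpmtud_params_from_interval(uint32_t interval_ms,
                                 struct plpmtud_params *out)
{
    if (interval_ms != 0 && interval_ms < PLPMTUD_MIN_INTERVAL_MS)
        return -1;

    out->enable = (interval_ms != 0);
    out->use_ptb = true;
    out->max_probes = 3;
    out->probe_interval_ms = interval_ms;
    out->raise_time_ms = (uint64_t)interval_ms * 30;
    return 0;
}
```

With an interval of 5000 ms, this gives a raise time of 150000 ms. An interval of 1 ms is rejected, and 0 disables PLPMTUD.
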
>> 
>>> 
>>>> 
>>>> Timo
>>>> 
>>>>> 
>>>>> Thanks.
>>>>>>>     be disabled when 0 is set.
>>>>>>> 
>>>>>>>     Default: 0
>>>>>>> 
>>>>>>> 2. a socket option that can be used per socket, assoc or transport
>>>>>>> 
>>>>>>> /* PLPMTUD Probe Interval socket option */
>>>>>>> struct sctp_probeinterval {
>>>>>>>     sctp_assoc_t spi_assoc_id;
>>>>>>>     struct sockaddr_storage spi_address;
>>>>>>>     __u32 spi_interval;
>>>>>>> };
>>>>>>> 
>>>>>>> #define SCTP_PLPMTUD_PROBE_INTERVAL    133
>>>>>>> 
>>>>>>> 
>>>>>>> The value above will enable/disable the PLPMTUD probe by setting up the probe
>>>>>>> interval for the timer. When it's 0, the timer will also stop and
>>>>>>> PLPMTUD is disabled.
>>>>>>> This way, we don't need to introduce more options.
>>>>>> OK.
>>>>>>> 
>>>>>>> We're expecting to keep consistent with BSD on this, pls check and
>>>>>>> share your thoughts.
>>>>>> Looks good to me.
>>>>>> 
>>>>>> Best regards
>>>>>> Michael
>>>>>>> 
>>>>>>> Thanks.
>>>>>> 
>>>> 
>>>> 
>> 
