Re: [PATCH 4/4] sctp: heartbeats exceed maximum retransmssion limit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Wei Yongjun wrote:
> Vlad Yasevich wrote:
>> Wei Yongjun wrote:
>>   
>>> The number of HEARTBEAT chunks that an association may transmit is
>>> limited by Association.Max.Retrans count; however, the code allows
>>> us to send one extra heartbeat.
>>>
>>> This patch limits the number of heartbeats to the maximum count.
>>>
>>> Signed-off-by: Wei Yongjun <yjwei@xxxxxxxxxxxxxx>
>>> ---
>>>  net/sctp/sm_statefuns.c |    2 +-
>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
>>> index 3e7f6d2..ac7bf6d 100644
>>> --- a/net/sctp/sm_statefuns.c
>>> +++ b/net/sctp/sm_statefuns.c
>>> @@ -962,7 +962,7 @@ sctp_disposition_t sctp_sf_sendbeat_8_3(const struct sctp_endpoint *ep,
>>>  {
>>>  	struct sctp_transport *transport = (struct sctp_transport *) arg;
>>>  
>>> -	if (asoc->overall_error_count > asoc->max_retrans) {
>>> +	if (asoc->overall_error_count >= asoc->max_retrans) {
>>>  		sctp_add_cmd_sf(commands, SCTP_CMD_SET_SK_ERR,
>>>  				SCTP_ERROR(ETIMEDOUT));
>>>  		/* CMD_ASSOC_FAILED calls CMD_DELETE_TCB. */
>>>     
>> Hi Wei
>>
>> Here is the spec:
>>
>>    The endpoint should increment the respective error counter of the
>>    destination transport address each time a HEARTBEAT is sent to that
>>    address and not acknowledged within one RTO.
>>
>>    When the value of this counter reaches the protocol parameter
>>    'Path.Max.Retrans', the endpoint should mark the corresponding
>>    destination address as inactive...
>>
>> According to this, only unacknowledged HB count as errors.  The very
>> first HB we sent doest count as an error until the RTO expires.  So
>> the >= is the correct test here as we are really counting the number
>> of timeouts with reacknowledgment.
>>   
> 
> Hi vlad
> 
> There are two way to send HB. One is idle-like HB which is send after HB
> timer expires, the other is user initiated heartbeat.
> 
> Now the user initiated heartbeat is retranmited 10 times after send the
> first one, this is correctly. But the timer expires HB is sent 11 times,
> which means HB timeout 12 times.

Yes, that correct.  In the idle-link case, you have to discount the first
timeout and the first HB.

The way we implement error counting is we increment the error _every_ time
we send a HB, regardless of whether it's an idle-link detection, or user
triggered.  Once the HB is acknowledged, we reset the error count, but if
it times out, we send another HB and bump the error count.

Let's assume that the user set the Path.Max.Retrans to 1.  Let's see
what should happen in both cases:

user-triggered:
	1) send HB.
	2) error = 1
	3) start timer
	4) timeout
		4a) send HB
		4b) error = 2
		4c) start timer
	5) timeout
		5a) error out.

So we sent 1 HB, and 1 retransmission.

idle_link:
	1) timeout
		1a) send HB
		1b) error = 1
		1c) start timer
   	2) timeout
		2a) send HB
		2b) error = 2
		2c) start timer
	3) timeout
		3a) error out

So we sent 1 HB, and 1 retransmission.

In both cases we sent 2 HB chunks.  Essentially, we can not count the first HB
we sent toward the max.path.retrans limit.

If this is not what you are seeing, then we have problem.

-vlad

> 
> In the [PATCH 3/4], I changed to not inc the overall_error_count when
> send the first HB, so the user initiated heartbeat will retranmited 11
> times.
> 
> If you disagree with the [PATCH 3/4], this patch need update.
> 
> Regards.
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Networking Development]     [Linux OMAP]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux