Vlad Yasevich wrote: > Wei Yongjun wrote: > >> Vlad Yasevich wrote: >> >>> Wei Yongjun wrote: >>> >>> >>>> The number of HEARTBEAT chunks that an association may transmit is >>>> limited by Association.Max.Retrans count; however, the code allows >>>> us to send one extra heartbeat. >>>> >>>> This patch limits the number of heartbeats to the maximum count. >>>> >>>> Signed-off-by: Wei Yongjun <yjwei@xxxxxxxxxxxxxx> >>>> --- >>>> net/sctp/sm_statefuns.c | 2 +- >>>> 1 files changed, 1 insertions(+), 1 deletions(-) >>>> >>>> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c >>>> index 3e7f6d2..ac7bf6d 100644 >>>> --- a/net/sctp/sm_statefuns.c >>>> +++ b/net/sctp/sm_statefuns.c >>>> @@ -962,7 +962,7 @@ sctp_disposition_t sctp_sf_sendbeat_8_3(const struct sctp_endpoint *ep, >>>> { >>>> struct sctp_transport *transport = (struct sctp_transport *) arg; >>>> >>>> - if (asoc->overall_error_count > asoc->max_retrans) { >>>> + if (asoc->overall_error_count >= asoc->max_retrans) { >>>> sctp_add_cmd_sf(commands, SCTP_CMD_SET_SK_ERR, >>>> SCTP_ERROR(ETIMEDOUT)); >>>> /* CMD_ASSOC_FAILED calls CMD_DELETE_TCB. */ >>>> >>>> >>> Hi Wei >>> >>> Here is the spec: >>> >>> The endpoint should increment the respective error counter of the >>> destination transport address each time a HEARTBEAT is sent to that >>> address and not acknowledged within one RTO. >>> >>> When the value of this counter reaches the protocol parameter >>> 'Path.Max.Retrans', the endpoint should mark the corresponding >>> destination address as inactive... >>> >>> According to this, only unacknowledged HB count as errors. The very >>> first HB we sent doest count as an error until the RTO expires. So >>> the >= is the correct test here as we are really counting the number >>> of timeouts with reacknowledgment. >>> >>> >> Hi vlad >> >> There are two way to send HB. One is idle-like HB which is send after HB >> timer expires, the other is user initiated heartbeat. >> >> Now the user initiated heartbeat is retranmited 10 times after send the >> first one, this is correctly. But the timer expires HB is sent 11 times, >> which means HB timeout 12 times. >> > > Yes, that correct. In the idle-link case, you have to discount the first > timeout and the first HB. > Oh, maybe I mistaked for retranmit and unacknowledged HEARTBEAT. The spec talk about the HEARTBEAT as unacknowledged HEARTBEAT. Such as: 8.1. Endpoint Failure Detection An endpoint shall keep a counter on the total number of consecutive retransmissions to its peer (this includes retransmissions to all the destination transport addresses of the peer if it is multi-homed), *including unacknowledged HEARTBEAT chunks*. 8.2. Path Failure Detection Each time the T3-rtx timer expires on any address, or when a *HEARTBEAT sent to an idle address is not acknowledged* within an RTO, the error counter of that destination address will be incremented. So in my head the idle-link case is treat as unacknowledged HEARTBEAT, and user initiated heartbeat is treat as retranmit. The idle_link case is 11 unacknowledged HEARTBEAT, and 10 retranmit, and also user-triggered case. If overall_error_count count the retranmit, this patch is not need. Regards > The way we implement error counting is we increment the error _every_ time > we send a HB, regardless of whether it's an idle-link detection, or user > triggered. Once the HB is acknowledged, we reset the error count, but if > it times out, we send another HB and bump the error count. > > Let's assume that the user set the Path.Max.Retrans to 1. Let's see > what should happen in both cases: > > user-triggered: > 1) send HB. > 2) error = 1 > 3) start timer > 4) timeout > 4a) send HB > 4b) error = 2 > 4c) start timer > 5) timeout > 5a) error out. > > So we sent 1 HB, and 1 retransmission. > > idle_link: > 1) timeout > 1a) send HB > 1b) error = 1 > 1c) start timer > 2) timeout > 2a) send HB > 2b) error = 2 > 2c) start timer > 3) timeout > 3a) error out > > So we sent 1 HB, and 1 retransmission. > > In both cases we sent 2 HB chunks. Essentially, we can not count the first HB > we sent toward the max.path.retrans limit. > > If this is not what you are seeing, then we have problem. > -- To unsubscribe from this list: send the line "unsubscribe linux-sctp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html