Re: [PATCH net] net: sctp: fix multihoming retransmission path selection to rfc4960

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 02/20/2014 08:26 PM, Vlad Yasevich wrote:
On 02/20/2014 06:53 AM, Daniel Borkmann wrote:
Problem statement: 1) both paths (primary path1 and alternate
path2) are up after the association has been established i.e.,
HB packets are normally exchanged, 2) path2 gets inactive after
path_max_retrans * max_rto timed out (i.e. path2 is down completely),
3) now, if a transmission times out on the only surviving/active
path1 (any ~1sec network service impact could cause this like
a channel bonding failover), then the retransmitted packets are
sent over the inactive path2; this happens with partial failover
and without it.

Interesting.  The problem above is actually triggered by
sctp_retransmit().  When the T3-timeout occurs, we have
active_patch == retrans_path, and even though the timeout
occurred on the initial transmission of data, not a retransmit,
we end up updating retransmit path.

It might be worth adding adding this as a condition.  See below.

Besides not being optimal in the above scenario, a small failure
or timeout in the only existing path has the potential to cause
long delays in the retransmission (depending on RTO_MAX) until
the still active path is reselected.

RFC4960, section 6.4. "Multi-Homed SCTP Endpoints" states under
6.4.1. "Failover from an Inactive Destination Address" the
following:

   Some of the transport addresses of a multi-homed SCTP endpoint
   may become inactive due to either the occurrence of certain
   error conditions (see Section 8.2) or adjustments from the
   SCTP user.

   When there is outbound data to send and the primary path
   becomes inactive (e.g., due to failures), or where the SCTP
   user explicitly requests to send data to an inactive
   destination transport address, before reporting an error to
   its ULP, the SCTP endpoint should try to send the data to an
   alternate *active* destination transport address if one exists.

   When retransmitting data that timed out, if the endpoint is
   multihomed, it should consider each source-destination address
   pair in its retransmission selection policy. When retransmitting
   timed-out data, the endpoint should attempt to pick the most
   divergent source-destination pair from the original
   source-destination pair to which the packet was transmitted.

   Note: Rules for picking the most divergent source-destination
   pair are an implementation decision and are not specified
   within this document.

So, we should first reconsider to take the current active
retransmission transport if we cannot find an alternative
active one, as otherwise, by sending a user message to an
inactive destination transport address while excluding an
active destination transport address, we would not comply
to RFC4960. If all of that fails, we can still round robin
through unkown, partial failover, and inactive ones in the
hope to find something still suitable/useful.

Commit 4141ddc02a92 ("sctp: retran_path update bug fix") broke
that behaviour by selecting the next non-active transport when
no other active transport was found besides the current assoc's
peer.retran_path. Before commit 4141ddc02a92, we would have
traversed through the list until we reach our peer.retran_path
again, and in case that is still in state SCTP_ACTIVE, we would
take it and return. Only if that is not the case either, we
take the next inactive transport. Besides all that, another
issue is that transports in state SCTP_UNKNOWN could be preferred
over transports in state SCTP_ACTIVE in case a SCTP_ACTIVE
transport appears after SCTP_UNKNOWN in the transport list
yielding a "weaker" transport state to be used in retransmission.

This patch mostly reverts 4141ddc02a92, but also rewrites
this function to introduce more clarity and strictness into
the code. A strict priority of transport states is enforced
in this patch, hence selection is active > unkown > partial
failover > inactive.

Fixes: 4141ddc02a92 ("sctp: retran_path update bug fix")
Signed-off-by: Daniel Borkmann <dborkman@xxxxxxxxxx>
Cc: Gui Jianfeng <guijianfeng@xxxxxxxxxxxxxx>
...
-		t = list_entry(pos, struct sctp_transport, transports);
+	/* We're done as we only have the one and only path. */
+	if (asoc->peer.transport_count == 1)
+		return;


I think we should to do one more short circuit here:

         /* If active_path and retrans_path are the same and active,
          * then this is the only active path.  Use it.
          */
         if (asoc->peer.active_path == asoc->peer.retrans_path &&
             asoc->peer.active_path->state == SCTP_ACTIVE)
                 return;

Ok, will add and resubmit, thanks Vlad.

-		/* We have exhausted the list, but didn't find any
-		 * other active transports.  If so, use the next
-		 * transport.
-		 */
-		if (t == asoc->peer.retran_path) {
-			t = next;
+	/* Iterate from retran_path's successor back to retran_path. */
+	for (trans = list_next_entry(trans, transports); 1;
+	     trans = list_next_entry(trans, transports)) {
+		/* Manually skip the head element. */
+		if (&trans->transports == &asoc->peer.transport_addr_list)
+			continue;

Alternative way would be:
     head = &asoc->peer.transport_addr_list;
     list_for_each_enty_from(trans, head, transport) {
         ... do the work...

         /* Manually skip head element if it's next */
         if (list_next_entry(trans, transports) == head)
             trans = list_first_entry(head);
     }

It's up to you if you like this better or not.

I tested already the current version, so I'll go for that. ;)

Thanks for your feedback Vlad!
--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Networking Development]     [Linux OMAP]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux