Re: [PATCH] sctp: Fix mis-ordering of user space data when multihoming in use

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 02, 2009 at 01:57:42PM -0500, Vlad Yasevich wrote:
> 
> 
> Neil Horman wrote:
> > On Mon, Nov 02, 2009 at 12:58:27PM -0500, Vlad Yasevich wrote:
> >>
> >> Neil Horman wrote:
> >>>> We could do that yes, but it concerns me, as assigning the tsn in
> >>>>> sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
> >>>>> get dropped prior to submission to the ip layer.  Consider if we have a routing
> >>>>> table disruption, and we  follow the no_route path in sctp_packet_transmit.  In
> >>>>> that situation, we will discard chunks with tsns assigned, leaving a gap in our
> >>>>> stream.  Unless we have a recovery path for that, I think the better option is
> >>>>> to wait to assign tsns until we are sure we can submit them to the ip layer
> >>>>> safely (where the transmitted queue can re-tranmit them if need be).  If you can
> >>>>> explain the SACK case in a little more detail above, perhaps we can come up with
> >>>>> some logic to govern when it is and is not safe to call sctp_packet_transmit
> >>>>> from sctp_packet_transmit_chunk for data chunks.
> >>>> Assume that we have a number of queued chunks that add up to multiple MTUs
> >>>> all going to the same transport (typical case).  They are currently gated by
> >>>> congestion window.
> >>>>
> >>>> A SACK arrives triggering a flush.  With the proposed patch, once we fill a
> >>>> single MTU, the main loop in sctp_outq_flush will exist and we will transmit
> >>>> only a single packet and under-utilize our congesting window thus preventing
> >>>> future growth.  With the old code, we had multiple packets sent out thus
> >>>> filling the congestion window.
> >>>>
> >>>> Another thing your patch didn't take into account is that every time we change
> >>>> the transport in sctp_outq_flush, we reset the packet, effectively marking it
> >>>> empty.  You would end up leaking chunks if there was any queuing effects.
> >>>>
> >>>> If a transient routing problem happens and the packet fails to get sent, that's
> >>>> no different then a loss event in the network.  It will get reported back as
> >>>> gaps or, if the failure is long term, it will be detected with HBs and
> >>>> retransmissions.  So I don't see a problem of assigning TSNs when the DATA is
> >>>> added to the packet.  We don't really want to do it any earlier though.
> >>>>
> >>> Yeah, ok, heres a new version, instead of just skipping the packet transmit in
> >>> transmit_chunk, we instead simply assign a tsn in sctp_outq_flush, after we
> >>> dequeue a data chunk from the outq and do the normal expiration and invalid
> >>> stream checking.
> >>>
> >>> Regards
> >>> Neil
> >>>
> >> Hi Neil
> >>
> >> I don't think we can do that in sctp_outq_flush().
> >>
> > Why can't we do it in sctp_outq_flush?
> 
> Ok, looking at the 'resent' code you left in packet_transmit, this will work,
> but we now end up assigning sequence numbers to DATA that may not be transmitted
> this time around.
> 
> It will also make FWD-TSNs a bit weird.  Worth a test.  My personal preference
> would be to do it when the chunk is added to the packet.
> 

Ok, very well.  I've moved the assignment to the point right after we actually
enqueue the chunk to the offered packet.


Fix use of sctp_packet_transmit so as to prevent packet re-ordering

Recently had a bug reported to me, in which the user was sending packets with a
payload containing a sequence number.  The packets were getting delivered in
order according the chunk TSN values, but the sequence values in the payload
were arriving out of order.  At first I thought it must be an application error,
but we eventually found it to be a problem on the transmit side in the sctp
stack.

The conditions for the error are that multihoming must be in use, and it helps
if each transport has a different pmtu.  The problem occurs in sctp_outq_flush.
Basically we dequeue packets from the data queue, and attempt to append them to
the orrered packet for a given transport.  After we append a data chunk we add
the trasport to the end of a list of transports to have their packets sent at
the end of sctp_outq_flush.  The problem occurs when a data chunks fills up a
offered packet on a transport.  The function that does the appending
(sctp_packet_transmit_chunk), will try to call sctp_packet_transmit on the full
packet, and then append the chunk to a new packet.  This call to
sctp_packet_transmit, sends that packet ahead of the others that may be queued
in the transport_list in sctp_outq_flush.  The result is that frames that were
sent in one order from the user space sending application get re-ordered prior
to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing of data
payloads, even though tsn ordering is correct.

The fix is to change where we assign a tsn.  By doing this earlier,
we are then free to place chunks in packets, whatever way we
see fit and the protocol will make sure to do all the appropriate re-ordering on
receive as is needed.

Regards
Neil

Signed-off-by: Neil Horman <nhorman@xxxxxxxxxxxxx>
Reported-by: William Reich <reich@xxxxxxxxxxx>

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 7363935..51e3c5a 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -351,6 +351,10 @@ append:
 
 	/* It is OK to send this chunk.  */
 	list_add_tail(&chunk->list, &packet->chunk_list);
+	if (sctp_chunk_is_data(chunk)) {
+		sctp_chunk_assign_tsn(chunk);
+		sctp_chunk_assign_ssn(chunk);
+	}
 	packet->size += chunk_len;
 	chunk->transport = packet->transport;
 finish:
@@ -454,23 +458,22 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 		list_del_init(&chunk->list);
 		if (sctp_chunk_is_data(chunk)) {
 
-			if (!chunk->has_tsn) {
-				sctp_chunk_assign_ssn(chunk);
-				sctp_chunk_assign_tsn(chunk);
+			if (!chunk->resent) {
 
-			/* 6.3.1 C4) When data is in flight and when allowed
-			 * by rule C5, a new RTT measurement MUST be made each
-			 * round trip.  Furthermore, new RTT measurements
-			 * SHOULD be made no more than once per round-trip
-			 * for a given destination transport address.
-			 */
+				/* 6.3.1 C4) When data is in flight and when allowed
+				 * by rule C5, a new RTT measurement MUST be made each
+				 * round trip.  Furthermore, new RTT measurements
+				 * SHOULD be made no more than once per round-trip
+				 * for a given destination transport address.
+				 */
 
 				if (!tp->rto_pending) {
 					chunk->rtt_in_progress = 1;
 					tp->rto_pending = 1;
 				}
-			} else
-				chunk->resent = 1;
+			}
+
+			chunk->resent = 1;
 
 			has_data = 1;
 		}
--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Networking Development]     [Linux OMAP]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux