On Mon, Nov 02, 2009 at 01:57:42PM -0500, Vlad Yasevich wrote: > > > Neil Horman wrote: > > On Mon, Nov 02, 2009 at 12:58:27PM -0500, Vlad Yasevich wrote: > >> > >> Neil Horman wrote: > >>>> We could do that yes, but it concerns me, as assigning the tsn in > >>>>> sctp_outq_flush leaves us in a position where we assign tsn to chunks that might > >>>>> get dropped prior to submission to the ip layer. Consider if we have a routing > >>>>> table disruption, and we follow the no_route path in sctp_packet_transmit. In > >>>>> that situation, we will discard chunks with tsns assigned, leaving a gap in our > >>>>> stream. Unless we have a recovery path for that, I think the better option is > >>>>> to wait to assign tsns until we are sure we can submit them to the ip layer > >>>>> safely (where the transmitted queue can re-tranmit them if need be). If you can > >>>>> explain the SACK case in a little more detail above, perhaps we can come up with > >>>>> some logic to govern when it is and is not safe to call sctp_packet_transmit > >>>>> from sctp_packet_transmit_chunk for data chunks. > >>>> Assume that we have a number of queued chunks that add up to multiple MTUs > >>>> all going to the same transport (typical case). They are currently gated by > >>>> congestion window. > >>>> > >>>> A SACK arrives triggering a flush. With the proposed patch, once we fill a > >>>> single MTU, the main loop in sctp_outq_flush will exist and we will transmit > >>>> only a single packet and under-utilize our congesting window thus preventing > >>>> future growth. With the old code, we had multiple packets sent out thus > >>>> filling the congestion window. > >>>> > >>>> Another thing your patch didn't take into account is that every time we change > >>>> the transport in sctp_outq_flush, we reset the packet, effectively marking it > >>>> empty. You would end up leaking chunks if there was any queuing effects. > >>>> > >>>> If a transient routing problem happens and the packet fails to get sent, that's > >>>> no different then a loss event in the network. It will get reported back as > >>>> gaps or, if the failure is long term, it will be detected with HBs and > >>>> retransmissions. So I don't see a problem of assigning TSNs when the DATA is > >>>> added to the packet. We don't really want to do it any earlier though. > >>>> > >>> Yeah, ok, heres a new version, instead of just skipping the packet transmit in > >>> transmit_chunk, we instead simply assign a tsn in sctp_outq_flush, after we > >>> dequeue a data chunk from the outq and do the normal expiration and invalid > >>> stream checking. > >>> > >>> Regards > >>> Neil > >>> > >> Hi Neil > >> > >> I don't think we can do that in sctp_outq_flush(). > >> > > Why can't we do it in sctp_outq_flush? > > Ok, looking at the 'resent' code you left in packet_transmit, this will work, > but we now end up assigning sequence numbers to DATA that may not be transmitted > this time around. > > It will also make FWD-TSNs a bit weird. Worth a test. My personal preference > would be to do it when the chunk is added to the packet. > Ok, very well. I've moved the assignment to the point right after we actually enqueue the chunk to the offered packet. Fix use of sctp_packet_transmit so as to prevent packet re-ordering Recently had a bug reported to me, in which the user was sending packets with a payload containing a sequence number. The packets were getting delivered in order according the chunk TSN values, but the sequence values in the payload were arriving out of order. At first I thought it must be an application error, but we eventually found it to be a problem on the transmit side in the sctp stack. The conditions for the error are that multihoming must be in use, and it helps if each transport has a different pmtu. The problem occurs in sctp_outq_flush. Basically we dequeue packets from the data queue, and attempt to append them to the orrered packet for a given transport. After we append a data chunk we add the trasport to the end of a list of transports to have their packets sent at the end of sctp_outq_flush. The problem occurs when a data chunks fills up a offered packet on a transport. The function that does the appending (sctp_packet_transmit_chunk), will try to call sctp_packet_transmit on the full packet, and then append the chunk to a new packet. This call to sctp_packet_transmit, sends that packet ahead of the others that may be queued in the transport_list in sctp_outq_flush. The result is that frames that were sent in one order from the user space sending application get re-ordered prior to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing of data payloads, even though tsn ordering is correct. The fix is to change where we assign a tsn. By doing this earlier, we are then free to place chunks in packets, whatever way we see fit and the protocol will make sure to do all the appropriate re-ordering on receive as is needed. Regards Neil Signed-off-by: Neil Horman <nhorman@xxxxxxxxxxxxx> Reported-by: William Reich <reich@xxxxxxxxxxx> diff --git a/net/sctp/output.c b/net/sctp/output.c index 7363935..51e3c5a 100644 --- a/net/sctp/output.c +++ b/net/sctp/output.c @@ -351,6 +351,10 @@ append: /* It is OK to send this chunk. */ list_add_tail(&chunk->list, &packet->chunk_list); + if (sctp_chunk_is_data(chunk)) { + sctp_chunk_assign_tsn(chunk); + sctp_chunk_assign_ssn(chunk); + } packet->size += chunk_len; chunk->transport = packet->transport; finish: @@ -454,23 +458,22 @@ int sctp_packet_transmit(struct sctp_packet *packet) list_del_init(&chunk->list); if (sctp_chunk_is_data(chunk)) { - if (!chunk->has_tsn) { - sctp_chunk_assign_ssn(chunk); - sctp_chunk_assign_tsn(chunk); + if (!chunk->resent) { - /* 6.3.1 C4) When data is in flight and when allowed - * by rule C5, a new RTT measurement MUST be made each - * round trip. Furthermore, new RTT measurements - * SHOULD be made no more than once per round-trip - * for a given destination transport address. - */ + /* 6.3.1 C4) When data is in flight and when allowed + * by rule C5, a new RTT measurement MUST be made each + * round trip. Furthermore, new RTT measurements + * SHOULD be made no more than once per round-trip + * for a given destination transport address. + */ if (!tp->rto_pending) { chunk->rtt_in_progress = 1; tp->rto_pending = 1; } - } else - chunk->resent = 1; + } + + chunk->resent = 1; has_data = 1; } -- To unsubscribe from this list: send the line "unsubscribe linux-sctp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html