Re: [PATCH] Fix piggybacked ACKs

Doug Graham wrote:
> Oops.  Sent the last one in HTML, so the mailing list rejected it.
> Damned GUI email clients!
>
> Wei Yongjun wrote:
>> Doug Graham wrote:
>>  
>>> On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
>>>      
>>>> Doug Graham wrote:
>>>>          
>>>>>  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data)
>>>>>  14 2.203092    10.0.0.11   10.0.0.15   SACK
>>>>>  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
>>>>>  16 2.203427    10.0.0.11   10.0.0.15   SACK
>>>>>  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
>>>>>  18 2.403524    10.0.0.15   10.0.0.11   SACK
>>>>>  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
>>>>>  20 2.603285    10.0.0.15   10.0.0.11   SACK
>>>>> What bothers me about this is that Nagle seems to be introducing a
>>>>> delay here.  The first DATA packets in both directions are MTU-sized
>>>>> packets, yet both the Linux client and the BSD server wait 200ms
>>>>> until they get the SACK to the first fragment before sending the
>>>>> second fragment.  The server can't send its reply until it gets both
>>>>> fragments, and the client can't reassemble the reply until it gets
>>>>> both fragments, so from the application's point of view, the reply
>>>>> doesn't arrive until 400ms after the request is sent.  This could
>>>>> probably be fixed by disabling Nagle with SCTP_NODELAY, but that
>>>>> shouldn't be required.  Nagle is only supposed to prevent multiple
>>>>> outstanding *small* packets.
>>>>>                 
>>>> I think you have hit a case where Nagle's algorithm should not be used.
>>>>
>>>> Can you try the following patch?
>>>>
>>>> [PATCH] sctp: do not use Nagle algorithm while fragmented data is
>>>> transmitted
>>>>
>>>> If fragmented data is being sent, Nagle's algorithm should not be
>>>> used. In the special case where only one large packet is sent,
>>>> delaying the last fragment causes the receiver to wait for more
>>>> fragments to reassemble and not send a SACK, while the sender is
>>>> still waiting for a SACK before sending the last fragment.
>>>>           
>>> [patch deleted]
>>>
>>> This patch seems to work quite well, but I think disabling Nagle
>>> completely for large messages is not quite the right thing to do.
>>> There's a draft-minshall-nagle-01.txt floating around that describes a
>>> modified Nagle algorithm for TCP.  It appears to have been implemented
>>> in Linux TCP even though the draft has expired.  The modified algorithm
>>> is how I thought Nagle had always worked to begin with.  From the
>>> draft:
>>>
>>>         "If a TCP has less than a full-sized packet to transmit,
>>>         and if any previously transmitted less than full-sized
>>>         packet has not yet been acknowledged, do not transmit
>>>         a packet."
>>>
>>> so in the case of sending a fragmented SCTP message, all but the last
>>> fragment will be full-sized and will be sent without delay.  The last
>>> fragment will usually not be full-sized, but it too will be sent
>>> without delay because there are no outstanding non-full-sized packets.
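
The rule quoted from the draft can be sketched as a small predicate. This is only an illustration under stated assumptions, not the kernel code: `struct nagle_state` and the `minshall_*` helpers are hypothetical names, `mss` stands for whatever counts as a full-sized packet, and acknowledgement handling is reduced to "everything outstanding was acked".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sender state; these are not fields from net/sctp. */
struct nagle_state {
    size_t mss;          /* payload size of a full-sized packet (assumed) */
    bool small_unacked;  /* a less-than-full-sized packet is still unacked */
};

/* Modified Nagle: a full-sized packet always goes out; a small one goes
 * out only if no previously sent small packet is still unacknowledged. */
static bool minshall_may_send(const struct nagle_state *s, size_t len)
{
    if (len >= s->mss)
        return true;
    return !s->small_unacked;
}

/* Record a transmission; only small packets affect the state. */
static void minshall_on_send(struct nagle_state *s, size_t len)
{
    if (len < s->mss)
        s->small_unacked = true;
}

/* Simplification: one acknowledgement covers everything outstanding. */
static void minshall_on_ack_all(struct nagle_state *s)
{
    s->small_unacked = false;
}
```

With `mss` set to 1452 and no acknowledgements arriving, fragments of 1452, 2, 1452 and 2 bytes are handled as follows: the first three are allowed and the fourth is held until `minshall_on_ack_all()` is called.
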
>>>
>>> The difference between this and your method is that yours would
>>> allow many small fragments of big messages to be outstanding, whereas
>>> this one would only allow the first big message to be sent in its
>>> entirety, followed by the full-sized fragments of the next big
>>> message.  When it came time to send the second small fragment,
>>> Nagle would force it to wait for an ACK for the first small fragment.
>>> I'm not convinced that the difference is all that important,
>>> but who knows.
>>>
>>> Here's my attempt at implementing the modified Nagle algorithm
>>> described in draft-minshall-nagle-01.txt.  It should be applied
>>> instead of your patch, not on top of it.  If
>>> (q->outstanding_bytes % asoc->frag_point) is zero, no delay is
>>> introduced.  The assumption is that this means that all outstanding
>>> packets (if any) are full-sized.
>>>
>>> Signed-off-by: Doug Graham <dgraham@xxxxxxxxxx>
>>>
>>> ---
>>> --- linux-2.6.29/net/sctp/output.c    2009/08/02 00:47:44    1.3
>>> +++ linux-2.6.29/net/sctp/output.c    2009/08/02 00:51:18
>>> @@ -717,7 +717,8 @@ static sctp_xmit_t sctp_packet_append_da
>>>       * unacknowledged.
>>>       */
>>>      if (!sp->nodelay && sctp_packet_empty(packet) &&
>>> -        q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
>>> +        (q->outstanding_bytes % asoc->frag_point) != 0 &&
>>> +        sctp_state(asoc, ESTABLISHED)) {
>>>          unsigned len = datasize + q->out_qlen;
>>>  
>>>          /* Check whether this chunk and all the rest of pending
>>>       
>>
>>
>> Seems good! But it may break the transmission of small packets, to
>> which the Nagle algorithm should still apply.
>> Such as this:
>>
>> Endpoint A                Endpoint B
>>           <-------------  DATA (size=1452/2) delay send
>>           <-------------  DATA (size=1452/2) send immediately
>>           <-------------  DATA (size=1452/2) send immediately ** broken
>>           <-------------  DATA (size=1452/2) delay send
>>           <-------------  DATA (size=1452/2) send immediately
>>           <-------------  DATA (size=1452/2) send immediately ** broken
>>
>>
>> Can you try this one?
>>
>>
>>   
>
> I would, except I don't understand what you're getting at.  Does this
> mean to send a total of six 1454-byte messages from B to A?  If so,
> why would the first one be delayed?

Oh, no: six 726-byte (1452/2) messages. Maybe the 1st and 2nd are
bundled into one packet, the 3rd goes out as a single packet, the 4th
and 5th are bundled, and the 6th is single.
I have not tested it.
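
One possible reading of the concern, reduced to arithmetic: the modulo test cannot tell one full-sized fragment apart from two half-sized chunks, since both leave 1452 bytes outstanding. The sketch below compares only the outstanding-bytes part of the original and the patched condition. The function names are hypothetical, the other conditions in sctp_packet_append_data() (nodelay, empty packet, ESTABLISHED state, the length check) are left out, and a frag_point of 1452 is an assumption taken from the trace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Original condition: Nagle is considered whenever any data is
 * outstanding. */
static bool nagle_applies_orig(size_t outstanding_bytes)
{
    return outstanding_bytes != 0;
}

/* Patched condition: Nagle is considered only when the outstanding
 * byte count is not a multiple of frag_point. */
static bool nagle_applies_mod(size_t outstanding_bytes, size_t frag_point)
{
    return (outstanding_bytes % frag_point) != 0;
}
```

With frag_point = 1452, one outstanding 726-byte chunk still triggers the check (726 % 1452 != 0), but two of them do not (1452 % 1452 == 0), so a third small chunk would go out at once even though only small chunks are unacknowledged.
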

>
> Assuming that no SACKs are received by B, this should result in the
> first 3 packets getting sent immediately: a 1452-byte fragment, then
> a 2-byte fragment, then the second 1452-byte fragment.  When it comes
> time to send the second 2-byte fragment, Nagle kicks in and prevents
> it from being sent until a SACK is received.
>
> But I'm pretty sure I missed your point.  Can you flesh it out a bit?
>
> --Doug
>>
>>   
>
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
