> On 14. Jul 2020, at 18:23, Corey Minyard <minyard@xxxxxxx> wrote:
>
> On Tue, Jul 14, 2020 at 03:51:49PM +0200, Michael Tuexen wrote:
>>> On 14. Jul 2020, at 15:10, Corey Minyard <minyard@xxxxxxx> wrote:
>>>
>>> On Tue, Jul 14, 2020 at 07:12:58AM -0500, Corey Minyard wrote:
>>>> On Mon, Jul 13, 2020 at 07:11:04PM -0300, Marcelo Leitner wrote:
>>>>> On Mon, Jul 13, 2020 at 04:59:07PM -0500, Corey Minyard wrote:
>>>>>> Hi, it's me again with another strange issue. In case you didn't figure
>>>>>> it out before, I'm working on a library that supports all different
>>>>>> types of stream I/O, and SCTP is one supported building block. I
>>>>>> noticed when I stacked a multiplexer layer on top of SCTP I started
>>>>>> getting timeouts occasionally. It took a bit, but I finally realized
>>>>>> that I was getting 200ms delays occasionally between sending a packet
>>>>>> and receiving a packet. I verified this with a trace right at the
>>>>>> sctp_send() and sctp_recvmsg() calls. It doesn't seem to be regular
>>>>>> in any way I can see, but it happens often enough to cause issues.
>>>>>>
>>>>>> If I replace the SCTP block with a TCP block, it works fine, and pretty
>>>>>> much all the code is the same except where it does the read and write
>>>>>> calls (including the epoll() usage, and I have also switched to select()
>>>>>> and it has the same issue). The write calls don't seem to be the issue,
>>>>>> I see two back-to-back writes a few microseconds apart and see a 200ms
>>>>>> delay between the messages on the receive side.
>>>>>>
>>>>>> The test in question sets up two connections and does a big simultaneous
>>>>>> bidirectional transfer. The test app has 4 threads waiting on epoll()
>>>>>> handling data and writing data.
>>>>>>
>>>>>> And the delay is always ~200ms. Which sounds suspicious.
>>>>>
>>>>> That can be the delayed sack timer, in kernel.
>>>>> /* Delayed sack timer - 200ms */
>>>>> #define SCTP_DEFAULT_TIMEOUT_SACK (200)
>>>>>
>>>>> You may tweak the sysctl net.sctp.sack_timeout and see if it changes
>>>>> accordingly, or via SCTP_PEER_ADDR_PARAMS or even enable immediate ack
>>>>> (by setting SPP_SACKDELAY_DISABLE)
>>>>
>>>> Ok, setting SPP_SACKDELAY_DISABLE does make the problem go away.
>>>>
>>>>>
>>>>>>
>>>>>> It's not using sctp_sendv() at the moment, as the systems I'm running on
>>>>>> don't have that yet. But the library does have support if it sees it is
>>>>>> available.
>>>>>>
>>>>>> So I don't think it's my library; I've stared at it a bunch (and found a
>>>>>> few other bugs) but I can't reconcile this one. There are no timers
>>>>>
>>>>> Nice.
>>>>>
>>>>>> that would cause this in the code in question. Just basically an
>>>>>> epoll() call waiting on data and receive processing that is comparing
>>>>>> data, along with write processing that is sending the same data.
>>>>>>
>>>>>> Anyway, I haven't tried to create a small reproducer; I thought I would
>>>>>> report it first and see if anything rang a bell. I tried this on a
>>>>>> recent kernel and got the same issue.
>>>>>
>>>>> I guess a combination of xmit rate, msg and buffer sizes and packet
>>>>> drops can lead to this delay. I've seen it happening, but didn't have
>>>>> the time to track it down back then.
>>>>
>>>> There should be no packet drops. It's running across localhost, and
>>>> flow control in the multiplex layer as it's set up for the tests limits
>>>> the maximum outstanding data to 1024 bytes. With overhead and flow
>>>> control messages it's maybe 1050 bytes of data that would ever be
>>>> unacked. (It's not really testing throughput, it's testing the inner
>>>> workings of the multiplexing protocol.)
>>>>
>>>> If I understand this correctly per the RFCs, by default if there are two
>>>> messages outstanding, it will send an sack immediately. Otherwise it
>>>> waits 200ms.
>>>> I'm guessing what is happening is that SCTP sends a
>>>> sack and then receives one more message and the upper layer stops
>>>> because of flow control. Then the sack comes out in 200ms and things
>>>> continue.
>>>
>>> Actually, that still doesn't make sense. The lack of a sack shouldn't
>>> keep anything from sending unless the congestion window is closed, which
>>> shouldn't happen in this case. Am I correct?
>> I guess you still have the Nagle algorithm enabled. Try enabling the SCTP_NODELAY
>> socket option: https://tools.ietf.org/html/rfc6458#section-8.1.5 at the sender side.
>>
>> The Nagle algorithm is enabled by default and will delay the sending of packets if they are
>> not large enough (an implementation decision) and there is outstanding data.
>
> Well, that was a surprise, disabling Nagle caused the problem to go
> away. Nagle generally doesn't make a difference when transferring lots
> of data.
Correct.
>
> I guess this is a bad interaction between Nagle and the SCTP
> sack algorithms. With TCP in my test, data is flowing both ways so data
> is always being acked, and Nagle is never significantly involved.
TCP also uses delayed ACKs. However, bidirectional transfers are different from
unidirectional ones.
>
> That's happening with SCTP, too, but in some situations a sack could be
> sent, you get one more packet sent, and that packet won't be acked until
> another packet is sent. So you have unacked data, and Nagle will hold
> any new data until it receives an ack for the outstanding packet. So you
> get stuck until the sack delay elapses. Bah.
The same applies to TCP...
>
> This is sort of like the interaction between Nagle and TCP delayed ack.
> Which is sort of a bug, I guess, but well known. I have a number of
> ways to work around this issue, and I can document it so users can know.
>
> Thanks for your help.
You are welcome.
Best regards
Michael
>
> -corey
>
>>
>> Best regards
>> Michael
>>>
>>> -corey
>>>
>>>>
>>>> So I think I can figure out how to make this work smoothly. I assume
>>>> this doesn't happen in TCP because all packets carry an ack, and there
>>>> is data flowing both ways all the time.
>>>>
>>>> Thanks,
>>>>
>>>> -corey
>>>>
>>>>>
>>>>> That said, remember that Linux SCTP doesn't support buffer
>>>>> auto-tuning. So considering you're running a stress test, you probably
>>>>> want to adjust them accordingly manually to avoid packet drops.
>>>>>
>>>>> Marcelo
>>>>>
>>>>>>
>>>>>> The library is at https://github.com/cminyard/gensio. I'd need to
>>>>>> provide a patch for the tracing.
>>>>>>
>>>>>> -corey
>>
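
For reference, the two per-socket workarounds discussed in this thread (SCTP_NODELAY on the sender, SPP_SACKDELAY_DISABLE via SCTP_PEER_ADDR_PARAMS on the receiver) might look roughly like the sketch below. It is not taken from gensio or from any code posted here. It assumes Linux with the kernel UAPI header <linux/sctp.h> available, and the helper names sctp_disable_nagle() and sctp_disable_delayed_sack() are made up for illustration. The system-wide alternative mentioned above is the sysctl net.sctp.sack_timeout.

```c
/* Sketch only: Linux-specific, helper names are hypothetical. */
#include <string.h>
#include <sys/socket.h>   /* setsockopt(); include before <linux/sctp.h> */
#include <netinet/in.h>   /* IPPROTO_SCTP */
#include <linux/sctp.h>   /* SCTP_NODELAY, SCTP_PEER_ADDR_PARAMS, SPP_SACKDELAY_DISABLE */

/*
 * Sender side: disable the Nagle-like algorithm so small messages are
 * not held back while earlier data is still unacknowledged.
 * Returns 0 on success, -1 on error (errno set by setsockopt()).
 */
int sctp_disable_nagle(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_SCTP, SCTP_NODELAY, &on, sizeof(on));
}

/*
 * Receiver side: disable delayed SACK so that received data is
 * acknowledged immediately instead of after the SACK timeout.
 * The zeroed association id and address are an assumption that the
 * setting should apply to the socket as a whole.
 * Returns 0 on success, -1 on error (errno set by setsockopt()).
 */
int sctp_disable_delayed_sack(int fd)
{
    struct sctp_paddrparams params;

    memset(&params, 0, sizeof(params));
    params.spp_flags = SPP_SACKDELAY_DISABLE;

    return setsockopt(fd, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS,
                      &params, sizeof(params));
}
```

Going by the reports above, either call on its own made the 200ms stalls disappear, so an application would normally pick one: sctp_disable_nagle(fd) if latency of small messages matters more than bundling, or sctp_disable_delayed_sack(fd) if the extra SACK traffic is acceptable.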