On Tue, Jul 14, 2020 at 07:12:58AM -0500, Corey Minyard wrote:
> On Mon, Jul 13, 2020 at 07:11:04PM -0300, Marcelo Leitner wrote:
> > On Mon, Jul 13, 2020 at 04:59:07PM -0500, Corey Minyard wrote:
> > > Hi, it's me again with another strange issue. In case you didn't figure
> > > it out before, I'm working on a library that supports all different
> > > types of stream I/O, and SCTP is one supported building block. I
> > > noticed when I stacked a multiplexer layer on top of SCTP I started
> > > getting timeouts occasionally. It took a bit, but I finally realized
> > > that I was getting 200ms delays occasionally between sending a packet
> > > and receiving a packet. I verified this with a trace right at the
> > > sctp_send() and sctp_recvmsg() calls. It doesn't seem to be regular
> > > in any way I can see, but it happens often enough to cause issues.
> > >
> > > If I replace the SCTP block with a TCP block, it works fine, and pretty
> > > much all the code is the same except where it does the read and write
> > > calls (including the epoll() usage, and I have also switched to select()
> > > and it has the same issue). The write calls don't seem to be the issue,
> > > I see two back-to-back writes a few microseconds apart and see a 200ms
> > > delay between the messages on the receive side.
> > >
> > > The test in question sets up two connections and does a big simultaneous
> > > bidirectional transfer. The test app has 4 threads waiting on epoll()
> > > handling data and writing data.
> > >
> > > And the delay is always ~200ms. Which sounds suspicious.
> >
> > That can be the delayed sack timer, in kernel.
> > /* Delayed sack timer - 200ms */
> > #define SCTP_DEFAULT_TIMEOUT_SACK (200)
> >
> > You may tweak the sysctl net.sctp.sack_timeout and see if changes
> > accordingly, or via SCTP_PEER_ADDR_PARAMS or even enable immediate ack
> > (by setting SPP_SACKDELAY_DISABLE)
>
> Ok, setting SPP_SACKDELAY_DISABLE does make the problem go away.
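
For anyone following along, here is a minimal sketch of what setting
SPP_SACKDELAY_DISABLE through SCTP_PEER_ADDR_PARAMS might look like on
Linux. It assumes the kernel UAPI header <linux/sctp.h> is installed
(the <netinet/sctp.h> from lksctp-tools should provide the same names).
fill_sack_disable() and disable_delayed_sack() are hypothetical helper
names made up for this illustration; they are not taken from gensio.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/sctp.h>   /* assumption: kernel UAPI header is available */

/*
 * Build the request. Everything is zeroed, so no specific peer address
 * or association is selected; only the "disable delayed SACK" flag is set.
 */
static void fill_sack_disable(struct sctp_paddrparams *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof(*p));
    p->spp_flags = SPP_SACKDELAY_DISABLE;
}

/*
 * Ask the kernel to SACK immediately on this SCTP socket.
 * Returns 0 on success, -1 on failure with errno set by setsockopt().
 */
static int disable_delayed_sack(int fd)
{
    struct sctp_paddrparams p;

    fill_sack_disable(&p);
    return setsockopt(fd, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS,
                      &p, sizeof(p));
}
```

Calling disable_delayed_sack(fd) on the SCTP socket should be enough
to try this out. Lowering the net.sctp.sack_timeout sysctl mentioned
above is the system-wide alternative if changing the application is not
an option.
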
>
> >
> > >
> > > It's not using sctp_sendv() at the moment, as the systems I'm running on
> > > don't have that yet. But the library does have support if it sees it is
> > > available.
> > >
> > > So I don't think it's my library; I've stared at it a bunch (and found a
> > > few other bugs) but I can't reconcile this one. There are no timers
> >
> > Nice.
> >
> > > that would cause this in the code in question. Just basically an
> > > epoll() call waiting on data and receive processing that is comparing
> > > data, along with write processing that is sending the same data.
> > >
> > > Anyway, I haven't tried to create a small reproducer; I thought I would
> > > report it first and see if anything rang a bell. I tried this on a
> > > recent kernel and got the same issue.
> >
> > I guess a combination of xmit rate, msg and buffer sizes and packet
> > drops can lead to this delay. I've seen it happening, but didn't have
> > the time to track it down back then.
>
> There should be no packet drops. It's running across localhost, and
> flow control in the multiplex layer as it's set up for the tests limits
> the maximum outstanding data to 1024 bytes. With overhead and flow
> control messages it's maybe 1050 bytes of data that would ever be
> unacked. (It's not really testing throughput, it's testing the inner
> workings of the multiplexing protocol.)
>
> If I understand this correctly per the RFCs, by default if there are two
> messages outstanding, it will send an sack immediately. Otherwise it
> waits 200ms. I'm guessing what is happening is that SCTP sends a
> sack and then receives one more message and the upper layer stops
> because of flow control. Then the sack comes out in 200ms and things
> continue.

Actually, that still doesn't make sense. The lack of a sack shouldn't
keep anything from sending unless the congestion window is closed, which
shouldn't happen in this case. Am I correct?

-corey

>
> So I think I can figure out how to make this work smoothly.
> I assume this doesn't happen in TCP because all packets carry an ack,
> and there is data flowing both ways all the time.
>
> Thanks,
>
> -corey
>
> >
> > That said, remember that Linux SCTP doesn't support buffer
> > auto-tuning. So considering you're running a stress test, you probably
> > want to adjust them accordingly manually to avoid packet drops.
> >
> > Marcelo
> >
> > >
> > > The library is at https://github.com/cminyard/gensio. I'd need to
> > > provide a patch for the tracing.
> > >
> > > -corey