From: Xin Long
> Sent: 10 October 2020 03:35
>
> On Fri, Oct 9, 2020 at 9:03 PM David Laight <David.Laight@xxxxxxxxxx> wrote:
> >
> > From: David Laight
> > > Sent: 09 October 2020 12:14
> > >
> > > From: Andreas Fink
> > > > Sent: 09 October 2020 08:25
> > > >
> > > > Can you see this issue with the 5.4 kernel too?
> > > >
> > > > I did yesterday some testing by upgrading kernel from 5.4 to 5.7 and I run into all sorts of links
> > > > going off after a while so I had to revert back.
> > > > 5.4 is stable for me. 5.7 is not. And I have lots of M2PA and M3UA connections like you
> > >
> > > I've just spent hours digging through my traces.
> > > It is only some messages through the connection that get lost!
> > >
> > > Now SCTP_MIN_IN_DATA_CHUNK_DISCARDS is only incremented in two
> > > adjacent places in sm_statefuncs.c.
> > >
> > > Either for bad TSN (unlikely when everything is using "lo")
> > > and bad STREAM.
> > > I suspect it is the 'bad stream' case.
> > > I've not double-checked but I bet the discarded packets
> > > all have a large stream number.
> > ...
> >
> > If I dump out /proc/net/sctp/assocs and look way over to the right
> > (on the next monitor but 1) there are two columns INS and OUTS.
> > I've just realised that these are the number of streams.
> > Now all my connections are loopback - so I see both sockets for each.
> > So I'd expect the INS to match the OUTS of the peer.
> > This isn't true.
> > When the value should be negotiated down the OUTS value is unchanged.
> > So the kernel is sending packets with illegal stream numbers.
> > These are acked and then silently discarded.
> did it do addstream reconfig or receive any duplicate COOKIE-ECHO in your case?

Extremely unlikely.
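
For reference, the rule the quoted text relies on (INS on one side should
match OUTS on the peer) comes from RFC 4960 section 5.1.1: the usable
outbound stream count is the smaller of what the sender requested and what
the receiver said it would accept. A minimal sketch of that rule follows.
The function names are hypothetical and this is not the kernel's code,
just an illustration of which stream numbers the peer should accept.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel function.
 * Usable outbound streams = min(local requested OUTS, peer's max INS). */
static uint16_t negotiated_out_streams(uint16_t local_ostreams,
                                       uint16_t peer_max_instreams)
{
    return local_ostreams < peer_max_instreams ? local_ostreams
                                               : peer_max_instreams;
}

/* Valid stream ids run from 0 to the negotiated count minus one.
 * Anything at or above that count is a 'bad stream' for the receiver. */
static bool stream_id_ok(uint16_t sid, uint16_t local_ostreams,
                         uint16_t peer_max_instreams)
{
    return sid < negotiated_out_streams(local_ostreams, peer_max_instreams);
}
```

With 10 streams requested locally and a peer accepting 4, only stream ids
0 to 3 are valid. A sender that keeps using 10 would emit ids the peer
has to reject.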

Looking at the latest version of my driver code (which I wasn't using)
I wrote the following:

 * Since the code that negotiates the number of streams got broken
 * in version 5.1 we need to extract the correct value from the
 * internal structures to avoid SCTP sending messages the remote
 * system will discard.

	/* stream.outcnt is the value we should be using.
	 * But kernels 5.1 to 5.8 fail to reduce it based on the number
	 * received from the remote system.
	 * So bound here so that transmitted messages don't get discarded.
	 */
	outcnt = asoc->stream.outcnt;
	num_ostreams = asoc->c.sinit_num_ostreams;

I think there was a patch done for 5.9.
It needs back-porting.

Although Andreas said 5.4 worked for him.
So maybe he has a different problem.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)