Re: Dropped SCTP Connections over slow network

Neil Horman <nhorman@xxxxxxxxxxxxx> · Fri, 15 May 2015 09:39:09 -0400

On Wed, May 13, 2015 at 10:27:05AM -0400, Atalay Ozgovde wrote:
> We are essentially trying to implement "fire and forget" TCP with
> SCTP. That is, a single server port that services clients that
> establish separate connections. We are not broadcasting the same
> messages to every client. Each client can issue different sets of
> requests to the server and receive responses as streams of messages
> over its own channel. Messages are time sensitive and expire fast,
> so retransmission of lost messages is not desirable. Due to the large
> number of clients and the uniqueness of each client's needs, UDP is
> not an option.
> We have one-to-one SCTP connections to clients where we send unordered
> messages with short time-to-live parameters. When a client's network
> degrades to about 60% packet loss (which happens often in some
> regions), the server can continue to write (sctp_sendmsg) to the
> client's connection, but wireshark shows that the messages don't get
> to the wire (blocked at the transport layer).
> We enabled SCTP Kernel logging and here is what we see:
> We start getting the following for each write attempt (some lines are
> removed for brevity):
> May 11 09:58:43 localhost kernel: [250220.363867] sctp:
> sctp_outq_flush: could not transmit TSN: 0x0, status: 2
> May 11 09:58:43 localhost kernel: [250220.363870] sctp: sctp_do_sm
> post sfx: error 0, asoc ffff88036943a000[STATE_ESTABLISHED]
> 
> After several of the above eventually we get:
> 
> May 11 09:58:43 localhost kernel: [250220.363871] sctp: We sent primitively.
> May 11 09:58:43 localhost kernel: [250220.363933] sctp: sctp_close(sk:
> 0xffff880369693dc0, timeout:0)
> May 11 09:58:43 localhost kernel: [250220.363938] sctp: sctp_do_sm
> prefn: ep ffff8807d159e200, EVENT_T_PRIMITIVE, PRIMITIVE_SHUTDOWN,
> asoc ffff88036943a000[STATE_ESTABLISHED],
> sctp_sf_do_9_2_prm_shutdown
> May 11 09:58:43 localhost kernel: [250220.363941] sctp: sctp_do_sm
> postfn: asoc ffff88036943a000, status: DISPOSITION_CONSUME
> May 11 09:58:43 localhost kernel: [250220.363943] sctp:
> sctp_cmd_new_state: asoc ffff88036943a000[STATE_SHUTDOWN_PENDING]
> May 11 09:58:43 localhost kernel: [250220.363945] sctp: sctp_do_sm
> post sfx: error 0, asoc ffff88036943a000[STATE_CLOSED]
> May 11 09:58:43 localhost kernel: [250220.363947] sctp:
> sctp_destroy_sock(sk: ffff880369693dc0)
> 
> SCTP is abandoning the connection due to status = 2. I found in the
> kernel SCTP source that it means SCTP_XMIT_RWND_FULL, i.e. the receive
> window is full. Clearly SCTP is reacting to what it sees as heavy
> congestion.
> We can detect congestion before the connection is closed (using SCTP
> events). My question is: is there a way to reset a connection
> (association) without having to close it? Alternatively, is there a
> way to relax the congestion parameters so that we can continue using
> the connection, since we don't care about the packet loss?
> 
> Thanks,
> 
> Atalay
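
For anyone reading the quoted log, the status value on the
sctp_outq_flush lines can be decoded mechanically. Below is a minimal
Python sketch that pulls the TSN and status out of such a line. Only
status 2 (SCTP_XMIT_RWND_FULL) is confirmed by the message above; the
other three names are what I recall of the sctp_xmit_t enum in kernels
of that era and should be checked against your own source tree. The
names SCTP_XMIT_STATUS and decode_flush_line are made up for this
example.

```python
import re

# Transmit status codes. Only 2 is confirmed by the quoted message; the rest
# are an assumption based on 2015-era kernel sources and may differ elsewhere.
SCTP_XMIT_STATUS = {
    0: "SCTP_XMIT_OK",
    1: "SCTP_XMIT_PMTU_FULL",
    2: "SCTP_XMIT_RWND_FULL",
    3: "SCTP_XMIT_NAGLE_DELAY",
}

# Matches the debug line format shown in the quoted log.
_FLUSH_RE = re.compile(
    r"sctp_outq_flush:\s+could not transmit TSN:\s+(0x[0-9a-fA-F]+),"
    r"\s+status:\s+(\d+)"
)


def decode_flush_line(line):
    """Return (tsn, status_name) for an sctp_outq_flush failure line, else None."""
    m = _FLUSH_RE.search(line)
    if m is None:
        return None
    tsn = int(m.group(1), 16)
    code = int(m.group(2))
    return tsn, SCTP_XMIT_STATUS.get(code, "UNKNOWN(%d)" % code)


if __name__ == "__main__":
    sample = ("May 11 09:58:43 localhost kernel: [250220.363867] sctp: "
              "sctp_outq_flush: could not transmit TSN: 0x0, status: 2")
    print(decode_flush_line(sample))
```

Counting how many of these lines appear per association before the
close gives a rough picture of how long the sender sat with a full
peer window.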

There's no way to reset a failed connection short of closing it and
re-establishing a new one.  You also can't "relax" the rwnd congestion parameter
directly, but you can change sctp_mem/sctp_rmem so that newly established
connections compute a larger receive window when they are set up.
Neil
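
Before changing sctp_mem/sctp_rmem it helps to see what the host is
currently using. The following is a rough Python sketch, assuming the
sysctls are exposed under /proc/sys/net/sctp (which should be the case
once the sctp module is loaded) and that each holds three integers. The
helper names are hypothetical, and the sketch only reads values and
formats a candidate setting; it does not apply anything.

```python
from pathlib import Path

# Assumed procfs location of the SCTP sysctls; present only when the
# sctp module is loaded.
SCTP_SYSCTL_DIR = Path("/proc/sys/net/sctp")


def parse_triple(text):
    """Parse a three-integer sysctl value into a tuple of ints."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError("expected three integers, got %r" % text)
    return tuple(int(f) for f in fields)


def read_sctp_triple(name, base=SCTP_SYSCTL_DIR):
    """Read a sysctl such as sctp_rmem or sctp_mem as a 3-tuple of ints."""
    return parse_triple((Path(base) / name).read_text())


def format_sysctl_line(name, triple):
    """Build a sysctl.conf-style line for a new triple (does not apply it)."""
    return "net.sctp.%s = %d %d %d" % ((name,) + tuple(triple))


if __name__ == "__main__":
    for name in ("sctp_rmem", "sctp_mem"):
        try:
            print(name, read_sctp_triple(name))
        except OSError as exc:
            # Typically means the sctp module is not loaded on this host.
            print(name, "unavailable:", exc)
```

Applying a new value needs root (sysctl -w or a sysctl.conf entry) and
only affects associations set up afterwards. As far as I understand it,
the window that fills here is the one advertised by the peer, so the
setting presumably matters most on the receiving host.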

--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html