Re: Dropped SCTP Connections over slow network

Atalay Ozgovde <aozgovde@xxxxxxxxx> · Fri, 15 May 2015 10:13:41 -0400

Makes sense. Thanks for your response.

Atalay

On Fri, May 15, 2015 at 9:39 AM, Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:
> On Wed, May 13, 2015 at 10:27:05AM -0400, Atalay Ozgovde wrote:
>> We are essentially trying to implement "fire and forget" TCP with
>> SCTP. That is, a single server port that services clients that
>> establish separate connections. We are not broadcasting the same
>> messages to every client. Each client can issue a different set of
>> requests to the server and receive responses as streams of messages
>> over its own channel. Messages are time sensitive and expire fast,
>> therefore retransmission of lost messages is not desirable. Due to the
>> large number of clients and the uniqueness of each client's needs, UDP
>> is not an option.
>> We have one-to-one SCTP connections to clients where we send unordered
>> messages with short time-to-live parameters. When a client's network
>> degrades to about 60% packet loss (which happens often in some
>> regions), the server can continue to write (sctp_sendmsg) to the
>> client's connection, but wireshark shows that the messages don't get
>> to the wire (blocked at the transport layer).
>> We enabled SCTP Kernel logging and here is what we see:
>> We start getting the following for each write attempt (some lines are
>> removed for brevity):
>> May 11 09:58:43 localhost kernel: [250220.363867] sctp:
>> sctp_outq_flush: could not transmit TSN: 0x0, status: 2
>> May 11 09:58:43 localhost kernel: [250220.363870] sctp: sctp_do_sm
>> post sfx: error 0, asoc ffff88036943a000[STATE_ESTABLISHED]
>>
>> After several of the above eventually we get:
>>
>> May 11 09:58:43 localhost kernel: [250220.363871] sctp: We sent primitively.
>> May 11 09:58:43 localhost kernel: [250220.363933] sctp: sctp_close(sk:
>> 0xffff880369693dc0, timeout:0)
>> May 11 09:58:43 localhost kernel: [250220.363938] sctp: sctp_do_sm
>> prefn: ep ffff8807d159e200, EVENT_T_PRIMITIVE, PRIMITIVE_SHUTDOWN,
>> asoc ffff88036943a000[STATE_ESTABLISHED],
>> sctp_sf_do_9_2_prm_shutdown
>> May 11 09:58:43 localhost kernel: [250220.363941] sctp: sctp_do_sm
>> postfn: asoc ffff88036943a000, status: DISPOSITION_CONSUME
>> May 11 09:58:43 localhost kernel: [250220.363943] sctp:
>> sctp_cmd_new_state: asoc ffff88036943a000[STATE_SHUTDOWN_PENDING]
>> May 11 09:58:43 localhost kernel: [250220.363945] sctp: sctp_do_sm
>> post sfx: error 0, asoc ffff88036943a000[STATE_CLOSED]
>> May 11 09:58:43 localhost kernel: [250220.363947] sctp:
>> sctp_destroy_sock(sk: ffff880369693dc0)
>>
>> SCTP is abandoning the connection due to status = 2. I found in the
>> kernel SCTP source that it means SCTP_XMIT_RWND_FULL, i.e. the receive
>> window is full. Clearly SCTP is reacting to what it sees as heavy
>> congestion.
>> We can detect congestion before the connection is closed (using SCTP
>> events). My question is: is there a way to reset a connection
>> (association) without having to close it? Alternatively, is there a
>> way to relax the congestion parameters so that we can continue using
>> the connection, as we don't care about the packet loss?
>>
>> Thanks,
>>
>> Atalay
>
> There's no way to reset a failed connection short of closing it and
> re-establishing a new one.  You also can't "relax" the rwnd congestion parameter
> directly, but you can change sctp_mem/sctp_rmem so that newly established
> connections compute a larger receive window when they are set up.
> Neil
>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
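
For anyone following Neil's suggestion, here is a minimal C sketch for
inspecting the current sctp_rmem values before changing them. It assumes the
usual Linux procfs location /proc/sys/net/sctp/sctp_rmem (present only when
the sctp module is loaded) and the three-integer "min default max" layout
that the kernel's ip-sysctl documentation gives for sctp_rmem. The struct
and function names are made up for illustration and are not part of any
SCTP API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for the three fields of an sctp_rmem-style line. */
struct sctp_buf_triple {
    long min;
    long def;
    long max;
};

/*
 * Parse a line holding three whitespace-separated integers
 * (spaces or tabs). Returns 0 on success, -1 if fewer than three
 * integers were found; *out may be partially written on failure.
 */
int parse_sctp_triple(const char *line, struct sctp_buf_triple *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%ld %ld %ld", &out->min, &out->def, &out->max) != 3)
        return -1;
    return 0;
}

/*
 * Read the first line of a sysctl file and parse it.
 * The path is assumed to be something like /proc/sys/net/sctp/sctp_rmem.
 * Returns -1 if the file cannot be opened or its content is malformed.
 */
int read_sctp_triple(const char *path, struct sctp_buf_triple *out)
{
    char line[128];
    int rc = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) != NULL)
        rc = parse_sctp_triple(line, out);
    fclose(f);
    return rc;
}
```

Calling read_sctp_triple("/proc/sys/net/sctp/sctp_rmem", &t) shows what is
currently in effect. New values can then be set as root through the
net.sctp.sctp_rmem sysctl. As Neil notes, only associations established
afterwards would pick up the change.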