Re: How to restrict SCTP abort during a process crash

Xin Long <lucien.xin@xxxxxxxxx> · Thu, 14 Dec 2017 17:22:24 +0800

On Thu, Dec 14, 2017 at 2:30 PM, Ashok Kumar <svashok79@xxxxxxxxx> wrote:
> Neil / Xin,
>
> Is the best way to change the LKSCTP kernel code to handle this
> situation and stop sending the SCTP abort message?
>
> Can you please give guidance on where to change the code?

If the ABORT packet was generated by the app crash, try this:

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1b00a1e..6cc245a 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1526,7 +1526,7 @@ static void sctp_close(struct sock *sk, long timeout)
                    (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime)) {
                        struct sctp_chunk *chunk;

-                       chunk = sctp_make_abort_user(asoc, NULL, 0);
+                       chunk = NULL; /* sctp_make_abort_user(asoc, NULL, 0); */
                        sctp_primitive_ABORT(net, asoc, chunk);
                } else
                        sctp_primitive_SHUTDOWN(net, asoc, NULL);
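
The hunk above shows only the last line of the condition that selects the
ABORT branch, so other triggers may exist. The visible part,
sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime, corresponds to SO_LINGER
being enabled with a zero timeout. A minimal user-space sketch for checking
whether a socket is configured that way could look like this. The helper name
linger_zero_is_set() is hypothetical and not part of the patch or of any
library:

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: reports whether fd has SO_LINGER enabled with a
 * zero timeout, the user-visible setting behind the
 * sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime test in the hunk.
 * Returns 1 if so, 0 if not, -1 if getsockopt() fails.
 */
int linger_zero_is_set(int fd)
{
    struct linger lg;
    socklen_t len = sizeof(lg);

    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) < 0)
        return -1;
    return lg.l_onoff != 0 && lg.l_linger == 0;
}
```

If this returns 0 for the SCTP socket and ABORTs are still seen on a crash,
the ABORT is probably coming from a part of the condition that is not visible
in the hunk, and the socket option alone does not explain it.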



>
> Thanks,
> Ashok
>
>
> On Wed, Dec 13, 2017 at 5:52 PM, Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:
>> On Wed, Dec 13, 2017 at 02:58:34PM +0800, Xin Long wrote:
>>> On Wed, Dec 13, 2017 at 12:50 PM, Ashok Kumar <svashok79@xxxxxxxxx> wrote:
>>> > Thanks Neil for the suggestion. Yes, it sounds like a bad hack, but
>>> > we will give it a try. Meanwhile, if you can think of some other
>>> > solution please let me know.
>>>
>>> Not sure if your SCTP server app is running as a systemd service.
>>> If so, just add it to 'After=', then let systemd insert the
>>> iptables rule before killing your sctp process.
>>>
>>> # cat /etc/systemd/system/sctp_no_abort.service
>>> [Unit]
>>> Description=SCTP No Abort Send When Shutdown
>>> After=shutdown.target reboot.target halt.target
>>>
>>> [Service]
>>> Type=oneshot
>>> ExecStart=/bin/true
>>> ExecStop=/usr/bin/bash -c "iptables -A OUTPUT -p sctp -j DROP"
>>> RemainAfterExit=yes
>>>
>>> [Install]
>>> WantedBy=multi-user.target
>>>
>> This would work for some packets, but those queued and sent by a timer might
>> make it out.
>>
>> Neil
>>
>>>
>>>
>>>
>>> >
>>> > Thanks,
>>> > Ashok
>>> >
>>> > On Wed, Dec 13, 2017 at 12:02 AM, Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:
>>> >> On Tue, Dec 12, 2017 at 10:21:31PM +0530, Ashok Kumar wrote:
>>> >>> Hi,
>>> >>>
>>> >>>
>>> >>>
>>> >>> We are using LKSCTP in our LTE product (HeNBGW). We have
>>> >>> high-availability support also in our product. In case of any failure
>>> >>> on active VM, standby VM will take over active role and all the SCTP
>>> >>> associations will be moved to that new active VM. The associations
>>> >>> should be moved transparent to the peers (a kind of SCTP reset before
>>> >>> SCTP heartbeat expires on the peer nodes).
>>> >>>
>>> >>>
>>> >>>
>>> >>> But the problem that we face is that when a process crashes on active
>>> >>> VM, the LKSCTP stack immediately sends SCTP abort to the peers for all
>>> >>> associations before the system goes down completely. This creates
>>> >>> confusion with the peers. Is there any way to avoid sending SCTP abort
>>> >>> message in this scenario? If yes, please let us know how to do the
>>> >>> same? If it needs LKSCTP kernel code change, please give pointers on
>>> >>> what and where to change.
>>> >>>
>>> >>>
>>> >>>
>>> >>> P.S: We tried to block the abort messages by dynamically adding
>>> >>> iptables rules from a signal handler (for signals 11 and 6). But this
>>> >>> did not work.
>>> >>>
>>> >>>
>>> >>>
>>> >>> A quick response will be highly appreciated.
>>> >>>
>>> >> You're not going to be able to reliably block ABORTs, or any packet, only on a
>>> >> crash condition, because the stack has points that operate asynchronously
>>> >> to the process.
>>> >>
>>> >> About the closest thing that I could think of would be to write a custom
>>> >> iptables rule to match on ABORT packets and send them to the NFQUEUE target.
>>> >> Write a userspace handler process for queue targeted packets which in turn just
>>> >> holds the abort packet for at least one cluster live heartbeat time (I'm
>>> >> assuming here that, being a clustered system it has some sort of liveness
>>> >> check).  Doing this hold may allow the cluster to shift to the new vm in a
>>> >> failure situation before your queue handler process releases any abort packets
>>> >> that it has, while in the event there is no failover, it will just release the
>>> >> abort a little late.
>>> >>
>>> >> I can't really recommend that approach mind you (it's a horrid hack, and will
>>> >> likely cause other protocol issues), but it's all I can think of at the moment.
>>> >>
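
A userspace handler like the one described above would first need to decide
whether a queued packet carries an ABORT at all. A minimal sketch of that
check follows, assuming the buffer starts at the SCTP common header (after the
IP header) and using the wire layout from RFC 4960: a 12-byte common header,
then chunks with a 1-byte type, 1-byte flags and a 16-bit length, each padded
to 4 bytes, with ABORT being chunk type 6. The function and macro names are
hypothetical, and the NFQUEUE wiring and the hold/release timing are not
shown:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCTP_COMMON_HDR_LEN   12 /* src port, dst port, vtag, checksum */
#define SCTP_CHUNK_HDR_LEN     4 /* type, flags, 16-bit length */
#define SCTP_CHUNK_TYPE_ABORT  6 /* ABORT chunk type per RFC 4960 */

/*
 * Hypothetical helper: returns 1 if the SCTP packet in pkt[0..len)
 * contains an ABORT chunk, 0 otherwise (including malformed or
 * truncated packets).
 */
int sctp_packet_has_abort(const uint8_t *pkt, size_t len)
{
    size_t off = SCTP_COMMON_HDR_LEN;

    assert(pkt != NULL);
    while (len >= SCTP_CHUNK_HDR_LEN && off <= len - SCTP_CHUNK_HDR_LEN) {
        uint8_t type = pkt[off];
        size_t clen = ((size_t)pkt[off + 2] << 8) | pkt[off + 3];

        if (clen < SCTP_CHUNK_HDR_LEN || clen > len - off)
            return 0; /* malformed or truncated chunk */
        if (type == SCTP_CHUNK_TYPE_ABORT)
            return 1;
        off += (clen + 3) & ~(size_t)3; /* skip chunk plus padding */
    }
    return 0;
}
```

The loop walks every chunk rather than looking only at the first one, because
other control chunks may be bundled in front of an ABORT in the same packet.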
>>> >> Regards
>>> >> Neil
>>> >>
>>> >>>
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> Ashok
>>> >>> --
>>> >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>> >>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> >>>
>>>