On Thu, Dec 14, 2017 at 2:30 PM, Ashok Kumar <svashok79@xxxxxxxxx> wrote:
> Neil / Xin,
>
> Is the best way to change the LKSCTP kernel code to handle this
> situation and stop sending the SCTP abort message?
>
> Can you please give guidance on where to change the code?

If the ABORT packet was generated by the app crash, try this:

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1b00a1e..6cc245a 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1526,7 +1526,7 @@ static void sctp_close(struct sock *sk, long timeout)
 		    (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime)) {
 			struct sctp_chunk *chunk;
 
-			chunk = sctp_make_abort_user(asoc, NULL, 0);
+			chunk = NULL; /* sctp_make_abort_user(asoc, NULL, 0); */
 			sctp_primitive_ABORT(net, asoc, chunk);
 		} else
 			sctp_primitive_SHUTDOWN(net, asoc, NULL);

>
> Thanks,
> Ashok
>
>
> On Wed, Dec 13, 2017 at 5:52 PM, Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:
>> On Wed, Dec 13, 2017 at 02:58:34PM +0800, Xin Long wrote:
>>> On Wed, Dec 13, 2017 at 12:50 PM, Ashok Kumar <svashok79@xxxxxxxxx> wrote:
>>> > Thanks Neil for the suggestion. Yes, it sounds like a bad hack, but
>>> > we will give it a try. Meanwhile, if you can think of some other
>>> > solution please let me know.
>>>
>>> Not sure if your SCTP server app is running as a systemd service;
>>> if yes, just add it to 'After=', then let systemd insert the
>>> iptables rule before killing your sctp process.
>>>
>>> # cat /etc/systemd/system/sctp_no_abort.service
>>> [Unit]
>>> Description=SCTP No Abort Send When Shutdown
>>> After=shutdown.target reboot.target halt.target
>>>
>>> [Service]
>>> Type=oneshot
>>> ExecStart=/bin/true
>>> ExecStop=/usr/bin/bash -c "iptables -A OUTPUT -p sctp -j DROP"
>>> RemainAfterExit=yes
>>>
>>> [Install]
>>> WantedBy=multi-user.target
>>>
>> This would work for some packets, but those queued and sent by a timer might
>> make it out.
>>
>> Neil
>>
>>>
>>> >
>>> > Thanks,
>>> > Ashok
>>> >
>>> > On Wed, Dec 13, 2017 at 12:02 AM, Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:
>>> >> On Tue, Dec 12, 2017 at 10:21:31PM +0530, Ashok Kumar wrote:
>>> >>> Hi,
>>> >>>
>>> >>> We are using LKSCTP in our LTE product (HeNBGW). We also have
>>> >>> high-availability support in our product. In case of any failure
>>> >>> on the active VM, the standby VM will take over the active role and all
>>> >>> the SCTP associations will be moved to that new active VM. The associations
>>> >>> should be moved transparently to the peers (a kind of SCTP reset before
>>> >>> the SCTP heartbeat expires on the peer nodes).
>>> >>>
>>> >>> But the problem that we face is that when a process crashes on the active
>>> >>> VM, the LKSCTP stack immediately sends SCTP abort to the peers for all
>>> >>> associations before the system goes down completely. This creates
>>> >>> confusion with the peers. Is there any way to avoid sending the SCTP abort
>>> >>> message in this scenario? If yes, please let us know how to do the
>>> >>> same. If it needs an LKSCTP kernel code change, please give pointers on
>>> >>> what and where to change.
>>> >>>
>>> >>> P.S: We tried to block the abort messages by dynamically using
>>> >>> iptables through a signal handler (for signals 11 and 6). But this did
>>> >>> not work.
>>> >>>
>>> >>> A quick response will be highly appreciated.
>>> >>>
>>> >> You're not going to be able to reliably block ABORTS, or any packet only on a
>>> >> crash condition, just because the stack has points that operate asynchronously
>>> >> to the process.
>>> >>
>>> >> About the closest thing that I could think of would be to write a custom
>>> >> iptables rule to match on ABORT packets and send them to the NFQUEUE target.
>>> >> Write a userspace handler process for queue targeted packets which in turn just
>>> >> holds the abort packet for at least one cluster live heartbeat time (I'm
>>> >> assuming here that, being a clustered system, it has some sort of liveness
>>> >> check). Doing this hold may allow the cluster to shift to the new VM in a
>>> >> failure situation before your queue handler process releases any abort packets
>>> >> that it has, while in the event there is no failover, it will just release the
>>> >> abort a little late.
>>> >>
>>> >> I can't really recommend that approach mind you (it's a horrid hack, and will
>>> >> likely cause other protocol issues), but it's all I can think of at the moment.
>>> >>
>>> >> Regards
>>> >> Neil
>>> >>
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> Ashok
>>> >>> --
>>> >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>> >>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> >>> More majordomo info at http://vger.kernel.org/majordomo-info.html
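
For reference, the two pieces of logic that the NFQUEUE hold-and-release idea quoted above relies on can be sketched roughly as follows. This is only an illustration under stated assumptions, not something posted in the thread: the names chunk_types, contains_abort and AbortHolder are hypothetical, the input is assumed to be the raw SCTP packet with the IP header already stripped, and the actual NFQUEUE binding (receiving packets and issuing verdicts) is left out entirely. The chunk layout and the ABORT chunk type value (6) follow RFC 4960.

```python
import struct

SCTP_COMMON_HEADER_LEN = 12  # source port, destination port, verification tag, checksum
SCTP_CHUNK_HEADER_LEN = 4    # chunk type, chunk flags, chunk length
SCTP_CHUNK_ABORT = 6         # ABORT chunk type (RFC 4960)


def chunk_types(sctp_packet):
    """Return the chunk types found in a raw SCTP packet (no IP header)."""
    types = []
    offset = SCTP_COMMON_HEADER_LEN
    while offset + SCTP_CHUNK_HEADER_LEN <= len(sctp_packet):
        ctype, _flags, length = struct.unpack_from("!BBH", sctp_packet, offset)
        if length < SCTP_CHUNK_HEADER_LEN:
            break  # malformed chunk: stop instead of looping forever
        types.append(ctype)
        offset += (length + 3) & ~3  # chunks are padded to a 4-byte boundary
    return types


def contains_abort(sctp_packet):
    """True if any chunk in the packet is an ABORT chunk."""
    return SCTP_CHUNK_ABORT in chunk_types(sctp_packet)


class AbortHolder:
    """Hypothetical helper: keep packets back for hold_seconds before release."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self._held = []  # (release_time, packet) pairs in arrival order

    def hold(self, packet, now):
        """Remember a packet; it becomes due hold_seconds after 'now'."""
        self._held.append((now + self.hold_seconds, packet))

    def release_due(self, now):
        """Return (and forget) every held packet whose hold time has elapsed."""
        due = [pkt for when, pkt in self._held if when <= now]
        self._held = [(when, pkt) for when, pkt in self._held if when > now]
        return due
```

In a real handler, hold_seconds would be set to at least one cluster liveness interval, and a packet for which contains_abort() is false would be accepted immediately. The caveats already raised in the thread still apply: delaying ABORTs is a hack and may cause other protocol issues.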