Re: Lingering associations on the server side after process dies

OK, thanks! I seem to be hitting this often enough by crashing the
application server, so let me try to put together a minimal reproducer.
- Amar

On Thu, Dec 22, 2016 at 9:17 AM, Marcelo Ricardo Leitner
<marcelo.leitner@xxxxxxxxx> wrote:
> On Wed, Dec 21, 2016 at 11:11:46PM -0800, amar padmanabhan wrote:
>> I am trying to figure out an issue where associations linger after a
>> process crash.
>>
>> On the server:
>> vagrant@magma-dev:~/build/oai_sgw$ sudo cat /proc/net/sctp/assocs
>>  ASSOC     SOCK   STY SST ST HBKT ASSOC-ID TX_QUEUE RX_QUEUE UID INODE LPORT RPORT LADDRS <-> RADDRS HBINT INS OUTS MAXRT T1X T2X RTXC wmema wmemq sndbuf rcvbuf
>> ffff8800b58c7000 ffff8800da4d0bc0 2   1   3  0       9        0 0       0     0 36412 36412  192.168.60.142 <-> *192.168.60.141 7500     3     8   10    0    7        0        1        0   212992 212992
>>
>> sudo lsof | grep ffff8800da4d0bc0
>>
>> After the application server restarts, the client retries with INIT and
>> COOKIE_ECHO, and the server replies with INIT_ACK and COOKIE_ACK
>> without any notification to the application server. Any subsequent
>> request from the client is not seen on the server.
>
> Sounds like the asoc leaked, or it is lingering while waiting for
> something to complete.
> Thus, when the client issues a new INIT, it's actually this old asoc
> that catches it and replies to it, not the new server process, so the
> new process doesn't/can't see the new request.
>
>>
>> Some pointers would be useful:
>> 1. Why can't I see the socket in lsof?
>
> Not sure.
>
>> 2. How do I shut down the existing association, so the server can
>> rebuild its association state and restart cleanly?
>
> We should confirm if it's really leaked, and fix the leak instead.
>
> The proc output above contains wmema: 1, which probably comes from a
> sctp_packet_set_owner_w() call, indicating that an sk_buff is still live
> and holding the asoc.
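
To keep an eye on that counter while trying to reproduce this, something
like the following might help. It is only a rough sketch in Python: the
function names are made up for illustration, and it assumes the column
layout shown in the /proc/net/sctp/assocs output earlier in this thread
(other kernel versions may print a different set of columns).

```python
# Column layout as printed in this thread; an assumption for other kernels.
HEADER = ("ASSOC SOCK STY SST ST HBKT ASSOC-ID TX_QUEUE RX_QUEUE UID INODE "
          "LPORT RPORT LADDRS <-> RADDRS HBINT INS OUTS MAXRT T1X T2X RTXC "
          "wmema wmemq sndbuf rcvbuf")

# The single association row quoted in this thread, joined onto one line.
SAMPLE_ROW = ("ffff8800b58c7000 ffff8800da4d0bc0 2 1 3 0 9 0 0 0 0 "
              "36412 36412 192.168.60.142 <-> *192.168.60.141 "
              "7500 3 8 10 0 7 0 1 0 212992 212992")


def parse_assoc_row(row, header=HEADER):
    """Map one assocs row onto the header's column names (hypothetical helper)."""
    names = header.split()
    split_at = names.index("LADDRS")
    head_names = names[:split_at]        # fixed columns before the address lists
    tail_names = names[split_at + 3:]    # fixed columns after "LADDRS <-> RADDRS"
    tokens = row.split()
    if len(tokens) < len(head_names) + len(tail_names) + 1:
        raise ValueError("row has fewer fields than the header")
    tail_start = len(tokens) - len(tail_names)
    fields = dict(zip(head_names, tokens[:len(head_names)]))
    fields.update(zip(tail_names, tokens[tail_start:]))
    # Whatever is left in the middle is the variable-length address section.
    addrs = tokens[len(head_names):tail_start]
    sep = addrs.index("<->")
    fields["LADDRS"] = addrs[:sep]
    fields["RADDRS"] = addrs[sep + 1:]
    return fields


def read_assocs(path="/proc/net/sctp/assocs"):
    """Parse every association listed in the proc file (needs the sctp module)."""
    with open(path) as f:
        lines = [line for line in f.read().splitlines() if line.strip()]
    if not lines:
        return []
    return [parse_assoc_row(row, lines[0]) for row in lines[1:]]


def holding_write_memory(assocs):
    """Return the associations whose wmema counter is non-zero."""
    return [a for a in assocs if int(a["wmema"]) > 0]
```

Calling holding_write_memory(read_assocs()) after killing the server
process (root may be needed, as with the sudo cat above) would list any
association that still has write memory charged to it. For the row
quoted above, parse_assoc_row(SAMPLE_ROW)["wmema"] is "1".
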
>
> We had similar issues in the recent past, but they should be fixed in the
> kernel you're using. They were related to error handling situations.
>
> Can you try to come up with a minimal reproducer for this?
>
>>
>> vagrant@magma-dev:~/build/oai_sgw$ uname -a
>> Linux magma-dev 4.7.4-040704-generic #201609150330 SMP Thu Sep 15
>> 07:32:22 UTC 2016 x86_64 GNU/Linux
>>
>> vagrant@magma-dev:~/build/oai_sgw$ sudo modinfo sctp
>> filename:       /lib/modules/4.7.4-040704-generic/kernel/net/sctp/sctp.ko
>> license:        GPL
>> description:    Support for the SCTP protocol (RFC2960)
>> author:         Linux Kernel SCTP developers <linux-sctp@xxxxxxxxxxxxxxx>
>> alias:          net-pf-10-proto-132
>> alias:          net-pf-2-proto-132
>> depends:        libcrc32c
>> intree:         Y
>> vermagic:       4.7.4-040704-generic SMP mod_unload modversions
>> parm:           no_checksums:Disable checksums computing and verification (bool)
>>
>> The code can be found here:
>> https://gitlab.eurecom.fr/oai/openair-cn/blob/develop/SRC/SCTP/sctp_primitives_server.c#L352
>>
>> Thanks
>> Amar
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>