Re: Fwd: connect() issues

On 09/02/2014 09:39 AM, Vlad Yasevich wrote:
> 
> [Forwarding to linux-sctp]
> 
> -------- Original Message --------
> Subject: connect() issues
> Date: Sun, 31 Aug 2014 12:10:58 -0400
> From: Jamal Hadi Salim <jhs@xxxxxxxxxxxx>
> To: lksctp-developers@xxxxxxxxxxxxxxxxxxxxx
> CC: Vlad Yasevich <vyasevic@xxxxxxxxxx>, Michael Tuexen <Michael.Tuexen@xxxxxxxxxxxxxxxxx>
> 
> Folks,
> 
> I have attached a small program written by Michael Tuexen and modified
> slightly by me to demonstrate the issue. It demonstrates memory issues
> due to connect(). Sorry, you will need libev.
> (I had to extract details out of a large complex program).
> 
> Summary:
> =======
> There is a kernel issue where each connect() call results in
> sctp_association_new() where memory is allocated. An INIT goes
> out to remote and an ABORT comes back. But the allocated mem
> is never freed. I thought that because I registered for association
> events I would get these events delivered to me, but recvmsg fails
> every time and no readability state is set on the socket.
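
The program Jamal refers to is not included in this archive. The sketch below makes an assumption about what "registered for association events" means on Linux. It enables only SCTP_ASSOC_CHANGE notifications through the SCTP_EVENTS socket option from the kernel UAPI header <linux/sctp.h>. The helper names assoc_events_only() and subscribe_assoc_events() are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/sctp.h>   /* struct sctp_event_subscribe, SCTP_EVENTS */

/* Build a subscription that enables association-change events only;
 * every other notification class stays zero (off). */
struct sctp_event_subscribe assoc_events_only(void)
{
    struct sctp_event_subscribe ev;

    memset(&ev, 0, sizeof(ev));
    ev.sctp_association_event = 1;
    return ev;
}

/* Ask the kernel to queue SCTP_ASSOC_CHANGE notifications on fd.
 * Returns 0 on success, or -1 with errno set (e.g. EBADF). */
int subscribe_assoc_events(int fd)
{
    struct sctp_event_subscribe ev = assoc_events_only();

    /* IPPROTO_SCTP (132) is the option level for SCTP socket options. */
    return setsockopt(fd, IPPROTO_SCTP, SCTP_EVENTS, &ev, sizeof(ev));
}
```

Call subscribe_assoc_events(fd) on the SCTP socket before the first connect(). After that, each association that comes up or fails to start queues a notification, and the application has to read it with recvmsg().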

Hi Jamal

I finally was able to dig into this issue some more and here
is what I've found.

I am not sure why you say above that recvmsg() failed for you, but
for me it worked correctly.  There was a notification sitting
on the socket queue, and select() reported the socket as readable.
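
Here is a minimal sketch of the check Vlad describes, again assuming the <linux/sctp.h> UAPI header. It waits with select() until the socket is readable and reads one message with recvmsg(). It then treats the message as a failed association attempt if MSG_NOTIFICATION is set and the payload is an SCTP_ASSOC_CHANGE with state SCTP_CANT_STR_ASSOC. The helper names are hypothetical, and the buffer is assumed to be large enough for a whole notification.

```c
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <linux/sctp.h>   /* MSG_NOTIFICATION, union sctp_notification */

/* Block until fd is readable, then read one message into buf.
 * Returns the byte count (or -1 on error) and stores msg_flags in *flags. */
ssize_t wait_and_read(int fd, void *buf, size_t len, int *flags)
{
    fd_set rfds;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    ssize_t n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    n = recvmsg(fd, &msg, 0);
    if (n >= 0)
        *flags = msg.msg_flags;
    return n;
}

/* Return 1 if the message is an SCTP_ASSOC_CHANGE notification reporting
 * SCTP_CANT_STR_ASSOC (the association never came up), else 0. */
int is_failed_assoc(const void *buf, size_t len, int msg_flags)
{
    union sctp_notification sn;

    if (!(msg_flags & MSG_NOTIFICATION))
        return 0;                       /* ordinary user data */
    if (len < sizeof(struct sctp_assoc_change))
        return 0;                       /* too short for this event */
    memcpy(&sn, buf, sizeof(struct sctp_assoc_change));
    return sn.sn_header.sn_type == SCTP_ASSOC_CHANGE &&
           sn.sn_assoc_change.sac_state == SCTP_CANT_STR_ASSOC;
}
```

In a connect() loop like Jamal's, every INIT answered by an ABORT would show up as one message for which is_failed_assoc() returns 1.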

So, with the reproducer you've provided, we have two leaks:
 1) For every failed connect, you will have a notification sitting
on the socket queue.  The more connects fail, the more notifications
you'll have.
 2) Every notification holds a reference on the association that generated
it.  As long as notifications are queued, the old associations will
remain in memory.
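
The second leak is a plain reference-counting effect. The following is a hypothetical toy model in C, not kernel code, and all of its names are made up for illustration. Each toy association starts with one reference, the queued notification takes a second one, and the connect path drops its own reference after the ABORT. The object is freed only when the notification is consumed.

```c
#include <stdlib.h>

/* Hypothetical toy association: freed when refcnt drops to zero. */
struct toy_assoc {
    int refcnt;
};

#define TOY_QLEN 64

/* Hypothetical notification queue: each entry pins one association. */
struct toy_queue {
    struct toy_assoc *ev[TOY_QLEN];
    int n;
};

static int toy_live_assocs;   /* toy associations still allocated */

struct toy_assoc *toy_assoc_new(void)
{
    struct toy_assoc *a = malloc(sizeof(*a));

    if (a) {
        a->refcnt = 1;        /* reference owned by the connect path */
        toy_live_assocs++;
    }
    return a;
}

void toy_assoc_hold(struct toy_assoc *a) { a->refcnt++; }

void toy_assoc_put(struct toy_assoc *a)
{
    if (--a->refcnt == 0) {
        free(a);
        toy_live_assocs--;
    }
}

/* A failed connect: create the association, queue an event that holds a
 * reference, then drop the connect path's reference. -1 if queue full. */
int toy_failed_connect(struct toy_queue *q)
{
    struct toy_assoc *a;

    if (q->n == TOY_QLEN)
        return -1;
    a = toy_assoc_new();
    if (!a)
        return -1;
    toy_assoc_hold(a);        /* the queued notification's reference */
    q->ev[q->n++] = a;
    toy_assoc_put(a);         /* association torn down after the ABORT */
    return 0;
}

/* Reading the notifications drops their references, freeing the memory. */
void toy_drain(struct toy_queue *q)
{
    while (q->n > 0)
        toy_assoc_put(q->ev[--q->n]);
}

int toy_live(void) { return toy_live_assocs; }
```

With ten failed connects and no reads, toy_live() stays at ten. After toy_drain() it drops to zero.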

What makes the above condition really bad is that notifications don't appear
to be checked against the socket receive buffer or the sctp_rmem sysctl.
As a result, you can very easily exhaust memory by generating a ton of notifications.
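
The following C sketch shows the two limits a process is working against. It reads the per-socket receive buffer with getsockopt(SO_RCVBUF) and parses the three values (min, default, max) of /proc/sys/net/sctp/sctp_rmem. This thread does not say which of the three sctp_rmem values is meant, so notification_cap() takes whichever value the caller picks. All function names are hypothetical.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Parse "min default max" as found in /proc/sys/net/sctp/sctp_rmem.
 * Returns 0 on success, -1 if three integers are not present. */
int parse_sctp_rmem(const char *text, long out[3])
{
    return sscanf(text, "%ld %ld %ld", &out[0], &out[1], &out[2]) == 3 ? 0 : -1;
}

/* Read and parse the sysctl; fails if the sctp module is not loaded. */
int read_sctp_rmem(long out[3])
{
    char line[128];
    FILE *f = fopen("/proc/sys/net/sctp/sctp_rmem", "r");
    int ok;

    if (!f)
        return -1;
    ok = fgets(line, sizeof(line), f) != NULL;
    fclose(f);
    return ok ? parse_sctp_rmem(line, out) : -1;
}

/* Current receive buffer of fd. On Linux, getsockopt(SO_RCVBUF) reports
 * the kernel's doubled value, not the number passed to setsockopt(). */
long socket_rcvbuf(int fd)
{
    int val;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
        return -1;
    return val;
}

/* "Whichever is smaller": the bound on queued notifications. */
long notification_cap(long rcvbuf, long rmem_limit)
{
    return rcvbuf < rmem_limit ? rcvbuf : rmem_limit;
}
```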

I am working on a patch to fix both of the above issues.  We will not
be able to do much about queuing notifications, but we'll at least be able
to limit them to the socket receive buffer or sctp_rmem, whichever is
smaller.  If you have a program that just calls connect() in a loop with
notifications enabled, the app will eventually run out of receive buffer space
if it doesn't drain the notifications.
As for associations, we can drop the reference from the notification, thus
allowing the memory for the association to actually be freed.
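
Until such a fix is in place, the user-space workaround Jamal mentions amounts to draining the queue. The sketch below assumes the application can tolerate a non-blocking read loop. It reads with MSG_DONTWAIT until EAGAIN and counts notifications separately from ordinary messages. drain_notifications() is a hypothetical name. This version discards data messages, so a real application would pass those to its normal receive path instead.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <linux/sctp.h>   /* MSG_NOTIFICATION */

/* Read everything currently queued on fd without blocking.
 * Returns the number of notifications consumed, or -1 on a real error.
 * If data_msgs is non-NULL it receives the count of ordinary messages.
 * Assumes each message fits in buf (assoc-change events are small). */
int drain_notifications(int fd, int *data_msgs)
{
    char buf[4096];
    int notes = 0, data = 0;

    for (;;) {
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        ssize_t n = recvmsg(fd, &msg, MSG_DONTWAIT);

        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;            /* queue is empty */
            if (errno == EINTR)
                continue;         /* interrupted, try again */
            return -1;
        }
        if (n == 0)
            break;                /* orderly shutdown on stream sockets */
        if (msg.msg_flags & MSG_NOTIFICATION)
            notes++;              /* event consumed: its queue slot is freed */
        else
            data++;               /* user data (discarded in this sketch) */
    }
    if (data_msgs)
        *data_msgs = data;
    return notes;
}
```

Calling this whenever select() reports the socket readable keeps the notification backlog from growing between connect() attempts.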

-vlad

> 
> If you run this long enough (24 hours or so), you will see the OOM
> killer invoked from an allocation in sctp_association_new():
> 
> ---
> Call Trace:
> [<ffffffff80145508>] show_stack+0x68/0x80
> [<ffffffff8061e9c8>] dump_header.isra.12+0x78/0x1ac
> [<ffffffff801d2358>] oom_kill_process+0x2e8/0x440
> [<ffffffff801d2998>] out_of_memory+0x2b8/0x2e8
> [<ffffffff801d7084>] __alloc_pages_nodemask+0x774/0x788
> [<ffffffff80210c60>] cache_alloc_refill+0x470/0x7b0
> [<ffffffff802107c4>] kmem_cache_alloc+0xe4/0x110
> [<ffffffffc008a214>] sctp_association_new+0x54/0x688 [sctp]
> [<ffffffffc009c92c>] __sctp_connect+0x274/0x618 [sctp]
> [<ffffffffc009ce84>] sctp_connect+0x7c/0xe8 [sctp]
> [<ffffffff8053d030>] SyS_connect+0xd8/0xf8
> [<ffffffff8014a0a4>] handle_sys64+0x44/0x68
> -----
> 
> I am sorry I don't have time to chase the kernel code
> (and will have to work around it in user space in our code).
> 
> Longer version:
> ==============
> 
> The attached program initially tries to connect to a server which is not
> up yet. At some point the server comes up, and all the issues I observe
> go away, i.e. the resulting memory consumption goes to zero.
> 
> The issue I am about to describe happens on all kernel versions I have
> tested (including the latest, and all the way back to 2.6.32 running on
> a MIPS board).
> 
> How to observe the issue:
> on xterm 1:
> sudo watch "cat /proc/slabinfo | grep -i ^kmalloc-"
> 
> on xterm 2:
> run the attached program.
> 
> On my laptop the pages are 4K, so I would see kmalloc-4096 consumption
> going up.
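
You may prefer to log the same number from a test harness rather than from watch. The following C sketch extracts the active_objs column for one cache from /proc/slabinfo; its function names are hypothetical. Each data line of that file starts with the cache name, followed by active_objs, num_objs and objsize. Reading /proc/slabinfo usually requires root, which is why the command above uses sudo.

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/slabinfo line ("name active_objs num_objs objsize ...").
 * If the cache name equals `cache`, store active_objs and return 0;
 * return -1 for header lines, other caches, or malformed input. */
int slab_active_objs(const char *line, const char *cache, unsigned long *active)
{
    char name[64];
    unsigned long act, num, size;

    if (sscanf(line, "%63s %lu %lu %lu", name, &act, &num, &size) != 4)
        return -1;
    if (strcmp(name, cache) != 0)
        return -1;
    *active = act;
    return 0;
}

/* Scan /proc/slabinfo for one cache, e.g. "kmalloc-4096". */
int read_slab_active(const char *cache, unsigned long *active)
{
    char line[512];
    int found = -1;
    FILE *f = fopen("/proc/slabinfo", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (slab_active_objs(line, cache, active) == 0) {
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```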
> 
> If you actually want to narrow this down, compile the kernel with
> CONFIG_SCTP_DBG_OBJCNT (or you can just believe what I am saying below)
> and do:
> 
> ----
> Every 2.0s: sudo cat /proc/net/sctp/sctp_dbg_objcnt     Fri Aug 29 11:34:35 2014
> sock: 5
> ep: 5
> assoc: 279
> transport: 1
> chunk: 0
> bind_addr: 0
> bind_bucket: 3
> addr: 4
> ssnmap: 0
> datamsg: 0
> ------
> 
> And when I start the server 3-4 minutes later and the two ends talk to
> each other, the counters go down:
> 
> ---
> Every 2.0s: sudo cat /proc/net/sctp/sctp_dbg_objcnt     Fri Aug 29 11:37:38 2014
> sock: 12
> ep: 12
> assoc: 6
> transport: 6
> chunk: 0
> bind_addr: 0
> bind_bucket: 7
> addr: 16
> ssnmap: 6
> datamsg: 0
> -------------
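
The assoc counter is the one to watch here. It reads 279 while the server is down and 6 once the server is up. Below is a small C sketch, with hypothetical function names, for pulling one counter out of /proc/net/sctp/sctp_dbg_objcnt. That file exists only on kernels built with CONFIG_SCTP_DBG_OBJCNT.

```c
#include <stdio.h>
#include <string.h>

/* Find a "<key>: <value>" line in the sctp_dbg_objcnt text.
 * Returns the value, or -1 if the key is not present. */
long objcnt_value(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;

    while (p && *p) {
        long v;

        /* Match the key only at the start of a line, followed by ':'. */
        if (strncmp(p, key, klen) == 0 && p[klen] == ':' &&
            sscanf(p + klen + 1, "%ld", &v) == 1)
            return v;
        p = strchr(p, '\n');      /* advance to the next line */
        if (p)
            p++;
    }
    return -1;
}

/* Read the whole debug file into buf (NUL-terminated); -1 on failure. */
int read_objcnt(char *buf, size_t len)
{
    FILE *f;
    size_t n;

    if (len == 0)
        return -1;
    f = fopen("/proc/net/sctp/sctp_dbg_objcnt", "r");
    if (!f)
        return -1;
    n = fread(buf, 1, len - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

Sampling objcnt_value(buf, "assoc") in a loop shows whether failed connect() calls are leaving associations behind.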
> 
> cheers,
> jamal
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



