Re: Fwd: connect() issues


 



On 09/16/14 17:27, Vlad Yasevich wrote:

Hi Jamal

I finally was able to dig into this issue some more and here
is what I've found.


Thanks for looking, Vlad.

I am not sure why you said above that recvmsg() failed for you, but
for me it worked correctly. There was a notification sitting
on the socket queue, and the select() call would trigger when called.


Didn't work for me; I wonder if libev is expecting something of me.
With the sample code I posted, if you do a recvmsg() in the timer callback,
that would be close to what I did.
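Since a queued notification only goes away when the application reads it, one common pattern is to drain the socket completely every time the event loop (select(), or a libev watcher) reports it readable. The sketch below assumes that pattern; drain_socket() is a hypothetical helper, not part of libev or lksctp-tools. It counts notifications separately by testing MSG_NOTIFICATION in msg_flags. That flag normally comes from <netinet/sctp.h>; the fallback #define (0x8000, the value used by the Linux SCTP headers) exists only so the sketch builds without that header.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Normally provided by <netinet/sctp.h>; defined here only as a fallback
 * so this sketch compiles without the lksctp-tools headers. */
#ifndef MSG_NOTIFICATION
#define MSG_NOTIFICATION 0x8000
#endif

/* Hypothetical helper: read everything currently queued on fd without
 * blocking, counting notifications and user data separately.
 * Returns 0 once the queue is empty, -1 on a real error. */
int drain_socket(int fd, int *n_notif, int *n_data)
{
    char buf[8192];

    *n_notif = 0;
    *n_data = 0;
    for (;;) {
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
        struct msghdr msg;

        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        ssize_t n = recvmsg(fd, &msg, MSG_DONTWAIT);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0;              /* nothing left: queue drained */
            if (errno == EINTR)
                continue;              /* interrupted, try again */
            return -1;                 /* real error */
        }
        if (n == 0 && !(msg.msg_flags & MSG_NOTIFICATION))
            return 0;                  /* end of stream on one-to-one sockets */
        if (msg.msg_flags & MSG_NOTIFICATION)
            (*n_notif)++;              /* e.g. an association-change event */
        else
            (*n_data)++;
    }
}
```

Calling this from the read callback, rather than reading a single message per wakeup, keeps notifications from piling up between callbacks.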

So for the reproducer you've provided we have 2 leaks:
  1) For every failed connect, you will have a notification sitting
on the socket queue.  The more connects failed, the more notifications
you'll have.
  2) Every notification holds a reference on the association that generated
it.  As long as notifications are queued, the old associations will
remain in memory.
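The interaction between the two leaks can be illustrated with a toy reference-count model. This is a hypothetical user-space simplification: struct toy_assoc, struct toy_notif and the functions below are not kernel structures. Each failed connect creates an association, queues a notification that takes a reference, and drops the creator's reference. The association is therefore freed only when its notification is consumed.

```c
#include <stdlib.h>

/* Toy model only: names are hypothetical, not the kernel's structures. */
struct toy_assoc {
    int refcnt;
};

struct toy_notif {
    struct toy_assoc *asoc;     /* reference pinned by the notification */
    struct toy_notif *next;
};

static int live_assocs;         /* toy associations still allocated */

static void assoc_put(struct toy_assoc *a)
{
    if (--a->refcnt == 0) {
        free(a);
        live_assocs--;
    }
}

/* One failed connect(): create an association, queue a notification that
 * holds a reference to it, then drop the connect path's own reference. */
struct toy_notif *failed_connect(struct toy_notif *queue)
{
    struct toy_assoc *a = malloc(sizeof(*a));
    struct toy_notif *n = malloc(sizeof(*n));

    if (!a || !n) {
        free(a);
        free(n);
        return queue;
    }
    live_assocs++;
    a->refcnt = 2;              /* creator's reference + notification's reference */
    n->asoc = a;
    n->next = queue;
    assoc_put(a);               /* connect gives up; the notification still pins a */
    return n;
}

/* Reading the notifications (recvmsg) releases the references they hold. */
struct toy_notif *consume_all(struct toy_notif *queue)
{
    while (queue) {
        struct toy_notif *next = queue->next;
        assoc_put(queue->asoc);
        free(queue);
        queue = next;
    }
    return NULL;
}
```

In this model, the number of live associations grows with every undrained failure and drops back to zero only after the queue is read.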


nod.

What makes the above condition really bad is that notifications don't appear
to be checked against the socket receive buffer or the sctp_rmem variables.
As such, you can very easily exhaust memory by generating a ton of notifications.

Yes. It would have helped with debugging this if the memory had been tied to the
process.

I am working on a patch to fix both of the above issues. We will not
be able to do much about queuing notifications, but we'll at least be able
to limit them to the socket receive buffer or sctp_rmem, whichever is
smaller. If you have a program that just calls connect() in a loop with
notifications enabled, the app will eventually run out of receive buffer space
if it doesn't drain the notifications.
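The proposed limit amounts to an admission check before each notification is queued. The sketch below is a hypothetical model of that check; it is not the actual patch, and real kernel accounting is done on socket buffer sizes rather than on a simple byte count. A notification of size bytes is accepted only if the total stays within min(rcvbuf, rmem).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposed limit: a notification of `size` bytes
 * may be queued only if the queued total stays within the smaller of the
 * socket receive buffer (rcvbuf) and the sctp_rmem limit (rmem). */
bool may_queue_notification(size_t queued, size_t size,
                            size_t rcvbuf, size_t rmem)
{
    size_t limit = rcvbuf < rmem ? rcvbuf : rmem;

    /* Written to avoid overflow in queued + size. */
    return size <= limit && queued <= limit - size;
}

/* How many notifications of `size` bytes fit before the limit is reached,
 * i.e. how long a connect() loop can run without draining the queue. */
size_t notifications_until_full(size_t size, size_t rcvbuf, size_t rmem)
{
    size_t queued = 0, count = 0;

    while (size > 0 && may_queue_notification(queued, size, rcvbuf, rmem)) {
        queued += size;
        count++;
    }
    return count;
}
```

With a cap like this, an application that never drains its notifications hits a bounded queue instead of exhausting system memory.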

I think that is reasonable. I only need to solve the mystery of why I saw
nothing on recvmsg().
Is it possible to emulate what TCP does,
e.g. associate related connects instead of creating new associations?
That way very little memory is used, and I don't get the "in progress" code every time right after the last connect has just failed.

As for associations, we can drop the reference from the notification thus
allowing the memory for the association to actually go away.
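Dropping the reference means the notification has to carry copies of whatever it reports, such as the association id, instead of a pointer into the association. The toy model below sketches that idea under hypothetical names; it is not the kernel change. Notifications can still accumulate until they are read, but the associations behind them are freed immediately.

```c
#include <stdlib.h>

/* Toy model (hypothetical names): the notification copies the values it
 * reports instead of holding a reference to the association. */
struct snap_assoc {
    int id;
};

struct snap_notif {
    int assoc_id;               /* copied value, no reference held */
    int state;                  /* hypothetical state code */
    struct snap_notif *next;
};

static int live_snap_assocs;    /* toy associations still allocated */

/* One failed connect(): the association is created, its id is copied into
 * the queued notification, and nothing keeps the association alive. */
struct snap_notif *failed_connect_snapshot(struct snap_notif *queue,
                                           int id, int state)
{
    struct snap_assoc *a = malloc(sizeof(*a));
    struct snap_notif *n = malloc(sizeof(*n));

    if (!a || !n) {
        free(a);
        free(n);
        return queue;
    }
    live_snap_assocs++;
    a->id = id;
    n->assoc_id = a->id;        /* copy, don't reference */
    n->state = state;
    n->next = queue;
    free(a);                    /* association can go away right now */
    live_snap_assocs--;
    return n;
}

size_t snap_queue_len(const struct snap_notif *q)
{
    size_t len = 0;

    for (; q; q = q->next)
        len++;
    return len;
}
```

Here only the notifications themselves remain queued. That remaining growth is what the receive-buffer limit is meant to bound.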

Ok.

cheers,
jamal

-vlad


If you run this long enough (24 hours or so), you will see the OOM
killer kick in, complaining about sctp_association_new():

---
Call Trace:
[<ffffffff80145508>] show_stack+0x68/0x80
[<ffffffff8061e9c8>] dump_header.isra.12+0x78/0x1ac
[<ffffffff801d2358>] oom_kill_process+0x2e8/0x440
[<ffffffff801d2998>] out_of_memory+0x2b8/0x2e8
[<ffffffff801d7084>] __alloc_pages_nodemask+0x774/0x788
[<ffffffff80210c60>] cache_alloc_refill+0x470/0x7b0
[<ffffffff802107c4>] kmem_cache_alloc+0xe4/0x110
[<ffffffffc008a214>] sctp_association_new+0x54/0x688 [sctp]
[<ffffffffc009c92c>] __sctp_connect+0x274/0x618 [sctp]
[<ffffffffc009ce84>] sctp_connect+0x7c/0xe8 [sctp]
[<ffffffff8053d030>] SyS_connect+0xd8/0xf8
[<ffffffff8014a0a4>] handle_sys64+0x44/0x68
-----

I am sorry I don't have time to chase this in the kernel code
(and will have to work around it in user space in our code).

Longer version:
==============

The attached program initially tries to connect to a server that is not up
yet. At some point the server comes up and all the issues I observe
go away, i.e. the extra memory consumption disappears.

The issue I am about to describe happens on all kernel versions I have
tested (including the latest, and all the way back to 2.6.32 running on
a MIPS board).

How to observe the issue:
In xterm 1:
sudo watch "cat /proc/slabinfo | grep -i ^kmalloc-"

In xterm 2:
run the attached program.

On my laptop the pages are 4K, so I see kmalloc-4096 consumption
going up.
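To track the same counter from a program instead of watching it by eye, each /proc/slabinfo line can be parsed. Those lines begin with the cache name followed by its active and total object counts. The helpers below are a hypothetical sketch; slab_active_objs() and read_slab_active() are illustrative names. They assume that column layout. Reading /proc/slabinfo normally requires root, which is why the command above uses sudo.

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/slabinfo line. Returns the active object count if the
 * line belongs to cache `name`, or -1 otherwise (including header lines). */
long slab_active_objs(const char *line, const char *name)
{
    char cache[64];
    long active, total;

    if (sscanf(line, "%63s %ld %ld", cache, &active, &total) != 3)
        return -1;
    if (strcmp(cache, name) != 0)
        return -1;
    return active;
}

/* Scan a slabinfo-format file for cache `name`; -1 if absent or unreadable. */
long read_slab_active(const char *path, const char *name)
{
    FILE *f = fopen(path, "r");
    char line[512];
    long result = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        long v = slab_active_objs(line, name);
        if (v >= 0) {
            result = v;
            break;
        }
    }
    fclose(f);
    return result;
}
```

For example, read_slab_active("/proc/slabinfo", "kmalloc-4096") could be sampled periodically while the reproducer runs.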

If you actually want to narrow this down, compile the kernel with
CONFIG_SCTP_DBG_OBJCNT (or just trust what I say below) and
do a:

----
Every 2.0s: sudo cat /proc/net/sctp/sctp_dbg_objcnt     Fri Aug 29 11:34:35 2014
sock: 5
ep: 5
assoc: 279
transport: 1
chunk: 0
bind_addr: 0
bind_bucket: 3
addr: 4
ssnmap: 0
datamsg: 0
------

And when I start the server 3-4 minutes later and the two ends talk to each
other, the counters go down:

---
Every 2.0s: sudo cat /proc/net/sctp/sctp_dbg_objcnt     Fri Aug 29 11:37:38 2014
sock: 12
ep: 12
assoc: 6
transport: 6
chunk: 0
bind_addr: 0
bind_bucket: 7
addr: 16
ssnmap: 6
datamsg: 0
-------------

cheers,
jamal






