[Forwarding to linux-sctp]

-------- Original Message --------
Subject: connect() issues
Date: Sun, 31 Aug 2014 12:10:58 -0400
From: Jamal Hadi Salim <jhs@xxxxxxxxxxxx>
To: lksctp-developers@xxxxxxxxxxxxxxxxxxxxx
CC: Vlad Yasevich <vyasevic@xxxxxxxxxx>, Michael Tuexen <Michael.Tuexen@xxxxxxxxxxxxxxxxx>

Folks,

I have attached a small program written by Michael Tuexen and modified
slightly by me to demonstrate the issue. It demonstrates memory issues
due to connect(). Sorry, you will need libev (I had to extract the
details out of a large, complex program).

Summary:
=======
There is a kernel issue where each connect() call results in a call to
sctp_association_new(), where memory is allocated. An INIT goes out to
the remote and an ABORT comes back, but the allocated memory is never
freed.
I thought that because I registered for association events I would get
these events sent to me, but recvmsg() fails every time and no
readability state is set on the socket.
If you run this long enough (24 hours or so) you will see the OOM killer
come in, upset about sctp_association_new():

---
Call Trace:
[<ffffffff80145508>] show_stack+0x68/0x80
[<ffffffff8061e9c8>] dump_header.isra.12+0x78/0x1ac
[<ffffffff801d2358>] oom_kill_process+0x2e8/0x440
[<ffffffff801d2998>] out_of_memory+0x2b8/0x2e8
[<ffffffff801d7084>] __alloc_pages_nodemask+0x774/0x788
[<ffffffff80210c60>] cache_alloc_refill+0x470/0x7b0
[<ffffffff802107c4>] kmem_cache_alloc+0xe4/0x110
[<ffffffffc008a214>] sctp_association_new+0x54/0x688 [sctp]
[<ffffffffc009c92c>] __sctp_connect+0x274/0x618 [sctp]
[<ffffffffc009ce84>] sctp_connect+0x7c/0xe8 [sctp]
[<ffffffff8053d030>] SyS_connect+0xd8/0xf8
[<ffffffff8014a0a4>] handle_sys64+0x44/0x68
-----

I am sorry I don't have time to chase the kernel code (and will have to
work around it in user space in our code).

Longer version:
==============
The attached program initially tries to connect to a server which is not
up yet.
At some point the server comes up and all the issues I observe go away,
i.e. the resulting memory consumption goes to zero.
The issue I am about to describe happens on all kernel versions I have
tested (including the latest, and all the way back to 2.6.32 running on
a MIPS board).

How to observe the issue:
on xterm 1: sudo watch "cat /proc/slabinfo | grep -i ^kmalloc-"
on xterm 2: run the attached program.

On my laptop the pages are 4K, so I see kmalloc-4096 consumption going
up.
If you actually want to narrow this down, compile the kernel with
CONFIG_SCTP_DBG_OBJCNT (or you can believe what I am saying below) and
do a:

----
Every 2.0s: sudo cat /proc/net/sctp/sctp_dbg_objcnt    Fri Aug 29 11:34:35 2014

sock: 5
ep: 5
assoc: 279
transport: 1
chunk: 0
bind_addr: 0
bind_bucket: 3
addr: 4
ssnmap: 0
datamsg: 0
------

When I start the server 3-4 minutes later and the two ends talk to each
other, the counters go down:

---
Every 2.0s: sudo cat /proc/net/sctp/sctp_dbg_objcnt    Fri Aug 29 11:37:38 2014

sock: 12
ep: 12
assoc: 6
transport: 6
chunk: 0
bind_addr: 0
bind_bucket: 7
addr: 16
ssnmap: 6
datamsg: 0
-------------

cheers,
jamal
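For anyone who would rather log the counters than watch them, the check described above could be scripted roughly as follows. This is only a sketch: the helper names objcnt_lookup() and objcnt_read() are made up for illustration, and it assumes the file contains whitespace-separated "name: value" pairs like the output quoted above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter (e.g. "assoc") in text laid out like
 * /proc/net/sctp/sctp_dbg_objcnt, i.e. "name: value" pairs separated
 * by whitespace. Returns the value, or -1 if the name is not found.
 * (Hypothetical helper, not part of any library.)
 */
long objcnt_lookup(const char *text, const char *name)
{
	char key[64];
	long val;
	int used;

	/* %n records how many characters this pair consumed */
	while (sscanf(text, " %63[^: \t\n]: %ld%n", key, &val, &used) == 2) {
		if (strcmp(key, name) == 0)
			return val;
		text += used;
	}
	return -1;
}

/*
 * Read the whole file at 'path' and look up one counter in it.
 * Returns -1 if the file cannot be opened or the name is missing.
 */
long objcnt_read(const char *path, const char *name)
{
	char buf[4096];
	size_t n;
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, fp);
	fclose(fp);
	buf[n] = '\0';
	return objcnt_lookup(buf, name);
}
```

Calling objcnt_read("/proc/net/sctp/sctp_dbg_objcnt", "assoc") once per second and printing the result should show the same growth as the watch output, provided the kernel was built with CONFIG_SCTP_DBG_OBJCNT and the caller is allowed to read the file.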
/*
 * gcc connect_test.c -lev
 */

/*-
 * Copyright (c) 2014 Michael Tuexen
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 */

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <ev.h>
#include <errno.h>

#define PORT 9
#define FEAPP_PORT 30330
#define ADDR "127.0.0.1"

struct sockaddr_in addr;
int fd;

/* the timer being fired seems to help creating the issue */
void
timeout_cb (EV_P_ ev_timer *w, int revents)
{
	int flags;

	(void)revents;

	/* make sure the socket is non-blocking before each attempt */
	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		perror("fcntl");
	}
	/* each call here is what ends up in sctp_association_new() */
	if (connect(fd, (const struct sockaddr *)&addr, sizeof(struct sockaddr_in)) < 0) {
		perror("connect");
	}
	/* re-arm the timer so the next attempt happens in 0.1 seconds */
	w->repeat = 0.1;
	ev_timer_again(EV_A_ w);
}

int
main(void)
{
	int rc;
	int tr = 1;	/* option value: enable SO_REUSEADDR */
	struct sctp_event_subscribe event;
	ev_timer timeout_watcher;
	struct ev_loop *loop = EV_DEFAULT;

	if ((fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP)) < 0) {
		perror("socket");
		return -1;
	}
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &tr, sizeof(int)) < 0) {
		perror("SO_REUSEADDR");
		return -1;
	}

	memset(&event, 0, sizeof(event));
	event.sctp_association_event = 1;
	/*XXX: if you dont subscribe to events all goes well...*/
	rc = setsockopt(fd, IPPROTO_SCTP, SCTP_EVENTS, &event, sizeof(event));
	if (rc < 0) {
		perror("SCTP_EVENTS");
		return -1;
	}

	memset(&addr, 0, sizeof(struct sockaddr_in));
	addr.sin_family = AF_INET;
#if defined(__FreeBSD__) || defined(__APPLE__)
	addr.sin_len = sizeof(struct sockaddr_in);
#endif
	addr.sin_port = htons(PORT);
	addr.sin_addr.s_addr = inet_addr(ADDR);

	/* run the callback every 0.1 seconds */
	ev_timer_init (&timeout_watcher, timeout_cb, 0.1, 0.1);
	ev_timer_start (loop, &timeout_watcher);
	ev_run (loop, 0);

	if (close(fd) < 0) {
		perror("close");
	}
	return (0);
}
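The test program above re-issues connect() on every timer tick no matter what the previous call returned. A caller that wants to tell "still in progress" apart from "failed" before trying again could start from something like the sketch below. The enum and the function name classify_connect() are hypothetical, and only the standard POSIX errno values for a non-blocking connect() are used. Whether holding back on repeated connect() calls avoids the growth reported in this thread has not been verified.

```c
#include <errno.h>

/* Hypothetical result categories for one non-blocking connect() call. */
enum connect_state {
	CONNECT_DONE,		/* connected, or was already connected */
	CONNECT_PENDING,	/* setup still in flight on this socket */
	CONNECT_FAILED		/* this attempt is over and did not succeed */
};

/*
 * rc is the return value of connect(); err is the errno value it left
 * behind (ignored when rc == 0).
 */
enum connect_state classify_connect(int rc, int err)
{
	if (rc == 0 || err == EISCONN)
		return CONNECT_DONE;
	if (err == EINPROGRESS || err == EALREADY)
		return CONNECT_PENDING;
	return CONNECT_FAILED;
}
```

In a timer callback this would be called as classify_connect(rc, errno) right after connect(), with a new attempt issued only on CONNECT_FAILED.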