Hello cyrus users,

I have a Cyrus-imapd server with 2400 mailboxes, accessed over IMAP by an Open-Xchange client. A few days ago this server ran out of memory and the kernel started sacrificing children:

2014-03-04T15:25:48.927562+01:00 ucstore-csi kernel: imapd: page allocation failure. order:1, mode:0x20
2014-03-04T15:25:48.934114+01:00 ucstore-csi kernel: Pid: 18151, comm: imapd Not tainted 2.6.32-279.el6.x86_64 #1
2014-03-04T15:25:48.934118+01:00 ucstore-csi kernel: Call Trace:
2014-03-04T15:25:48.934122+01:00 ucstore-csi kernel: <IRQ> [<ffffffff8112759f>] ? __alloc_pages_nodemask+0x77f/0x940
2014-03-04T15:25:48.934123+01:00 ucstore-csi kernel: [<ffffffff81161d62>] ? kmem_getpages+0x62/0x170
2014-03-04T15:25:48.934123+01:00 ucstore-csi kernel: [<ffffffff8116297a>] ? fallback_alloc+0x1ba/0x270
2014-03-04T15:25:48.934124+01:00 ucstore-csi kernel: [<ffffffff811623cf>] ? cache_grow+0x2cf/0x320
2014-03-04T15:25:48.934124+01:00 ucstore-csi kernel: [<ffffffff811626f9>] ? ____cache_alloc_node+0x99/0x160
2014-03-04T15:25:48.934125+01:00 ucstore-csi kernel: [<ffffffff811634db>] ? kmem_cache_alloc+0x11b/0x190
2014-03-04T15:25:48.934125+01:00 ucstore-csi kernel: [<ffffffff8142dc68>] ? sk_prot_alloc+0x48/0x1c0
2014-03-04T15:25:48.934127+01:00 ucstore-csi kernel: [<ffffffff8142df32>] ? sk_clone+0x22/0x2e0
2014-03-04T15:25:48.934128+01:00 ucstore-csi kernel: [<ffffffff8147bb86>] ? inet_csk_clone+0x16/0xd0
2014-03-04T15:25:48.934128+01:00 ucstore-csi kernel: [<ffffffff81494ae3>] ? tcp_create_openreq_child+0x23/0x450
2014-03-04T15:25:48.934129+01:00 ucstore-csi kernel: [<ffffffff8149239d>] ? tcp_v4_syn_recv_sock+0x4d/0x310
2014-03-04T15:25:48.934129+01:00 ucstore-csi kernel: [<ffffffff81494886>] ? tcp_check_req+0x226/0x460
2014-03-04T15:25:48.934130+01:00 ucstore-csi kernel: [<ffffffff81491dbb>] ? tcp_v4_do_rcv+0x35b/0x430
2014-03-04T15:25:48.934132+01:00 ucstore-csi kernel: [<ffffffff814935be>] ? tcp_v4_rcv+0x4fe/0x8d0
2014-03-04T15:25:48.934133+01:00 ucstore-csi kernel: [<ffffffff811acdd7>] ? end_bio_bh_io_sync+0x37/0x60
2014-03-04T15:25:48.934133+01:00 ucstore-csi kernel: [<ffffffff814712dd>] ? ip_local_deliver_finish+0xdd/0x2d0
2014-03-04T15:25:48.934134+01:00 ucstore-csi kernel: [<ffffffff81471568>] ? ip_local_deliver+0x98/0xa0
2014-03-04T15:25:48.934134+01:00 ucstore-csi kernel: [<ffffffff81470a2d>] ? ip_rcv_finish+0x12d/0x440
2014-03-04T15:25:48.934135+01:00 ucstore-csi kernel: [<ffffffff81470fb5>] ? ip_rcv+0x275/0x350
2014-03-04T15:25:48.934135+01:00 ucstore-csi kernel: [<ffffffff8143a7bb>] ? __netif_receive_skb+0x49b/0x6f0
2014-03-04T15:25:48.934137+01:00 ucstore-csi kernel: [<ffffffff8143ca38>] ? netif_receive_skb+0x58/0x60
2014-03-04T15:25:48.934138+01:00 ucstore-csi kernel: [<ffffffffa00aea9d>] ? vmxnet3_rq_rx_complete+0x36d/0x880 [vmxnet3]
2014-03-04T15:25:48.934138+01:00 ucstore-csi kernel: [<ffffffff812871e0>] ? swiotlb_map_page+0x0/0x100
2014-03-04T15:25:48.934139+01:00 ucstore-csi kernel: [<ffffffffa00af203>] ? vmxnet3_poll_rx_only+0x43/0xc0 [vmxnet3]
2014-03-04T15:25:48.934139+01:00 ucstore-csi kernel: [<ffffffff8143f193>] ? net_rx_action+0x103/0x2f0
2014-03-04T15:25:48.934140+01:00 ucstore-csi kernel: [<ffffffff81073ec1>] ? __do_softirq+0xc1/0x1e0
2014-03-04T15:25:48.934140+01:00 ucstore-csi kernel: [<ffffffff810db800>] ? handle_IRQ_event+0x60/0x170
2014-03-04T15:25:48.934142+01:00 ucstore-csi kernel: [<ffffffff81073f1f>] ? __do_softirq+0x11f/0x1e0
2014-03-04T15:25:48.934143+01:00 ucstore-csi kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
2014-03-04T15:25:48.934143+01:00 ucstore-csi kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
2014-03-04T15:25:48.934144+01:00 ucstore-csi kernel: [<ffffffff81073ca5>] ? irq_exit+0x85/0x90
2014-03-04T15:25:48.934144+01:00 ucstore-csi kernel: [<ffffffff81505af5>] ? do_IRQ+0x75/0xf0
2014-03-04T15:25:48.934145+01:00 ucstore-csi kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11

2014-03-05T15:38:32.815336+01:00 ucstore-csi kernel: Out of memory: Kill process 1778 (irqbalance) score 1 or sacrifice child
2014-03-05T15:38:32.815336+01:00 ucstore-csi kernel: Killed process 1778, UID 0, (irqbalance) total-vm:9140kB, anon-rss:88kB, file-rss:4kB
2014-03-05T15:38:32.815338+01:00 ucstore-csi kernel: imapd invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0, oom_score_adj=0
2014-03-05T15:38:32.815339+01:00 ucstore-csi kernel: imapd cpuset=/ mems_allowed=0
2014-03-05T15:38:32.815339+01:00 ucstore-csi kernel: Pid: 19228, comm: imapd Not tainted 2.6.32-279.el6.x86_64 #1
2014-03-05T15:38:32.815340+01:00 ucstore-csi kernel: Call Trace:
2014-03-05T15:38:32.815340+01:00 ucstore-csi kernel: [<ffffffff810c4971>] ? cpuset_print_task_mems_allowed+0x91/0xb0
2014-03-05T15:38:32.815341+01:00 ucstore-csi kernel: [<ffffffff811170e0>] ? dump_header+0x90/0x1b0
2014-03-05T15:38:32.815341+01:00 ucstore-csi kernel: [<ffffffff812146fc>] ? security_real_capable_noaudit+0x3c/0x70
2014-03-05T15:38:32.815343+01:00 ucstore-csi kernel: [<ffffffff81117562>] ? oom_kill_process+0x82/0x2a0
2014-03-05T15:38:32.815344+01:00 ucstore-csi kernel: [<ffffffff811174a1>] ? select_bad_process+0xe1/0x120
2014-03-05T15:38:32.815344+01:00 ucstore-csi kernel: [<ffffffff811179a0>] ? out_of_memory+0x220/0x3c0
2014-03-05T15:38:32.815345+01:00 ucstore-csi kernel: [<ffffffff811276be>] ? __alloc_pages_nodemask+0x89e/0x940
2014-03-05T15:38:32.815345+01:00 ucstore-csi kernel: [<ffffffff8115c1da>] ? alloc_pages_current+0xaa/0x110
2014-03-05T15:38:32.815346+01:00 ucstore-csi kernel: [<ffffffff811253ce>] ? __get_free_pages+0xe/0x50
2014-03-05T15:38:32.815346+01:00 ucstore-csi kernel: [<ffffffff81069464>] ? copy_process+0xe4/0x13c0
2014-03-05T15:38:32.815348+01:00 ucstore-csi kernel: [<ffffffff8104452c>] ? __do_page_fault+0x1ec/0x480
2014-03-05T15:38:32.815349+01:00 ucstore-csi kernel: [<ffffffff812718b1>] ? cpumask_any_but+0x31/0x50
2014-03-05T15:38:32.815349+01:00 ucstore-csi kernel: [<ffffffff8106a7d4>] ? do_fork+0x94/0x460
2014-03-05T15:38:32.815350+01:00 ucstore-csi kernel: [<ffffffff81081ba1>] ? do_sigaction+0x91/0x1d0
2014-03-05T15:38:32.815350+01:00 ucstore-csi kernel: [<ffffffff810d69e2>] ? audit_syscall_entry+0x272/0x2a0
2014-03-05T15:38:32.815351+01:00 ucstore-csi kernel: [<ffffffff81009598>] ? sys_clone+0x28/0x30
2014-03-05T15:38:32.815351+01:00 ucstore-csi kernel: [<ffffffff8100b413>] ? stub_clone+0x13/0x20
2014-03-05T15:38:32.815353+01:00 ucstore-csi kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
2014-03-05T15:38:32.815354+01:00 ucstore-csi kernel: Mem-Info:
2014-03-05T15:38:32.815354+01:00 ucstore-csi kernel: Node 0 DMA per-cpu:
2014-03-05T15:38:32.815355+01:00 ucstore-csi kernel: CPU 0: hi: 0, btch: 1 usd: 0
2014-03-05T15:38:32.815355+01:00 ucstore-csi kernel: CPU 1: hi: 0, btch: 1 usd: 0
2014-03-05T15:38:32.815356+01:00 ucstore-csi kernel: Node 0 DMA32 per-cpu:
2014-03-05T15:38:32.815356+01:00 ucstore-csi kernel: CPU 0: hi: 186, btch: 31 usd: 0
2014-03-05T15:38:32.815358+01:00 ucstore-csi kernel: CPU 1: hi: 186, btch: 31 usd: 0
2014-03-05T15:38:32.815358+01:00 ucstore-csi kernel: Node 0 Normal per-cpu:
2014-03-05T15:38:32.815359+01:00 ucstore-csi kernel: CPU 0: hi: 186, btch: 31 usd: 0
2014-03-05T15:38:32.815359+01:00 ucstore-csi kernel: CPU 1: hi: 186, btch: 31 usd: 9
2014-03-05T15:38:32.815360+01:00 ucstore-csi kernel: active_anon:1076363 inactive_anon:208842 isolated_anon:14
2014-03-05T15:38:32.815360+01:00 ucstore-csi kernel: active_file:128 inactive_file:422 isolated_file:15
2014-03-05T15:38:32.815362+01:00 ucstore-csi kernel: unevictable:0 dirty:0 writeback:0 unstable:0
2014-03-05T15:38:32.815363+01:00 ucstore-csi kernel: free:148958 slab_reclaimable:29256 slab_unreclaimable:148642
2014-03-05T15:38:32.815363+01:00 ucstore-csi kernel: mapped:862 shmem:3101 pagetables:329229 bounce:0
2014-03-05T15:38:32.815364+01:00 ucstore-csi kernel: Node 0 DMA free:15660kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15268kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
2014-03-05T15:38:32.815364+01:00 ucstore-csi kernel: lowmem_reserve[]: 0 3000 8050 8050
2014-03-05T15:38:32.815365+01:00 ucstore-csi kernel: Node 0 DMA32 free:525368kB min:25140kB low:31424kB high:37708kB active_anon:1410856kB inactive_anon:352760kB active_file:0kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:2288kB shmem:8380kB slab_reclaimable:44624kB slab_unreclaimable:155772kB kernel_stack:46984kB pagetables:242404kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
2014-03-05T15:38:32.815367+01:00 ucstore-csi kernel: lowmem_reserve[]: 0 0 5050 5050
2014-03-05T15:38:32.815367+01:00 ucstore-csi kernel: Node 0 Normal free:54804kB min:42316kB low:52892kB high:63472kB active_anon:2894596kB inactive_anon:482608kB active_file:512kB inactive_file:1644kB unevictable:0kB isolated(anon):76kB isolated(file):60kB present:5171200kB mlocked:0kB dirty:0kB writeback:0kB mapped:1160kB shmem:4024kB slab_reclaimable:72400kB slab_unreclaimable:438796kB kernel_stack:4496kB pagetables:1074512kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:338 all_unreclaimable? no
2014-03-05T15:38:32.815368+01:00 ucstore-csi kernel: lowmem_reserve[]: 0 0 0 0
2014-03-05T15:38:32.815368+01:00 ucstore-csi kernel: Node 0 DMA: 1*4kB 1*8kB 0*16kB 1*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15660kB
2014-03-05T15:38:32.815369+01:00 ucstore-csi kernel: Node 0 DMA32: 128262*4kB 978*8kB 78*16kB 29*32kB 6*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 525480kB
2014-03-05T15:38:32.815370+01:00 ucstore-csi kernel: Node 0 Normal: 12511*4kB 8*8kB 39*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 54828kB
2014-03-05T15:38:32.815373+01:00 ucstore-csi kernel: 13536 total pagecache pages
2014-03-05T15:38:32.815373+01:00 ucstore-csi kernel: 9873 pages in swap cache
2014-03-05T15:38:32.815374+01:00 ucstore-csi kernel: Swap cache stats: add 1527415, delete 1517542, find 37369649/37407554
2014-03-05T15:38:32.815374+01:00 ucstore-csi kernel: Free swap = 0kB
2014-03-05T15:38:32.815375+01:00 ucstore-csi kernel: Total swap = 4194296kB
2014-03-05T15:38:32.815375+01:00 ucstore-csi kernel: 2097136 pages RAM
2014-03-05T15:38:32.815377+01:00 ucstore-csi kernel: 81706 pages reserved
2014-03-05T15:38:32.815377+01:00 ucstore-csi kernel: 18873 pages shared
2014-03-05T15:38:32.815378+01:00 ucstore-csi kernel: 1847951 pages non-shared

On average I have at most about 400 simultaneous connections and no memory problems. I think a network issue (a stalled DNS or LDAP server) caused connections to suddenly climb to 3500, and imapd processes kept being forked until memory was exhausted.

My server is Red Hat Enterprise Linux Server release 6.3 (Santiago). Under normal conditions I read something like this:

                 total       used       free     shared    buffers     cached
    Mem:       8061976    7651020     410956          0    1355964    3412788
    -/+ buffers/cache:    2882268    5179708
    Swap:      4194296      32180    4162116

    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     2  0  32180 386880 1356476 3423712    0    0   643   327   25   18 10  4 81  5  0

Current cyrus.conf:

    SERVICES {
      # add or remove based on preferences
      imap   cmd="imapd"      listen="imap"  prefork=5
      pop3   cmd="pop3d"      listen="pop3"  prefork=3
      sieve  cmd="timsieved"  listen="sieve" prefork=0
      lmtp   cmd="lmtpd -a"   listen="lmtp"  prefork=0
    }

I have to prevent memory exhaustion when some oddity makes the clients effectively DOS Cyrus, so I would like to configure the maxchild parameter for the imap service. I want to pick a value that avoids memory problems, given a known amount of system RAM. I see that an imapd process takes 22-25MB on average. But with 8GB of RAM the server should then already be swapping at fewer than 400 connections; that does not happen, so this estimate must be wrong, or far too conservative. I suspect I should look at the difference between RSS and SHR memory when sizing the number of imapd processes, but I'm not sure. Could you help me with this tuning? In particular I'm interested in the relation between memory usage and the maxchild limit on imapd processes; see the sketches below for what I have in mind.

Meanwhile I would also like to tune the maxfds parameter. With lsof I measure about 60 open files per imapd process. If I have 400 imapd processes, that would mean a system-wide 'ulimit -n' of 60*400 = 24000. This must be wrong, because I currently have a limit of 4096 and have never had problems. Should I perhaps count only 'running' processes when computing this threshold?
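To make the maxchild question concrete, here is roughly what I have in mind. The maxchild and maxfds values below are only guesses derived from the 22-25MB figure, not tested values:

    SERVICES {
      # add or remove based on preferences
      # cap imapd children: worst case 300 * ~25MB = ~7.5GB if every byte
      # were private, much less if most of those 25MB is shared
      imap   cmd="imapd"      listen="imap"  prefork=5 maxchild=300 maxfds=256
      pop3   cmd="pop3d"      listen="pop3"  prefork=3 maxchild=50
      sieve  cmd="timsieved"  listen="sieve" prefork=0
      lmtp   cmd="lmtpd -a"   listen="lmtp"  prefork=0
    }

If I understand the master process correctly, once maxchild is reached it simply stops forking new children and further connections wait in the listen backlog or fail, which during a connection storm would be far better than an OOM kill.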
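And this is the kind of measurement I was thinking of for the RSS-vs-SHR question: summing Pss from /proc/<pid>/smaps. Pss charges each shared page 1/N to each of the N processes sharing it, so (number of children) * (average Pss) should approximate the real total, unlike RSS, which counts the same shared pages once per process. A rough sketch, run as root so the smaps files are readable:

    # average Pss (proportional set size) per imapd child
    for p in $(pgrep imapd); do
        awk '/^Pss:/ { kb += $2 } END { print kb }' "/proc/$p/smaps"
    done | awk '{ s += $1; n++ }
        END { printf "%d procs, avg Pss %.0f kB, total %.0f MB\n", n, s/n, s/1024 }'

If the average Pss turned out to be, say, 5MB instead of 25MB, then even 1000 children would account for only ~5GB, which would explain why 400 connections don't make the box swap.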
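For maxfds I would cross-check the lsof figure the same way (again just a sketch). If maxfds is a per-child limit, as I believe, then the number to compare it against is the ~60 descriptors of the busiest single process, not 60 multiplied by the number of processes:

    # highest open-descriptor count among all imapd children
    for p in $(pgrep imapd); do
        ls "/proc/$p/fd" 2>/dev/null | wc -l
    done | sort -n | tail -1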
Thank you very much for any hints.

Marco

----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe: https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus