Thanks Brian,

Our system is a Supermicro X8-series motherboard with a 4-drive hardware RAID 10. The CPU is a quad-core Intel Xeon X3440 @ 2.53GHz. At the time the system had 4GB of RAM and about 3GB of swap; we have since upgraded the RAM to 16GB, and swap is unchanged.

Current memory usage:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1488 root      20   0 8768m 8.5g 2032 R    2 53.9 181:15.04 glusterfs

Over time the process quickly consumes the available memory on the system and then, more slowly, begins to eat into swap. I've tried forcing the kernel to reclaim its caches; however, the cache is only about a GB.
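(For reference, by "reclaim" I mean the usual drop_caches sequence, run as root. Note that this only releases clean page cache, dentries, and inodes; it cannot touch the anonymous memory the glusterfs process itself is holding:

    # flush dirty pages first, then drop page cache, dentries, and inodes
    sync
    echo 3 > /proc/sys/vm/drop_caches
)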
Current output of free -m before a cache reclaim:

root@ifx05:~# free -m
             total       used       free     shared    buffers     cached
Mem:         16079      15865        213          0        374       1224
-/+ buffers/cache:      14267       1812
Swap:         3814          1       3813

Here is the output from the OOM killer:

Feb 8 08:05:36 ifx05 kernel: [679946.164642] glusterfsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Feb 8 08:05:36 ifx05 kernel: [679946.270816] glusterfsd cpuset=/ mems_allowed=0
Feb 8 08:05:36 ifx05 kernel: [679946.325069] Pid: 2070, comm: glusterfsd Not tainted 3.2.0-0.bpo.3-amd64 #1
Feb 8 08:05:36 ifx05 kernel: [679946.408416] Call Trace:
Feb 8 08:05:36 ifx05 kernel: [679946.438667] [<ffffffff810bf159>] ? dump_header+0x76/0x1a7
Feb 8 08:05:36 ifx05 kernel: [679946.505292] [<ffffffff81173f34>] ? security_real_capable_noaudit+0x34/0x59
Feb 8 08:05:36 ifx05 kernel: [679946.589592] [<ffffffff810bf088>] ? oom_unkillable_task+0x5f/0x92
Feb 8 08:05:36 ifx05 kernel: [679946.663604] [<ffffffff810bf5af>] ? oom_kill_process+0x52/0x28d
Feb 8 08:05:36 ifx05 kernel: [679946.735431] [<ffffffff810bfabb>] ? out_of_memory+0x2d1/0x337
Feb 8 08:05:36 ifx05 kernel: [679946.805178] [<ffffffff810c3cbd>] ? __alloc_pages_nodemask+0x5d8/0x731
Feb 8 08:05:48 ifx05 kernel: [679946.884294] [<ffffffff810ef944>] ? alloc_pages_current+0xa7/0xc9
Feb 8 08:05:48 ifx05 kernel: [679946.958201] [<ffffffff810be8e1>] ? filemap_fault+0x26d/0x35c
Feb 8 08:05:48 ifx05 kernel: [679947.027962] [<ffffffff810dad70>] ? __do_fault+0xc6/0x438
Feb 8 08:05:48 ifx05 kernel: [679947.093631] [<ffffffff810dbf09>] ? handle_pte_fault+0x352/0x965
Feb 8 08:05:48 ifx05 kernel: [679947.166503] [<ffffffff81120595>] ? getxattr+0xee/0x119
Feb 8 08:05:48 ifx05 kernel: [679947.230005] [<ffffffff81120595>] ? getxattr+0xee/0x119
Feb 8 08:05:48 ifx05 kernel: [679947.293519] [<ffffffff81368e2e>] ? do_page_fault+0x327/0x34c
Feb 8 08:05:48 ifx05 kernel: [679947.363275] [<ffffffff81111e9e>] ? user_path_at_empty+0x55/0x7d
Feb 8 08:05:48 ifx05 kernel: [679947.436234] [<ffffffff81109949>] ? sys_newlstat+0x24/0x2d
Feb 8 08:05:48 ifx05 kernel: [679947.502876] [<ffffffff81117bb9>] ? dput+0x29/0xf2
Feb 8 08:05:50 ifx05 kernel: [679947.561177] [<ffffffff81366235>] ? page_fault+0x25/0x30
Feb 8 08:05:50 ifx05 kernel: [679947.625726] Mem-Info:
Feb 8 08:05:50 ifx05 kernel: [679947.653900] Node 0 DMA per-cpu:
Feb 8 08:05:50 ifx05 kernel: [679947.692559] CPU 0: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679947.750881] CPU 1: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679947.809187] CPU 2: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679947.867501] CPU 3: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679947.925813] CPU 4: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679947.984128] CPU 5: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.042445] CPU 6: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.100757] CPU 7: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.159068] Node 0 DMA32 per-cpu:
Feb 8 08:05:50 ifx05 kernel: [679948.199814] CPU 0: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.258130] CPU 1: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.316447] CPU 2: hi: 186, btch: 31 usd: 30
Feb 8 08:05:50 ifx05 kernel: [679948.374756] CPU 3: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.433069] CPU 4: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.491381] CPU 5: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.549695] CPU 6: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.608011] CPU 7: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.666323] Node 0 Normal per-cpu:
Feb 8 08:05:50 ifx05 kernel: [679948.708109] CPU 0: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.766425] CPU 1: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.824739] CPU 2: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.883052] CPU 3: hi: 186, btch: 31 usd: 59
Feb 8 08:05:50 ifx05 kernel: [679948.941365] CPU 4: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.999678] CPU 5: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679949.057996] CPU 6: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679949.116306] CPU 7: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679949.174626] active_anon:702601 inactive_anon:260864 isolated_anon:55
Feb 8 08:05:50 ifx05 kernel: [679949.174628] active_file:630 inactive_file:833 isolated_file:86
Feb 8 08:05:50 ifx05 kernel: [679949.174629] unevictable:0 dirty:0 writeback:0 unstable:0
Feb 8 08:05:50 ifx05 kernel: [679949.174630] free:21828 slab_reclaimable:5760 slab_unreclaimable:5863
Feb 8 08:05:50 ifx05 kernel: [679949.174631] mapped:378 shmem:163 pagetables:5053 bounce:0
Feb 8 08:05:50 ifx05 kernel: [679949.533783] Node 0 DMA free:15904kB min:256kB low:320kB high:384kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15680kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb 8 08:05:50 ifx05 kernel: [679949.974901] lowmem_reserve[]: 0 2991 4001 4001
Feb 8 08:05:50 ifx05 kernel: [679950.029570] Node 0 DMA32 free:54652kB min:50332kB low:62912kB high:75496kB active_anon:2356292kB inactive_anon:589348kB active_file:1656kB inactive_file:2088kB unevictable:0kB isolated(anon):128kB isolated(file):0kB present:3063584kB mlocked:0kB dirty:0kB writeback:0kB mapped:1136kB shmem:648kB slab_reclaimable:14680kB slab_unreclaimable:5804kB kernel_stack:776kB pagetables:11180kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb 8 08:05:50 ifx05 kernel: [679950.518329] lowmem_reserve[]: 0 0 1010 1010
Feb 8 08:05:50 ifx05 kernel: [679950.569883] Node 0 Normal free:16616kB min:16992kB low:21240kB high:25488kB active_anon:453668kB inactive_anon:454080kB active_file:676kB inactive_file:392kB unevictable:0kB isolated(anon):220kB isolated(file):128kB present:1034240kB mlocked:0kB dirty:0kB writeback:0kB mapped:508kB shmem:4kB slab_reclaimable:8360kB slab_unreclaimable:17648kB kernel_stack:1632kB pagetables:9032kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:3 all_unreclaimable? no
Feb 8 08:05:50 ifx05 kernel: [679951.055531] lowmem_reserve[]: 0 0 0 0
Feb 8 08:05:50 ifx05 kernel: [679951.100836] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15904kB
Feb 8 08:05:50 ifx05 kernel: [679951.230148] Node 0 DMA32: 12135*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 54684kB
Feb 8 08:05:50 ifx05 kernel: [679951.365691] Node 0 Normal: 843*4kB 502*8kB 173*16kB 55*32kB 9*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 16716kB
Feb 8 08:05:50 ifx05 kernel: [679951.505398] 94663 total pagecache pages
Feb 8 08:05:50 ifx05 kernel: [679951.552277] 92078 pages in swap cache
Feb 8 08:05:50 ifx05 kernel: [679951.597096] Swap cache stats: add 6081494, delete 5990029, find 1251231/1858053
Feb 8 08:05:50 ifx05 kernel: [679951.685541] Free swap = 0kB
Feb 8 08:05:50 ifx05 kernel: [679951.720980] Total swap = 3906556kB
Feb 8 08:05:50 ifx05 kernel: [679951.775776] 1048560 pages RAM
Feb 8 08:05:50 ifx05 kernel: [679951.812253] 35003 pages reserved
Feb 8 08:05:50 ifx05 kernel: [679951.851847] 10245 pages shared
Feb 8 08:05:50 ifx05 kernel: [679951.889380] 982341 pages non-shared
Feb 8 08:05:50 ifx05 kernel: [679951.932102] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Feb 8 08:05:50 ifx05 kernel: [679952.021615] [  329]     0   329     4263        1   0     -17         -1000 udevd
Feb 8 08:05:50 ifx05 kernel: [679952.021619] [  826]     1   826     2036       21   1       0             0 portmap
Feb 8 08:05:50 ifx05 kernel: [679952.021622] [  906]   102   906     3608        2   1       0             0 rpc.statd
Feb 8 08:05:50 ifx05 kernel: [679952.021625] [ 1051]     0  1051     6768        0   7       0             0 rpc.idmapd
Feb 8 08:05:50 ifx05 kernel: [679952.021628] [ 1212]     0  1212      992        1   2       0             0 acpid
Feb 8 08:05:50 ifx05 kernel: [679952.021631] [ 1221]     0  1221     4691        1   2       0             0 atd
Feb 8 08:05:50 ifx05 kernel: [679952.021634] [ 1253]     0  1253     5619       20   3       0             0 cron
Feb 8 08:05:50 ifx05 kernel: [679952.021637] [ 1438]     0  1438    12307       29   3     -17         -1000 sshd
Feb 8 08:05:50 ifx05 kernel: [679952.021640] [ 1450]     0  1450     9304       29   3       0             0 master
Feb 8 08:05:50 ifx05 kernel: [679952.021642] [ 1457]   107  1457     9860       37   2       0             0 qmgr
Feb 8 08:05:50 ifx05 kernel: [679952.021645] [ 1491]     0  1491    54480      636   3       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021648] [ 1495]     0  1495    70879     1081   6       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021651] [ 1499]     0  1499    37250       25   4       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021653] [ 1503]     0  1503    70880      934   4       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021656] [ 1507]     0  1507   111360     1936   5       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021659] [ 1511]     0  1511    87515     1429   4       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021661] [ 1515]     0  1515    70617      489   0       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021664] [ 1519]     0  1519    70901      956   7       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021667] [ 1523]     0  1523    88380     2041   2       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021670] [ 1527]     0  1527    54818     1421   7       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021673] [ 1531]     0  1531    37250       27   4       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021675] [ 1535]     0  1535    54224      315   1       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021678] [ 1539]     0  1539    37250        0   4       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021681] [ 1543]     0  1543    72001     1561   2       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021684] [ 1547]     0  1547    87954     2595   5       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021687] [ 1551]     0  1551    37209       11   4       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021689] [ 1559]     0  1559    71410     1301   5       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021692] [ 1563]     0  1563   121841     3270   2       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021694] [ 1567]     0  1567    70925     1217   5       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021697] [ 1571]     0  1571    54241     1169   2       0             0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021699] [ 1635]   105  1635     9597       32   3       0             0 ntpd
Feb 8 08:05:50 ifx05 kernel: [679952.021702] [ 1670]     0  1670     1495        1   0       0             0 getty
Feb 8 08:05:50 ifx05 kernel: [679952.021705] [ 2659]   107  2659    10454       47   2       0             0 tlsmgr
Feb 8 08:05:50 ifx05 kernel: [679952.021708] [ 6691]     0  6691    47094       44   1       0             0 glusterd
Feb 8 08:05:50 ifx05 kernel: [679952.021711] [ 7449]     0  7449    10818        5   1       0             0 syslog-ng
Feb 8 08:05:50 ifx05 kernel: [679952.021713] [ 7450]     0  7450    12452      157   1       0             0 syslog-ng
Feb 8 08:05:50 ifx05 kernel: [679952.021716] [ 9475]   108  9475    12591        8   5       0             0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021719] [ 9476]   108  9476    12591      241   7       0             0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021722] [ 9477]   108  9477    12591       24   3       0             0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021725] [ 9478]   108  9478    12591       24   4       0             0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021728] [ 9479]   108  9479    12591       24   0       0             0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021731] [ 9480]   108  9480    12591       24   4       0             0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021734] [ 9481]   108  9481    12591       24   1       0             0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021737] [ 9482]   108  9482    12591      112   4       0             0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021740] [ 9600]   106  9600    11810      300   0       0             0 snmpd
Feb 8 08:05:50 ifx05 kernel: [679952.021743] [10556]     0 10556    57500       82   4       0             0 glusterfs
Feb 8 08:05:50 ifx05 kernel: [679952.021746] [31815]     0 31815    27485      245   4       0             0 ruby
Feb 8 08:05:50 ifx05 kernel: [679952.021749] [ 2842]     0  2842     4262        1   3     -17         -1000 udevd
Feb 8 08:05:50 ifx05 kernel: [679952.021754] [30513]     0 30513  1724628   814692   5       0             0 glusterfs
Feb 8 08:05:50 ifx05 kernel: [679952.021757] [19809]   107 19809     9820       89   2       0             0 pickup
Feb 8 08:05:50 ifx05 kernel: [679952.021759] [20590]     0 20590     8214       61   2       0             0 cron
Feb 8 08:05:50 ifx05 kernel: [679952.021762] [20591]     0 20591     1001       24   2       0             0 sh
Feb 8 08:05:50 ifx05 kernel: [679952.021765] [20592]     0 20592    40966    18786   3       0             0 puppet
Feb 8 08:05:50 ifx05 kernel: [679952.021767] [20861]     0 20861    41134    19137   5       0             0 puppet
Feb 8 08:05:50 ifx05 kernel: [679952.021770] Out of memory: Kill process 30513 (glusterfs) score 816 or sacrifice child
Feb 8 08:05:50 ifx05 kernel: [679952.021772] Killed process 30513 (glusterfs) total-vm:6898512kB, anon-rss:3257908kB, file-rss:860kB
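Per your suggestion below, I'll keep sampling VIRT and RSS for the gluster processes to see whether the set size jumps suddenly or grows steadily. Something like this minimal loop should be enough (the 60-second interval and the log path are arbitrary choices):

    while true; do
        # append timestamped PID/VIRT/RSS samples for all gluster processes
        ps -C glusterfs,glusterfsd -o pid=,vsz=,rss=,comm= |
            sed "s/^/$(date '+%F %T') /" >> /var/log/gluster-mem.log
        sleep 60
    done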
On 2/9/13 2:26 PM, Brian Foster wrote:
> On 02/08/2013 05:14 PM, Steven King wrote:
>> Hello,
>>
>> I am running GlusterFS version 3.2.7-2~bpo60+1 on Debian 6.0.6. Today, I
>> have experienced a glusterfs process causing the server to invoke
>> oom_killer.
>>
>> How exactly would I go about investigating this and coming up with a fix?
>>
> The OOM killer output to syslog and details on your hardware might be
> useful to include.
>
> Following that, you could monitor the address space (VIRT) and set size
> (RES/RSS) of the relevant processes with top on your server. For
> example, is there a sudden increase in set size, or does it constantly,
> gradually increase?
>
> Brian

--
Steve King
Network/Linux Engineer - AdSafe Media
Cisco Certified Network Professional
CompTIA Linux+ Certified Professional
CompTIA A+ Certified Professional