Testing a 17TB filesystem with xfstests on a VM with 4GB RAM, test 017 reliably triggers the OOM killer, which eventually panics the machine after it has killed everything but the process consuming all the memory. Here is the console output I captured from the last kill, where the panic occurs:

[ 302.040482] Pid: 16666, comm: xfs_db Not tainted 3.2.0-rc3-dgc+ #105
[ 302.041959] Call Trace:
[ 302.042547]  [<ffffffff810debfd>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
[ 302.044380]  [<ffffffff8111afae>] dump_header.isra.8+0x7e/0x1c0
[ 302.045770]  [<ffffffff8111b22c>] ? oom_badness+0x13c/0x150
[ 302.047074]  [<ffffffff8111bb23>] out_of_memory+0x513/0x550
[ 302.048524]  [<ffffffff81120976>] __alloc_pages_nodemask+0x726/0x740
[ 302.049993]  [<ffffffff81155183>] alloc_pages_current+0xa3/0x110
[ 302.051384]  [<ffffffff8111814f>] __page_cache_alloc+0x8f/0xa0
[ 302.052960]  [<ffffffff811185be>] ? find_get_page+0x1e/0x90
[ 302.054267]  [<ffffffff8111a2dd>] filemap_fault+0x2bd/0x480
[ 302.055570]  [<ffffffff8106ead8>] ? flush_tlb_page+0x48/0xb0
[ 302.056748]  [<ffffffff81138a1f>] __do_fault+0x6f/0x4f0
[ 302.057616]  [<ffffffff81139cfc>] ? do_wp_page+0x2ac/0x740
[ 302.058609]  [<ffffffff8113b567>] handle_pte_fault+0xf7/0x8b0
[ 302.059557]  [<ffffffff8107933a>] ? finish_task_switch+0x4a/0xf0
[ 302.060718]  [<ffffffff8113c035>] handle_mm_fault+0x155/0x250
[ 302.061679]  [<ffffffff81acc902>] do_page_fault+0x142/0x4f0
[ 302.062599]  [<ffffffff8107958d>] ? set_next_entity+0xad/0xd0
[ 302.063548]  [<ffffffff8103f6d2>] ? __switch_to+0x132/0x310
[ 302.064575]  [<ffffffff8107933a>] ? finish_task_switch+0x4a/0xf0
[ 302.065586]  [<ffffffff81acc405>] do_async_page_fault+0x35/0x80
[ 302.066570]  [<ffffffff81ac97b5>] async_page_fault+0x25/0x30
[ 302.067509] Mem-Info:
[ 302.067992] Node 0 DMA per-cpu:
[ 302.068652] CPU 0: hi: 0, btch: 1 usd: 0
[ 302.069444] CPU 1: hi: 0, btch: 1 usd: 0
[ 302.070239] CPU 2: hi: 0, btch: 1 usd: 0
[ 302.071034] CPU 3: hi: 0, btch: 1 usd: 0
[ 302.071830] CPU 4: hi: 0, btch: 1 usd: 0
[ 302.072776] CPU 5: hi: 0, btch: 1 usd: 0
[ 302.073577] CPU 6: hi: 0, btch: 1 usd: 0
[ 302.074374] CPU 7: hi: 0, btch: 1 usd: 0
[ 302.075172] Node 0 DMA32 per-cpu:
[ 302.075745] CPU 0: hi: 186, btch: 31 usd: 0
[ 302.076712] CPU 1: hi: 186, btch: 31 usd: 0
[ 302.077517] CPU 2: hi: 186, btch: 31 usd: 0
[ 302.078313] CPU 3: hi: 186, btch: 31 usd: 1
[ 302.079104] CPU 4: hi: 186, btch: 31 usd: 0
[ 302.080274] CPU 5: hi: 186, btch: 31 usd: 0
[ 302.081482] CPU 6: hi: 186, btch: 31 usd: 0
[ 302.082689] CPU 7: hi: 186, btch: 31 usd: 36
[ 302.084210] Node 0 Normal per-cpu:
[ 302.085104] CPU 0: hi: 186, btch: 31 usd: 1
[ 302.086363] CPU 1: hi: 186, btch: 31 usd: 30
[ 302.087575] CPU 2: hi: 186, btch: 31 usd: 0
[ 302.089193] CPU 3: hi: 186, btch: 31 usd: 16
[ 302.090448] CPU 4: hi: 186, btch: 31 usd: 14
[ 302.091646] CPU 5: hi: 186, btch: 31 usd: 0
[ 302.092992] CPU 6: hi: 186, btch: 31 usd: 30
[ 302.093968] CPU 7: hi: 186, btch: 31 usd: 14
[ 302.094945] active_anon:789505 inactive_anon:197012 isolated_anon:0
[ 302.094946]  active_file:11 inactive_file:18 isolated_file:0
[ 302.094947]  unevictable:0 dirty:0 writeback:29 unstable:0
[ 302.094948]  free:6465 slab_reclaimable:2020 slab_unreclaimable:3473
[ 302.094949]  mapped:5 shmem:1 pagetables:2539 bounce:0
[ 302.101211] Node 0 DMA free:15888kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0s
[ 302.108917] lowmem_reserve[]: 0 3512 4017 4017
[ 302.109885] Node 0 DMA32 free:9020kB min:7076kB low:8844kB high:10612kB active_anon:2962672kB inactive_anon:592684kB active_file:44kB inactive_file:0kB unevictable:s
[ 302.117811] lowmem_reserve[]: 0 0 505 505
[ 302.118938] Node 0 Normal free:952kB min:1016kB low:1268kB high:1524kB active_anon:195348kB inactive_anon:195364kB active_file:0kB inactive_file:72kB unevictable:0ks
[ 302.126920] lowmem_reserve[]: 0 0 0 0
[ 302.127744] Node 0 DMA: 0*4kB 0*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15888kB
[ 302.130415] Node 0 DMA32: 68*4kB 48*8kB 35*16kB 16*32kB 9*64kB 3*128kB 2*256kB 2*512kB 1*1024kB 0*2048kB 1*4096kB = 9344kB
[ 302.133101] Node 0 Normal: 117*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 988kB
[ 302.135488] 185 total pagecache pages
[ 302.136455] 149 pages in swap cache
[ 302.137171] Swap cache stats: add 126014, delete 125865, find 94/133
[ 302.138523] Free swap  = 0kB
[ 302.139114] Total swap = 497976kB
[ 302.149921] 1048560 pages RAM
[ 302.150591] 36075 pages reserved
[ 302.151254] 35 pages shared
[ 302.151830] 1004770 pages non-shared
[ 302.152922] [ pid ]   uid  tgid  total_vm     rss cpu oom_adj oom_score_adj name
[ 302.154450] [  939]     0   939      5295       1   4     -17         -1000 udevd
[ 302.156160] [ 1002]     0  1002      5294       1   4     -17         -1000 udevd
[ 302.157673] [ 1003]     0  1003      5294       0   4     -17         -1000 udevd
[ 302.159200] [ 2399]     0  2399      1737       0   7     -17         -1000 dhclient
[ 302.161078] [ 2442]     0  2442     12405       0   4     -17         -1000 sshd
[ 302.162581] [ 2446]     0  2446     20357       1   0     -17         -1000 sshd
[ 302.164408] [ 2450]  1000  2450     20357       0   1     -17         -1000 sshd
[ 302.165901] [ 2455]  1000  2455      5592       0   7     -17         -1000 bash
[ 302.167401] [ 2516]     0  2516     20357       1   6     -17         -1000 sshd
[ 302.169199] [ 2520]  1000  2520     20357       0   4     -17         -1000 sshd
[ 302.170702] [ 2527]  1000  2527      5606       1   6     -17         -1000 bash
[ 302.172508] [ 5516]     0  5516      5089       0   2     -17         -1000 sudo
[ 302.174008] [ 5517]     0  5517      2862       1   0     -17         -1000 check
[ 302.175536] [16484]     0 16484      2457       7   0     -17         -1000 017
[ 302.177336] [16665]     0 16665      1036       0   2     -17         -1000 xfs_check
[ 302.179001] [16666]     0 16666  10031571  986414   6     -17         -1000 xfs_db
[ 302.180890] Kernel panic - not syncing: Out of memory and no killable processes...
[ 302.180892]
[ 302.182585] Pid: 16666, comm: xfs_db Not tainted 3.2.0-rc3-dgc+ #105
[ 302.183764] Call Trace:
[ 302.184528]  [<ffffffff81abe166>] panic+0x91/0x19d
[ 302.185790]  [<ffffffff8111bb38>] out_of_memory+0x528/0x550
[ 302.187244]  [<ffffffff81120976>] __alloc_pages_nodemask+0x726/0x740
[ 302.188780]  [<ffffffff81155183>] alloc_pages_current+0xa3/0x110
[ 302.189951]  [<ffffffff8111814f>] __page_cache_alloc+0x8f/0xa0
[ 302.191039]  [<ffffffff811185be>] ? find_get_page+0x1e/0x90
[ 302.192168]  [<ffffffff8111a2dd>] filemap_fault+0x2bd/0x480
[ 302.193215]  [<ffffffff8106ead8>] ? flush_tlb_page+0x48/0xb0
[ 302.194343]  [<ffffffff81138a1f>] __do_fault+0x6f/0x4f0
[ 302.195312]  [<ffffffff81139cfc>] ? do_wp_page+0x2ac/0x740
[ 302.196490]  [<ffffffff8113b567>] handle_pte_fault+0xf7/0x8b0
[ 302.197554]  [<ffffffff8107933a>] ? finish_task_switch+0x4a/0xf0
[ 302.198670]  [<ffffffff8113c035>] handle_mm_fault+0x155/0x250
[ 302.199755]  [<ffffffff81acc902>] do_page_fault+0x142/0x4f0
[ 302.200921]  [<ffffffff8107958d>] ? set_next_entity+0xad/0xd0
[ 302.201987]  [<ffffffff8103f6d2>] ? __switch_to+0x132/0x310
[ 302.203023]  [<ffffffff8107933a>] ? finish_task_switch+0x4a/0xf0
[ 302.204321]  [<ffffffff81acc405>] do_async_page_fault+0x35/0x80
[ 302.205417]  [<ffffffff81ac97b5>] async_page_fault+0x25/0x30

It looks to me like the process causing the page fault and trying to allocate more memory (xfs_db) is also the one consuming all the memory, and by all metrics it is the obvious candidate to kill. So why does the OOM killer kill everything but the memory hog, and then panic the machine?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
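P.S. For anyone who wants to poke at the same numbers on a live system: below is a rough little userspace sketch of mine (a throwaway, nothing from the kernel tree) that walks /proc and dumps each task's oom_score and oom_score_adj - the knobs the task dump above is printing in its oom_adj/oom_score_adj columns. It assumes the usual procfs layout (oom_score_adj and /proc/<pid>/comm exist on any kernel this recent), and my understanding is that a task sitting at -1000 (OOM_SCORE_ADJ_MIN) is exempt from OOM killing altogether, which is why those columns are interesting here.

/*
 * oomscan.c - dump per-task OOM-killer knobs from /proc.
 *
 * oom_score is the badness value the kernel computes for victim
 * selection; oom_score_adj is the userspace bias in [-1000, 1000].
 * As I understand it, -1000 (OOM_SCORE_ADJ_MIN) marks a task as
 * never-kill.  Build with: gcc -Wall -o oomscan oomscan.c
 */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Read a single long from a procfs file; -1 if unreadable. */
static int read_long(const char *path, long *val)
{
	FILE *f = fopen(path, "r");
	int rc;

	if (!f)
		return -1;
	rc = (fscanf(f, "%ld", val) == 1) ? 0 : -1;
	fclose(f);
	return rc;
}

int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;

	if (!proc) {
		perror("opendir /proc");
		return 1;
	}
	printf("%8s %10s %14s  %s\n",
	       "pid", "oom_score", "oom_score_adj", "comm");
	while ((de = readdir(proc)) != NULL) {
		char path[64];
		char comm[64] = "?";
		long score, adj;
		FILE *f;

		/* Only numeric entries in /proc are pid directories. */
		if (!isdigit((unsigned char)de->d_name[0]))
			continue;

		snprintf(path, sizeof(path), "/proc/%s/oom_score",
			 de->d_name);
		if (read_long(path, &score))
			continue;	/* task exited under us */

		snprintf(path, sizeof(path), "/proc/%s/oom_score_adj",
			 de->d_name);
		if (read_long(path, &adj))
			continue;

		snprintf(path, sizeof(path), "/proc/%s/comm", de->d_name);
		f = fopen(path, "r");
		if (f) {
			if (fgets(comm, sizeof(comm), f))
				comm[strcspn(comm, "\n")] = '\0';
			fclose(f);
		}
		printf("%8s %10ld %14ld  %s\n", de->d_name, score, adj, comm);
	}
	closedir(proc);
	return 0;
}

Run it while the test is hammering memory and you can watch what the OOM killer's candidate selection would actually see.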