The Xen Project CI system (`osstest') reports that the current tips of the Linux 3.18 and Linux 4.1 stable branches have serious problems booting Xen.

Our automatic bisector has completed its work and fingered (in 3.18) a2d8c5147532 "mm/swap.c: flush lru pvecs on compound page arrival" (and a backport of what seems to be the same commit to Linux 4.1). The failure is accompanied by lots of oom killer activity.

For the full logs for a repro, see
 http://logs.test-lab.xenproject.org/osstest/logs/97435/test-amd64-amd64-xl-credit2/info.html
See particularly `serial-chardonnay0.log', near `Jul 16 14:16:32'.

More details below, including:
 - a serial log extract from an earlier test failure
   (for our reference, flight 97278 job test-amd64-amd64-xl)
 - the report from the osstest bisector

Thanks for your attention.

Ian.
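For context, the upstream commit (8f182270dfec) changes the LRU-add fast path in mm/swap.c so that the per-cpu pagevec is drained immediately whenever a compound page is added to it, rather than only once the pagevec is full. A sketch of the resulting function, per the upstream patch (a sketch only; the 3.18/4.1 backports may differ in detail):

    static void __lru_cache_add(struct page *page)
    {
            struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

            get_page(page);
            /*
             * Drain the pagevec when it fills up, and now also as
             * soon as a compound (huge) page arrives, so that such
             * pages do not sit on the pvec where reclaim cannot
             * find them.
             */
            if (!pagevec_add(pvec, page) || PageCompound(page))
                    __pagevec_lru_add(pvec);
            put_cpu_var(lru_add_pvec);
    }

Note that dom0 in these tests has only about 400MB of usable memory (see the Mem-Info dumps below), so it does not take much lost or misaccounted memory before the oom killer fires.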
Jul 14 00:27:33.715871 [ 22.813518] rc.local invoked oom-killer: gfp_mask=0x84d0, order=0, oom_score_adj=0
Jul 14 00:27:33.899843 [ 22.813541] rc.local cpuset=/ mems_allowed=0
Jul 14 00:27:33.907821 [ 22.813550] CPU: 0 PID: 2676 Comm: rc.local Not tainted 3.18.37 #1
Jul 14 00:27:33.907861 [ 22.813556] Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS S1200BT.86B.02.00.0042.050820141549 05/08/2014
Jul 14 00:27:33.923914 [ 22.813565] 0000000000000000 ffff8800024b7968 ffffffff817dcab5 00000000000084d0
Jul 14 00:27:33.931995 [ 22.813573] 0000000000000000 ffff8800024b79c8 ffffffff8118082f ffff8800024b7988
Jul 14 00:27:33.940021 [ 22.813582] ffffffff8114c315 ffffffff817e5ee9 0000000000000001 ffff8800024b79c8
Jul 14 00:27:33.948034 [ 22.813590] Call Trace:
Jul 14 00:27:33.948065 [ 22.813600] [<ffffffff817dcab5>] dump_stack+0x7c/0x98
Jul 14 00:27:33.955838 [ 22.813609] [<ffffffff8118082f>] dump_header.isra.11+0x8f/0x1e0
Jul 14 00:27:33.963831 [ 22.813616] [<ffffffff8114c315>] ? __delayacct_freepages_end+0x45/0x50
Jul 14 00:27:33.963872 [ 22.813677] [<ffffffff817e5ee9>] ? _raw_spin_unlock_irqrestore+0x29/0x90
Jul 14 00:27:33.971868 [ 22.813685] [<ffffffff8131f93e>] ? ___ratelimit+0xae/0x160
Jul 14 00:27:33.979838 [ 22.813691] [<ffffffff81180ddc>] oom_kill_process+0x20c/0x370
Jul 14 00:27:33.987847 [ 22.813699] [<ffffffff810bbc69>] ? has_capability_noaudit+0x19/0x20
Jul 14 00:27:33.987888 [ 22.813706] [<ffffffff811812d1>] out_of_memory+0x211/0x330
Jul 14 00:27:33.995847 [ 22.813714] [<ffffffff81186331>] __alloc_pages_nodemask+0xb11/0xb50
Jul 14 00:27:34.003844 [ 22.813721] [<ffffffff81186382>] __get_free_pages+0x12/0x70
Jul 14 00:27:34.011845 [ 22.813728] [<ffffffff811863f1>] get_zeroed_page+0x11/0x20
Jul 14 00:27:34.011882 [ 22.813735] [<ffffffff811a9c31>] __pud_alloc+0x21/0x120
Jul 14 00:27:34.019841 [ 22.813742] [<ffffffff811ab312>] handle_mm_fault+0x352/0xd50
Jul 14 00:27:34.027863 [ 22.813749] [<ffffffff811a550f>] ? follow_page_mask+0x2f/0x4f0
Jul 14 00:27:34.035848 [ 22.813756] [<ffffffff811af392>] ? find_vma+0x62/0x70
Jul 14 00:27:34.035885 [ 22.813763] [<ffffffff811a5b48>] __get_user_pages+0x178/0x660
Jul 14 00:27:34.043839 [ 22.813770] [<ffffffff811a607d>] get_user_pages+0x4d/0x50
Jul 14 00:27:34.051847 [ 22.813776] [<ffffffff811db546>] copy_strings.isra.26+0x176/0x310
Jul 14 00:27:34.051886 [ 22.813782] [<ffffffff811db71a>] copy_strings_kernel+0x3a/0x60
Jul 14 00:27:34.059844 [ 22.813789] [<ffffffff811dcad8>] do_execve_common.isra.33+0x418/0x680
Jul 14 00:27:34.067846 [ 22.813795] [<ffffffff811dd014>] SyS_execve+0x24/0x30
Jul 14 00:27:34.075839 [ 22.813801] [<ffffffff817e6b19>] stub_execve+0x69/0xa0
Jul 14 00:27:34.075876 [ 22.813807] Mem-Info:
Jul 14 00:27:34.083841 [ 22.813810] DMA per-cpu:
Jul 14 00:27:34.083872 [ 22.813813] CPU 0: hi: 0, btch: 1 usd: 0
Jul 14 00:27:34.083905 [ 22.813817] CPU 1: hi: 0, btch: 1 usd: 0
Jul 14 00:27:34.091848 [ 22.813821] CPU 2: hi: 0, btch: 1 usd: 0
Jul 14 00:27:34.099849 [ 22.813825] CPU 3: hi: 0, btch: 1 usd: 0
Jul 14 00:27:34.099885 [ 22.813829] DMA32 per-cpu:
Jul 14 00:27:34.107837 [ 22.813833] CPU 0: hi: 186, btch: 31 usd: 88
Jul 14 00:27:34.107872 [ 22.813837] CPU 1: hi: 186, btch: 31 usd: 72
Jul 14 00:27:34.115855 [ 22.813841] CPU 2: hi: 186, btch: 31 usd: 47
Jul 14 00:27:34.115892 [ 22.813845] CPU 3: hi: 186, btch: 31 usd: 171
Jul 14 00:27:34.123844 [ 22.813852] active_anon:0 inactive_anon:0 isolated_anon:0
Jul 14 00:27:34.131837 [ 22.813852] active_file:0 inactive_file:0 isolated_file:0
Jul 14 00:27:34.131873 [ 22.813852] unevictable:69 dirty:9 writeback:0 unstable:0
Jul 14 00:27:34.139847 [ 22.813852] free:1036 slab_reclaimable:1887 slab_unreclaimable:4091
Jul 14 00:27:34.147839 [ 22.813852] mapped:4099 shmem:154 pagetables:706 bounce:0
Jul 14 00:27:34.155836 [ 22.813852] free_cma:0
Jul 14 00:27:34.155867 [ 22.813875] DMA free:1680kB min:96kB low:120kB high:144kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:1048kB dirty:0kB writeback:0kB mapped:408kB shmem:8kB slab_reclaimable:92kB slab_unreclaimable:328kB kernel_stack:64kB pagetables:148kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jul 14 00:27:34.195860 [ 22.813896] lowmem_reserve[]: 0 396 396 396
Jul 14 00:27:34.195898 [ 22.813908] DMA32 free:2464kB min:2496kB low:3120kB high:3744kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:272kB isolated(anon):0kB isolated(file):0kB present:508308kB managed:408952kB mlocked:32180kB dirty:36kB writeback:0kB mapped:15988kB shmem:608kB slab_reclaimable:7456kB slab_unreclaimable:16036kB kernel_stack:2720kB pagetables:2676kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jul 14 00:27:34.243847 [ 22.813929] lowmem_reserve[]: 0 0 0 0
Jul 14 00:27:34.243882 [ 22.813936] DMA: 0*4kB 0*8kB 1*16kB (R) 0*32kB 0*64kB 1*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 1680kB
Jul 14 00:27:34.251860 [ 22.813956] DMA32: 0*4kB 0*8kB 1*16kB (R) 1*32kB (R) 0*64kB 1*128kB (R) 1*256kB (R) 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB = 2480kB
Jul 14 00:27:34.267804 [ 22.813976] 15037 total pagecache pages
Jul 14 00:27:34.267825 [ 22.813980] 0 pages in swap cache
Jul 14 00:27:34.275851 [ 22.813984] Swap cache stats: add 0, delete 0, find 0/0
Jul 14 00:27:34.283845 [ 22.813988] Free swap = 1949692kB
Jul 14 00:27:34.283879 [ 22.813991] Total swap = 1949692kB
Jul 14 00:27:34.283908 [ 22.813994] 131071 pages RAM
Jul 14 00:27:34.291912 [ 22.813997] 0 pages HighMem/MovableOnly
Jul 14 00:27:34.291946 [ 22.814000] 24860 pages reserved
Jul 14 00:27:34.299797 [ 22.814003] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jul 14 00:27:34.307798 [ 22.814013] [ 1316] 0 1316 9825 819 21 0 -1000 udevd
Jul 14 00:27:34.315792 [ 22.814021] [ 2038] 0 2038 1084 25 7 0 0 ntpdate
Jul 14 00:27:34.323800 [ 22.814027] [ 2040] 0 2040 1569 172 9 0 0 lockfile-create
Jul 14 00:27:34.331829 [ 22.814034] [ 2047] 0 2047 1060 331 7 0 0 startpar
Jul 14 00:27:34.339797 [ 22.814040] [ 2061] 0 2061 9270 668 24 0 0 rpcbind
Jul 14 00:27:34.347796 [ 22.814047] [ 2076] 106 2076 9320 700 22 0 0 rpc.statd
Jul 14 00:27:34.355808 [ 22.814054] [ 2088] 0 2088 6892 58 18 0 0 rpc.idmapd
Jul 14 00:27:34.363800 [ 22.814060] [ 2131] 0 2131 1084 410 7 0 0 rc
Jul 14 00:27:34.371807 [ 22.814066] [ 2137] 0 2137 3202 648 13 0 0 startpar
Jul 14 00:27:34.379805 [ 22.814073] [ 2160] 0 2160 65721 788 33 0 0 rsyslogd
Jul 14 00:27:34.387805 [ 22.814079] [ 2175] 0 2175 1064 394 8 0 0 acpid
Jul 14 00:27:34.395808 [ 22.814085] [ 2211] 0 2211 19431 1167 41 0 0 apache2
Jul 14 00:27:34.403799 [ 22.814091] [ 2214] 33 2214 91722 1032 69 0 0 apache2
Jul 14 00:27:34.411802 [ 22.814098] [ 2216] 33 2216 91722 1031 69 0 0 apache2
Jul 14 00:27:34.419800 [ 22.814104] [ 2422] 0 2422 4756 42 13 0 0 atd
Jul 14 00:27:34.427806 [ 22.814110] [ 2423] 0 2423 6869 590 18 0 0 cron
Jul 14 00:27:34.435804 [ 22.814117] [ 2606] 0 2606 8780 883 21 0 0 oxenstored
Jul 14 00:27:34.443846 [ 22.814123] [ 2619] 105 2619 10560 551 24 0 0 dbus-daemon
Jul 14 00:27:34.451802 [ 22.814130] [ 2624] 0 2624 13796 700 28 0 -1000 sshd
Jul 14 00:27:34.459799 [ 22.814136] [ 2631] 0 2631 22682 57 19 0 0 xenconsoled
Jul 14 00:27:34.467815 [ 22.814143] [ 2643] 104 2643 13313 828 26 0 0 exim4
Jul 14 00:27:34.475801 [ 22.814149] [ 2652] 0 2652 60164 1974 78 0 0 qemu-system-i38
Jul 14 00:27:34.483868 [ 22.814156] [ 2656] 0 2656 20648 1510 42 0 0 sshd
Jul 14 00:27:34.491859 [ 22.814162] [ 2660] 107 2660 8345 1093 21 0 0 ntpd
Jul 14 00:27:34.499864 [ 22.814169] [ 2673] 0 2673 1084 411 7 0 0 rc.local
Jul 14 00:27:34.507852 [ 22.814175] [ 2676] 0 2676 1084 44 6 0 0 rc.local
Jul 14 00:27:34.515859 [ 22.814181] [ 2677] 1000 2677 20648 946 41 0 0 sshd
Jul 14 00:27:34.523849 [ 22.814188] Out of memory: Kill process 2652 (qemu-system-i38) score 3 or sacrifice child
Jul 14 00:27:34.531862 [ 22.814194] Killed process 2652 (qemu-system-i38) total-vm:240656kB, anon-rss:2440kB, file-rss:5456kB
Jul 14 00:27:34.539871 [ 22.822755] sshd invoked oom-killer: gfp_mask=0x2000d0, order=0, oom_score_adj=0
Jul 14 00:27:34.547800 [ 22.822760] sshd cpuset=/ mems_allowed=0
Jul 14 00:27:34.555868 [ 22.822764] CPU: 1 PID: 2677 Comm: sshd Not tainted 3.18.37 #1
Jul 14 00:27:34.563847 [ 22.822767] Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS S1200BT.86B.02.00.0042.050820141549 05/08/2014
Jul 14 00:27:34.571852 [ 22.822771] 0000000000000000 ffff8800024fb990 ffffffff817dcab5 00000000002000d0
Jul 14 00:27:34.579860 [ 22.822775] 0000000000000000 ffff8800024fb9f0 ffffffff8118082f ffff8800024fb9b0
Jul 14 00:27:34.587803 [ 22.822779] ffffffff8114c315 ffffffff817e5ee9 0000000000000001 ffff8800024fb9f0
Jul 14 00:27:34.595862 [ 22.822783] Call Trace:
Jul 14 00:27:34.595892 [ 22.822787] [<ffffffff817dcab5>] dump_stack+0x7c/0x98
Jul 14 00:27:34.603859 [ 22.822791] [<ffffffff8118082f>] dump_header.isra.11+0x8f/0x1e0
Jul 14 00:27:34.611843 [ 22.822794] [<ffffffff8114c315>] ? __delayacct_freepages_end+0x45/0x50
Jul 14 00:27:34.619891 [ 22.822798] [<ffffffff817e5ee9>] ? _raw_spin_unlock_irqrestore+0x29/0x90
Jul 14 00:27:34.619933 [ 22.822801] [<ffffffff8131f93e>] ? ___ratelimit+0xae/0x160
Jul 14 00:27:34.627853 [ 22.822804] [<ffffffff81180ddc>] oom_kill_process+0x20c/0x370
Jul 14 00:27:34.635900 [ 22.822808] [<ffffffff810bbc69>] ? has_capability_noaudit+0x19/0x20
Jul 14 00:27:34.643851 [ 22.822812] [<ffffffff811812d1>] out_of_memory+0x211/0x330
Jul 14 00:27:34.643888 [ 22.822815] [<ffffffff81186331>] __alloc_pages_nodemask+0xb11/0xb50
Jul 14 00:27:34.651845 [ 22.822820] [<ffffffff811cb40c>] new_slab+0x2bc/0x320
Jul 14 00:27:34.660009 [ 22.822823] [<ffffffff811cc770>] __slab_alloc.constprop.69+0x440/0x570
Jul 14 00:27:34.667913 [ 22.822828] [<ffffffff81054f3e>] ? xen_leave_lazy_mmu+0xe/0x10
Jul 14 00:27:34.667952 [ 22.822831] [<ffffffff811b6721>] ? anon_vma_fork+0x71/0x140
Jul 14 00:27:34.675855 [ 22.822835] [<ffffffff811b651b>] ? anon_vma_clone+0x6b/0x200
Jul 14 00:27:34.683863 [ 22.822838] [<ffffffff811b6721>] ? anon_vma_fork+0x71/0x140
Jul 14 00:27:34.691849 [ 22.822841] [<ffffffff811cdad6>] kmem_cache_alloc+0x116/0x150
Jul 14 00:27:34.691887 [ 22.822844] [<ffffffff811b6721>] anon_vma_fork+0x71/0x140
Jul 14 00:27:34.699848 [ 22.822848] [<ffffffff810b2017>] copy_process.part.45+0x1687/0x1a50
Jul 14 00:27:34.707853 [ 22.822852] [<ffffffff810b2577>] do_fork+0xb7/0x3b0
Jul 14 00:27:34.707888 [ 22.822856] [<ffffffff811f3a34>] ? get_unused_fd_flags+0x34/0x40
Jul 14 00:27:34.715857 [ 22.822860] [<ffffffff811f3a88>] ? __fd_install+0x48/0x60
Jul 14 00:27:34.723872 [ 22.822862] [<ffffffff810b28f1>] SyS_clone+0x11/0x20
Jul 14 00:27:34.723910 [ 22.822866] [<ffffffff817e68c9>] stub_clone+0x69/0x90
Jul 14 00:27:34.731848 [ 22.822868] [<ffffffff817e6589>] ? system_call_fastpath+0x12/0x17
Jul 14 00:27:34.739823 [ 22.822871] Mem-Info:
Jul 14 00:27:34.739841 [ 22.822872] DMA per-cpu:
Jul 14 00:27:34.747843 [ 22.822874] CPU 0: hi: 0, btch: 1 usd: 0
Jul 14 00:27:34.747879 [ 22.822876] CPU 1: hi: 0, btch: 1 usd: 0
Jul 14 00:27:34.755836 [ 22.822878] CPU 2: hi: 0, btch: 1 usd: 0
Jul 14 00:27:34.755870 [ 22.822880] CPU 3: hi: 0, btch: 1 usd: 0
Jul 14 00:27:34.763850 [ 22.822882] DMA32 per-cpu:
Jul 14 00:27:34.763881 [ 22.822884] CPU 0: hi: 186, btch: 31 usd: 88
Jul 14 00:27:34.771838 [ 22.822886] CPU 1: hi: 186, btch: 31 usd: 157
Jul 14 00:27:34.779837 [ 22.822888] CPU 2: hi: 186, btch: 31 usd: 47
Jul 14 00:27:34.779872 [ 22.822890] CPU 3: hi: 186, btch: 31 usd: 173
Jul 14 00:27:34.787884 [ 22.822893] active_anon:0 inactive_anon:0 isolated_anon:0
Jul 14 00:27:34.787920 [ 22.822893] active_file:0 inactive_file:0 isolated_file:0
Jul 14 00:27:34.795844 [ 22.822893] unevictable:69 dirty:9 writeback:0 unstable:0
Jul 14 00:27:34.803834 [ 22.822893] free:1036 slab_reclaimable:1887 slab_unreclaimable:4091
Jul 14 00:27:34.811846 [ 22.822893] mapped:3101 shmem:154 pagetables:650 bounce:0
Jul 14 00:27:34.811884 [ 22.822893] free_cma:0
Jul 14 00:27:34.819842 [ 22.822904] DMA free:1680kB min:96kB low:120kB high:144kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:1048kB dirty:0kB writeback:0kB mapped:336kB shmem:8kB slab_reclaimable:92kB slab_unreclaimable:328kB kernel_stack:64kB pagetables:148kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jul 14 00:27:34.859831 [ 22.822914] lowmem_reserve[]: 0 396 396 396
Jul 14 00:27:34.859866 [ 22.822920] DMA32 free:2464kB min:2496kB low:3120kB high:3744kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:272kB isolated(anon):0kB isolated(file):0kB present:508308kB managed:408952kB mlocked:32180kB dirty:36kB writeback:0kB mapped:12068kB shmem:608kB slab_reclaimable:7456kB slab_unreclaimable:16036kB kernel_stack:2720kB pagetables:2452kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jul 14 00:27:34.899849 [ 22.822930] lowmem_reserve[]: 0 0 0 0
Jul 14 00:27:34.907838 [ 22.822934] DMA: 0*4kB 0*8kB 1*16kB (R) 0*32kB 0*64kB 1*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 1680kB
Jul 14 00:27:34.915876 [ 22.822943] DMA32: 0*4kB 0*8kB 1*16kB (R) 1*32kB (R) 0*64kB 1*128kB (R) 1*256kB (R) 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB = 2480kB
Jul 14 00:27:34.931842 [ 22.822953] 15037 total pagecache pages
Jul 14 00:27:34.931878 [ 22.822955] 0 pages in swap cache
Jul 14 00:27:34.939850 [ 22.822957] Swap cache stats: add 0, delete 0, find 0/0
Jul 14 00:27:34.939888 [ 22.822959] Free swap = 1949692kB
Jul 14 00:27:34.947842 [ 22.822960] Total swap = 1949692kB
Jul 14 00:27:34.947874 [ 22.822962] 131071 pages RAM
Jul 14 00:27:34.947903 [ 22.822963] 0 pages HighMem/MovableOnly
Jul 14 00:27:34.955868 [ 22.822965] 24860 pages reserved
Jul 14 00:27:34.955900 [ 22.822966] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jul 14 00:27:34.963860 [ 22.822971] [ 1316] 0 1316 9825 819 21 0 -1000 udevd
Jul 14 00:27:34.971874 [ 22.822976] [ 2038] 0 2038 1084 25 7 0 0 ntpdate
Jul 14 00:27:34.979864 [ 22.822978] [ 2040] 0 2040 1569 172 9 0 0 lockfile-create
Jul 14 00:27:34.987862 [ 22.822981] [ 2047] 0 2047 1060 331 7 0 0 startpar
Jul 14 00:27:34.995869 [ 22.822984] [ 2061] 0 2061 9270 668 24 0 0 rpcbind
Jul 14 00:27:35.003863 [ 22.822987] [ 2076] 106 2076 9320 700 22 0 0 rpc.statd
Jul 14 00:27:35.019848 [ 22.822991] [ 2088] 0 2088 6892 58 18 0 0 rpc.idmapd
Jul 14 00:27:35.027853 [ 22.822994] [ 2131] 0 2131 1084 410 7 0 0 rc
Jul 14 00:27:35.035843 [ 22.822997] [ 2137] 0 2137 3202 648 13 0 0 startpar
Jul 14 00:27:35.043857 [ 22.823000] [ 2160] 0 2160 65721 788 33 0 0 rsyslogd
Jul 14 00:27:35.051846 [ 22.823003] [ 2175] 0 2175 1064 394 8 0 0 acpid
Jul 14 00:27:35.059852 [ 22.823006] [ 2211] 0 2211 19431 1167 41 0 0 apache2
Jul 14 00:27:35.067848 [ 22.823009] [ 2214] 33 2214 91722 1032 69 0 0 apache2
Jul 14 00:27:35.075852 [ 22.823012] [ 2216] 33 2216 91722 1031 69 0 0 apache2
Jul 14 00:27:35.083847 [ 22.823015] [ 2422] 0 2422 4756 42 13 0 0 atd
Jul 14 00:27:35.091857 [ 22.823018] [ 2423] 0 2423 6869 590 18 0 0 cron
Jul 14 00:27:35.099857 [ 22.823021] [ 2606] 0 2606 8780 883 21 0 0 oxenstored
Jul 14 00:27:35.107854 [ 22.823024] [ 2619] 105 2619 10560 551 24 0 0 dbus-daemon
Jul 14 00:27:35.115866 [ 22.823027] [ 2624] 0 2624 13796 700 28 0 -1000 sshd
Jul 14 00:27:35.123797 [ 22.823030] [ 2631] 0 2631 22682 57 19 0 0 xenconsoled
Jul 14 00:27:35.131849 [ 22.823034] [ 2643] 104 2643 13313 828 26 0 0 exim4
Jul 14 00:27:35.139844 [ 22.823037] [ 2656] 0 2656 20648 1510 42 0 0 sshd
Jul 14 00:27:35.147846 [ 22.823040] [ 2660] 107 2660 8345 1093 21 0 0 ntpd
Jul 14 00:27:35.155856 [ 22.823043] [ 2673] 0 2673 1084 411 7 0 0 rc.local
Jul 14 00:27:35.163847 [ 22.823046] [ 2676] 0 2676 1084 44 6 0 0 rc.local
Jul 14 00:27:35.171852 [ 22.823049] [ 2677] 1000 2677 20648 946 41 0 0 sshd
Jul 14 00:27:35.179883 [ 22.823052] Out of memory: Kill process 2656 (sshd) score 2 or sacrifice child
Jul 14 00:27:35.187860 [ 22.823055] Killed process 2677 (sshd) total-vm:82592kB, anon-rss:848kB, file-rss:2936kB
Jul 14 00:27:35.195838 [ 22.825444] oxenstored invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Jul 14 00:27:35.203861 [ 22.825449] oxenstored cpuset=/ mems_allowed=0
Jul 14 00:27:35.203897 [ 22.825453] CPU: 2 PID: 2606 Comm: oxenstored Not tainted 3.18.37 #1
Jul 14 00:27:35.211845 [ 22.825457] Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS S1200BT.86B.02.00.0042.050820141549 05/08/2014
Jul 14 00:27:35.227847 [ 22.825461] 0000000000000000 ffff880008dc3ad8 ffffffff817dcab5 00000000000280da
Jul 14 00:27:35.235847 [ 22.825466] 0000000000000000 ffff880008dc3b38 ffffffff8118082f ffff880008dc3af8
Jul 14 00:27:35.243891 [ 22.825469] ffffffff8114c315 ffffffff817e5ee9 0000000000000001 ffff880008dc3b38
Jul 14 00:27:35.243937 [ 22.825473] Call Trace:
Jul 14 00:27:35.251845 [ 22.825477] [<ffffffff817dcab5>] dump_stack+0x7c/0x98
Jul 14 00:27:35.251881 [ 22.825481] [<ffffffff8118082f>] dump_header.isra.11+0x8f/0x1e0
Jul 14 00:27:35.259846 [ 22.825484] [<ffffffff8114c315>] ? __delayacct_freepages_end+0x45/0x50
Jul 14 00:27:35.267864 [ 22.825488] [<ffffffff817e5ee9>] ? _raw_spin_unlock_irqrestore+0x29/0x90
Jul 14 00:27:35.275843 [ 22.825492] [<ffffffff8131f93e>] ? ___ratelimit+0xae/0x160
Jul 14 00:27:35.283904 [ 22.825494] [<ffffffff81180ddc>] oom_kill_process+0x20c/0x370
Jul 14 00:27:35.283942 [ 22.825498] [<ffffffff810bbc69>] ? has_capability_noaudit+0x19/0x20
Jul 14 00:27:35.291845 [ 22.825501] [<ffffffff811812d1>] out_of_memory+0x211/0x330
Jul 14 00:27:35.299881 [ 22.825505] [<ffffffff81186331>] __alloc_pages_nodemask+0xb11/0xb50
Jul 14 00:27:35.307834 [ 22.825509] [<ffffffff811ab988>] handle_mm_fault+0x9c8/0xd50
Jul 14 00:27:35.307871 [ 22.825513] [<ffffffff811ea170>] ? poll_select_copy_remaining+0x130/0x130
Jul 14 00:27:35.315942 [ 22.825517] [<ffffffff810a68dd>] __do_page_fault+0x1bd/0x590
Jul 14 00:27:35.323846 [ 22.825521] [<ffffffff[ 23.782240] getty invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Jul 14 00:27:35.331856 [ 23.782247] getty cpuset=/ mems_allowed=0
Jul 14 00:27:35.339842 [ 23.782252] CPU: 3 PID: 2685 Comm: getty Not tainted 3.18.37 #1
Jul 14 00:27:35.339884 [ 23.782255] Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS S1200BT.86B.02.00.0042.050820141549 05/08/2014
Jul 14 00:27:35.355849 [ 23.782259] 0000000000000000 ffff8800158db7f8 ffffffff817dcab5 00000000000280da
Jul 14 00:27:35.363925 [ 23.782263] 0000000000000000 ffff8800158db858 ffffffff8118082f ffff8800158db818
Jul 14 00:27:35.371864 [ 23.782268] ffffffff8114c315 ffffffff817e5ee9 0000000000000001 ffff8800158db858
Jul 14 00:27:35.379855 [ 23.782272] Call Trace:
Jul 14 00:27:35.379886 [ 23.782278] [<ffffffff817dcab5>] dump_stack+0x7c/0x98
Jul 14 00:27:35.387850 [ 23.782283] [<ffffffff8118082f>] dump_header.isra.11+0x8f/0x1e0
Jul 14 00:27:35.387888 [ 23.782287] [<ffffffff8114c315>] ? __delayacct_freepages_end+0x45/0x50
Jul 14 00:27:35.395877 [ 23.782291] [<ffffffff817e5ee9>] ? _raw_spin_unlock_irqrestore+0x29/0x90
Jul 14 00:27:35.403858 [ 23.782295] [<ffffffff8131f93e>] ? ___ratelimit+0xae/0x160
Jul 14 00:27:35.411839 [ 23.782298] [<ffffffff81180ddc>] oom_kill_process+0x20c/0x370
Jul 14 00:27:35.419845 [ 23.782301] [<ffffffff811812d1>] out_of_memory+0x211/0x330
Jul 14 00:27:35.419883 [ 23.782305] [<ffffffff81186331>] __alloc_pages_nodemask+0xb11/0xb50
Jul 14 00:27:35.427868 [ 23.782309] [<ffffffff811ab988>] handle_mm_fault+0x9c8/0xd50
Jul 14 00:27:35.435856 [ 23.782320] [<ffffffffa009d912>] ? search_dir+0xc2/0x120 [ext4]
Jul 14 00:27:35.443840 [ 23.782324] [<ffffffff810a68dd>] __do_page_fault+0x1bd/0x590
Jul 14 00:27:35.443877 [ 23.782329] [<ffffffff81233736>] ? dquot_file_open+0x16/0x50
Jul 14 00:27:35.451860 [ 23.782334] [<ffffffffa008c953>] ? ext4_file_open+0x43/0x180 [ext4]
Jul 14 00:27:35.459849 [ 23.782339] [<ffffffffa008c910>] ? ext4_check_all_de+0xb0/0xb0 [ext4]
Jul 14 00:27:35.467884 [ 23.782343] [<ffffffff810a6cdc>] do_page_fault+0x2c/0x40
Jul 14 00:27:35.467920 [ 23.782346] [<ffffffff817e81b8>] page_fault+0x28/0x30
Jul 14 00:27:35.475841 [ 23.782350] [<ffffffff811a47a8>] ? copy_page_to_iter+0x68/0x3c0
Jul 14 00:27:35.483845 [ 23.782353] [<ffffffff8117ed88>] generic_file_read_iter+0x278/0x5d0
Jul 14 00:27:35.491842 [ 23.782357] [<ffffffff811d4ad9>] new_sync_read+0x79/0xb0
Jul 14 00:27:35.491880 [ 23.782360] [<ffffffff811d5303>] vfs_read+0x93/0x180
Jul 14 00:27:35.499846 [ 23.782364] [<ffffffff811d5e61>] SyS_read+0x41/0xb0
Jul 14 00:27:35.507832 [ 23.782367] [<ffffffff817e6589>] system_call_fastpath+0x12/0x17
Jul 14 00:27:35.507869 [ 23.782370] Mem-Info:
Jul 14 00:27:35.515838 [ 23.782371] DMA per-cpu:
Jul 14 00:27:35.515869 [ 23.782373] CPU 0: hi: 0, btch: 1 usd: 0
Jul 14 00:27:35.523838 [ 23.782375] CPU 1: hi: 0, btch: 1 usd: 0
Jul 14 00:27:35.523873 [ 23.782377] CPU 2: hi: 0, btch: 1 usd: 0
Jul 14 00:27:35.531843 [ 23.782379] CPU 3: hi: 0, btch: 1 usd: 0
Jul 14 00:27:35.531878 [ 23.782381] DMA32 per-cpu:
Jul 14 00:27:35.539836 [ 23.782382] CPU 0: hi: 186, btch: 31 usd: 0
Jul 14 00:27:35.539873 [ 23.782384] CPU 1: hi: 186, btch: 31 usd: 0
Jul 14 00:27:35.547841 [ 23.782386] CPU 2: hi: 186, btch: 31 usd: 0
Jul 14 00:27:35.555833 [ 23.782388] CPU 3: hi: 186, btch: 31 usd: 0
Jul 14 00:27:35.555868 [ 23.782392] active_anon:0 inactive_anon:0 isolated_anon:0
Jul 14 00:27:35.563857 [ 23.782392] active_file:0 inactive_file:0 isolated_file:0
Jul 14 00:27:35.563895 [ 23.782392] unevictable:1 dirty:0 writeback:18 unstable:0
Jul 14 00:27:35.571859 [ 23.782392] free:1060 slab_reclaimable:1888 slab_unreclaimable:4074
Jul 14 00:27:35.579860 [ 23.782392] mapped:2923 shmem:153 pagetables:588 bounce:0
Jul 14 00:27:35.587840 [ 23.782392] free_cma:0
Jul 14 00:27:35.587872 [ 23.782403] DMA free:1668kB min:96kB low:120kB high:144kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:1044kB dirty:0kB writeback:4kB mapped:340kB shmem:8kB slab_reclaimable:92kB slab_unreclaimable:332kB kernel_stack:48kB pagetables:116kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jul 14 00:27:35.627846 [ 23.782413] lowmem_reserve[]: 0 396 396 396
Jul 14 00:27:35.627882 [ 23.782419] DMA32 free:2572kB min:2496kB low:3120kB high:3744kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:508308kB managed:408952kB mlocked:31912kB dirty:0kB writeback:68kB mapped:11352kB shmem:604kB slab_reclaimable:7460kB slab_unreclaimable:15964kB kernel_stack:2400kB pagetables:2236kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:36 all_unreclaimable? yes
Jul 14 00:27:35.675846 [ 23.782431] lowmem_reserve[]: 0 0 0 0
Jul 14 00:27:35.675882 [ 23.782433] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 1*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 1664kB
Jul 14 00:27:35.683866 [ 23.782442] DMA32: 29*4kB (UR) 0*8kB 3*16kB (R) 0*32kB 1*64kB (R) 1*128kB (R) 1*256kB (R) 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB = 2660kB
Jul 14 00:27:35.699851 [ 23.782452] 15076 total pagecache pages
Jul 14 00:27:35.699885 [ 23.782455] 0 pages in swap cache
Jul 14 00:27:35.707849 [ 23.782465] Swap cache stats: add 0, delete 0, find 0/0
Jul 14 00:27:35.715836 [ 23.782468] Free swap = 1949692kB
Jul 14 00:27:35.715869 [ 23.782472] Total swap = 1949692kB
Jul 14 00:27:35.715899 [ 23.782476] 131071 pages RAM
Jul 14 00:27:35.723847 [ 23.782480] 0 pages HighMem/MovableOnly
Jul 14 00:27:35.723882 [ 23.782484] 24860 pages reserved
Jul 14 00:27:35.731838 [ 23.782487] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jul 14 00:27:35.739850 [ 23.782495] [ 1316] 0 1316 9825 819 21 0 -1000 udevd
Jul 14 00:27:35.747851 [ 23.782501] [ 2038] 0 2038 1084 25 7 0 0 ntpdate
Jul 14 00:27:35.755844 [ 23.782506] [ 2040] 0 2040 1569 172 9 0 0 lockfile-create
Jul 14 00:27:35.763856 [ 23.782509] [ 2047] 0 2047 1060 331 7 0 0 startpar
Jul 14 00:27:35.771890 [ 23.782512] [ 2061] 0 2061 9270 668 24 0 0 rpcbind
Jul 14 00:27:35.779829 [ 23.782516] [ 2076] 106 2076 9320 700 22 0 0 rpc.statd
Jul 14 00:27:35.787856 [ 23.782519] [ 2088] 0 2088 6892 58 18 0 0 rpc.idmapd
Jul 14 00:27:35.795799 [ 23.782522] [ 2160] 0 2160 65721 785 33 0 0 rsyslogd
Jul 14 00:27:35.803857 [ 23.782525] [ 2175] 0 2175 1064 394 8 0 0 acpid
Jul 14 00:27:35.811854 [ 23.782528] [ 2211] 0 2211 19431 1167 41 0 0 apache2
Jul 14 00:27:35.820250 [ 23.782532] [ 2216] 33 2216 91722 1031 69 0 0 apache2
Jul 14 00:27:35.827884 [ 23.782534] [ 2422] 0 2422 4756 42 13 0 0 atd
Jul 14 00:27:35.835897 [ 23.782537] [ 2423] 0 2423 6869 590 18 0 0 cron
Jul 14 00:27:35.843893 [ 23.782541] [ 2606] 0 2606 8780 883 21 0 0 oxenstored
Jul 14 00:27:35.851883 [ 23.782544] [ 2619] 105 2619 10560 551 24 0 0 dbus-daemon
Jul 14 00:27:35.859858 [ 23.782547] [ 2624] 0 2624 13796 700 28 0 -1000 sshd
Jul 14 00:27:35.867866 [ 23.782550] [ 2631] 0 2631 22682 57 19 0 0 xenconsoled
Jul 14 00:27:35.875902 [ 23.782554] [ 2643] 104 2643 13313 828 26 0 0 exim4
Jul 14 00:27:35.883887 [ 23.782557] [ 2660] 107 2660 8346 1093 21 0 0 ntpd
Jul 14 00:27:35.891912 [ 23.782560] [ 2681] 0 2681 2650 399 10 0 0 getty
Jul 14 00:27:35.899869 [ 23.782563] [ 2682] 0 2682 3166 417 12 0 0 getty
Jul 14 00:27:35.907853 [ 23.782566] [ 2683] 0 2683 1063 173 7 0 0 getty
Jul 14 00:27:35.915859 [ 23.782569] [ 2684] 0 2684 3166 416 12 0 0 getty
Jul 14 00:27:35.923807 [ 23.782572] [ 2685] 0 2685 3560 425 12 0 0 getty
Jul 14 00:27:35.931919 [ 23.782575] [ 2686] 0 2686 3166 425 11 0 0 getty
Jul 14 00:27:35.939914 [ 23.782578] [ 2687] 0 2687 1591 404 8 0 0 getty
Jul 14 00:27:35.947888 [ 23.782581] [ 2688] 0 2688 3166 437 12 0 0 getty
Jul 14 00:27:35.956047 [ 23.782587] [ 2690] 0 2690 3166 35 9 0 0 getty
Jul 14 00:27:35.964079 [ 23.782592] [ 2691] 0 2691 3166 34 9 0 0 getty
Jul 14 00:27:35.972103 [ 23.782597] [ 2693] 0 2693 9825 195 19 0 -1000 udevd
Jul 14 00:27:35.980089 [ 23.782601] [ 2694] 0 2694 9825 195 18 0 -1000 udevd
Jul 14 00:27:35.987933 [ 23.782604] Out of memory: Kill process 2211 (apache2) score 1 or sacrifice child
Jul 14 00:27:35.996043 [ 23.782607] Killed process 2216 (apache2) total-vm:366888kB, anon-rss:1556kB, file-rss:2568kB
Jul 14 00:27:36.004061 [ 23.790140] getty invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Jul 14 00:27:36.011954 [ 23.790145] getty cpuset=/ mems_allowed=0
Jul 14 00:27:36.019911 [ 23.790150] CPU: 0 PID: 2681 Comm: getty Not tainted 3.18.37 #1
Jul 14 00:27:36.019951 [ 23.790152] Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS S1200BT.86B.02.00.0042.050820141549 05/08/2014
Jul 14 00:27:36.035934 [ 23.790156] 0000000000000000 ffff88001220ba78 ffffffff817dcab5 00000000000200da
Jul 14 00:27:36.043951 [ 23.790161] 0000000000000000 ffff88001220bad8 ffffffff8118082f ffff88001220ba98
Jul 14 00:27:36.051973 [ 23.790166] ffffffff8114c315 ffffffff817e5ee9 0000000000000001 ffff88001220bad8
Jul 14 00:27:36.059965 [ 23.790170] Call Trace:
Jul 14 00:27:36.059997 [ 23.790174] [<ffffffff817dcab5>] dump_stack+0x7c/0x98
Jul 14 00:27:36.067938 [ 23.790178] [<ffffffff8118082f>] dump_header.isra.11+0x8f/0x1e0
Jul 14 00:27:36.067977 [ 23.790181] [<ffffffff8114c315>] ? __delayacct_freepages_end+0x45/0x50
Jul 14 00:27:36.076024 [ 23.790185] [<ffffffff817e5ee9>] ? _raw_spin_unlock_irqrestore+0x29/0x90
Jul 14 00:27:36.083947 [ 23.790188] [<ffffffff8131f93e>] ? ___ratelimit+0xae/0x160
Jul 14 00:27:36.092010 [ 23.790191] [<ffffffff81180ddc>] oom_kill_process+0x20c/0x370
Jul 14 00:27:36.099947 [ 23.790195] [<ffffffff811812d1>] out_of_memory+0x211/0x330
Jul 14 00:27:36.099991 [ 23.790198] [<ffffffff81186331>] __alloc_pages_nodemask+0xb11/0xb50
Jul 14 00:27:36.107888 [ 23.790202] [<ffffffff811a9757>] do_cow_fault.isra.96+0x47/0x2a0
Jul 14 00:27:36.116087 [ 23.790207] [<ffffffff81054bf9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
Jul 14 00:27:36.123936 [ 23.790210] [<ffffffff811ab675>] handle_mm_fault+0x6b5/0xd50
Jul 14 00:27:36.124004 [ 23.790214] [<ffffffff810a68dd>] __do_page_fault+0x1bd/0x590
Jul 14 00:27:36.132061 [ 23.790217] [<ffffffff81198939>] ? vm_mmap_pgoff+0x99/0xc0
Jul 14 00:27:36.140090 [ 23.790220] [<ffffffff810a6cdc>] do_page_fault+0x2c/0x40
Jul 14 00:27:36.148022 [ 23.790224] [<ffffffff817e81b8>] page_fault+0x28/0x30
Jul 14 00:27:36.148057 [ 23.790226] Mem-Info:
Jul 14 00:27:36.148083 [ 23.790227] DMA per-cpu:
Jul 14 00:27:36.155974 [ 23.790229] CPU 0: hi: 0, btch: 1 usd: 0
Jul 14 00:27:36.155995 [ 23.790231] CPU 1: hi: 0, btch: 1 usd: 0
Jul 14 00:27:36.164132 [ 23.790233] CPU 2: hi: 0, btch: 1 usd: 0
Jul 14 00:27:36.172145 [ 23.790235] CPU 3: hi: 0, btch: 1 usd: 0
Jul 14 00:27:36.172208 [ 23.790237] DMA32 per-cpu:
Jul 14 00:27:36.180000 [ 23.790238] CPU 0: hi: 186, btch: 31 usd: 0
Jul 14 00:27:36.180049 [ 23.790241] CPU 1: hi: 186, btch: 31 usd: 128
Jul 14 00:27:36.187954 [ 23.790243] CPU 2: hi: 186, btch: 31 usd: 0
Jul 14 00:27:36.187996 [ 23.790245] CPU 3: hi: 186, btch: 31 usd: 0
Jul 14 00:27:36.195965 [ 23.790248] active_anon:0 inactive_anon:0 isolated_anon:0
Jul 14 00:27:36.203964 [ 23.790248] active_file:0 inactive_file:0 isolated_file:0
Jul 14 00:27:36.204010 [ 23.790248] unevictable:1 dirty:4 writeback:0 unstable:0
Jul 14 00:27:36.211969 [ 23.790248] free:1012 slab_reclaimable:1888 slab_unreclaimable:4067
Jul 14 00:27:36.219948 [ 23.790248] mapped:2877 shmem:154 pagetables:521 bounce:0
Jul 14 00:27:36.227943 [ 23.790248] free_cma:0
Jul 14 00:27:36.227982 [ 23.790259] DMA free:1680kB min:96kB low:120kB high:144kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:1044kB dirty:0kB writeback:0kB mapped:340kB shmem:8kB slab_reclaimable:92kB slab_unreclaimable:332kB kernel_stack:48kB pagetables:84kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jul 14 00:27:36.267995 [ 23.790269] lowmem_reserve[]: 0 396 396 396
Jul 14 00:27:36.268052 [ 23.790275] DMA32 free:2368kB min:2496kB low:3120kB high:3744kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:508308kB managed:408952kB mlocked:319
Jul 14 00:28:18.168603 <client 0x1c9b400 connected - now 1 clients>

From: osstest service owner <osstest-admin@xxxxxxxxxxxxxx>
To: <xen-devel@xxxxxxxxxxxxxxxxxxx>, <osstest-admin@xxxxxxxxxxxxxx>
Subject: [linux-3.18 bisection] complete test-amd64-amd64-xl-credit2
Date: Sat, 16 Jul 2016 14:52:57 +0000

branch xen-unstable
xenbranch xen-unstable
job test-amd64-amd64-xl-credit2
testid xen-boot

Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Bug introduced:  a2d8c514753276394d68414f563591f174ef86cb
  Bug not present: 8f620446135b64ca6f96cf32066a76d64e79a388
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/97435/

commit a2d8c514753276394d68414f563591f174ef86cb
Author: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
Date:   Fri Jun 24 14:50:01 2016 -0700

    mm/swap.c: flush lru pvecs on compound page arrival

    [ Upstream commit 8f182270dfec432e93fae14f9208a6b9af01009f ]

    Currently we can have compound pages held on per cpu pagevecs, which
    leads to a lot of memory unavailable for reclaim when needed.
    In the systems with hundreads of processors it can be GBs of memory.
    On of the way of reproducing the problem is to not call munmap
    explicitly on all mapped regions (i.e. after receiving SIGTERM).
    After that some pages (with THP enabled also huge pages) may end up
    on lru_add_pvec, example below.

        void main() {
        #pragma omp parallel
        {
        size_t size = 55 * 1000 * 1000; // smaller than MEM/CPUS
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS , -1, 0);
        if (p != MAP_FAILED)
                memset(p, 0, size);
        //munmap(p, size); // uncomment to make the problem go away
        }
        }

    When we run it with THP enabled it will leave significant amount of
    memory on lru_add_pvec. This memory will be not reclaimed if we hit
    OOM, so when we run above program in a loop:

        for i in `seq 100`; do ./a.out; done

    many processes (95% in my case) will be killed by OOM.

    The primary point of the LRU add cache is to save the zone lru_lock
    contention with a hope that more pages will belong to the same zone
    and so their addition can be batched. The huge page is already a
    form of batched addition (it will add 512 worth of memory in one
    go) so skipping the batching seems like a safer option when
    compared to a potential excess in the caching which can be quite
    large and much harder to fix because lru_add_drain_all is way to
    expensive and it is not really clear what would be a good moment to
    call it.

    Similarly we can reproduce the problem on lru_deactivate_pvec by
    adding: madvise(p, size, MADV_FREE); after memset.

    This patch flushes lru pvecs on compound page arrival making the
    problem less severe - after applying it kill rate of above example
    drops to 0%, due to reducing maximum amount of memory held on pvec
    from 28MB (with THP) to 56kB per CPU.

    Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
    Link: http://lkml.kernel.org/r/1466180198-18854-1-git-send-email-lukasz.odzioba@xxxxxxxxx
    Signed-off-by: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
    Acked-by: Michal Hocko <mhocko@xxxxxxxx>
    Cc: Kirill Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
    Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
    Cc: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
    Cc: Ming Li <mingli199x@xxxxxx>
    Cc: Minchan Kim <minchan@xxxxxxxxxx>
    Cc: <stable@xxxxxxxxxxxxxxx>
    Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sasha.levin@xxxxxxxxxx>

For bisection revision-tuple graph see:
 http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-3.18/test-amd64-amd64-xl-credit2.xen-boot.html
Revision IDs in each graph node refer, respectively, to the Trees above.

----------------------------------------
Running cs-bisection-step --graph-out=/home/logs/results/bisect/linux-3.18/test-amd64-amd64-xl-credit2.xen-boot --summary-out=tmp/97435.bisection-summary --basis-template=96188 --blessings=real,real-bisect linux-3.18 test-amd64-amd64-xl-credit2 xen-boot
Searching for failure / basis pass:
 97377 fail [host=chardonnay0] / 96188 [host=italia0] 96161 [host=baroque1] 95844 [host=godello0] 95809 [host=godello1] 95597 [host=fiano0] 95521 [host=pinot1] 95458 [host=elbling1] 95406 [host=fiano1] 94728 [host=merlot1] 94153 [host=elbling0] 94083 ok.
Failure / basis pass flights: 97377 / 94083

(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

Latest 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f ea210c52abb6458e39f5365f7f2c3abb9c191c47
Basis pass 6b12ebc0ecce75d7bd3660cd85f8b47a615c2071 c530a75c1e6a472b0eb9558310b518f0dfcd8860 e4ceb77cf88bc44f0b7fe39225c49d660735f327 62b3d206425c245ed0a020390a64640d40d97471 c79fc6c4bee28b40948838a760b4aaadf6b5cd47

Generating revisions with ./adhoc-revtuple-generator
 git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#6b12ebc0ecce75d7bd3660cd85f8b47a615c2071-0ac0a856d986c1ab240753479f5e50fdfab82b14
 git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 git://xenbits.xen.org/qemu-xen-traditional.git#e4ceb77cf88bc44f0b7fe39225c49d660735f327-6e20809727261599e8527c456eb078c0e89139a1
 git://xenbits.xen.org/qemu-xen.git#62b3d206425c245ed0a020390a64640d40d97471-44a072f0de0d57c95c2212bbce02888832b7b74f
 git://xenbits.xen.org/xen.git#c79fc6c4bee28b40948838a760b4aaadf6b5cd47-ea210c52abb6458e39f5365f7f2c3abb9c191c47

Loaded 12048 nodes in revision graph

Searching for test results:
 94035 [host=huxelrebe0]
 94083 pass 6b12ebc0ecce75d7bd3660cd85f8b47a615c2071 c530a75c1e6a472b0eb9558310b518f0dfcd8860 e4ceb77cf88bc44f0b7fe39225c49d660735f327 62b3d206425c245ed0a020390a64640d40d97471 c79fc6c4bee28b40948838a760b4aaadf6b5cd47
 94056 [host=huxelrebe1]
 94153 [host=elbling0]
 94728 [host=merlot1]
 95406 [host=fiano1]
 95458 [host=elbling1]
 95521 [host=pinot1]
 95597 [host=fiano0]
 95809 [host=godello1]
 95844 [host=godello0]
 96161 [host=baroque1]
 96188 [host=italia0]
 97278 fail irrelevant
 97289 fail 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f ea210c52abb6458e39f5365f7f2c3abb9c191c47
 97321 pass 6b12ebc0ecce75d7bd3660cd85f8b47a615c2071 c530a75c1e6a472b0eb9558310b518f0dfcd8860 e4ceb77cf88bc44f0b7fe39225c49d660735f327 62b3d206425c245ed0a020390a64640d40d97471 c79fc6c4bee28b40948838a760b4aaadf6b5cd47
 97346 fail 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f ea210c52abb6458e39f5365f7f2c3abb9c191c47
 97357 pass f27ca140ad82b5e76282cc5b54bfb0a665520d17 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97352 pass b5076139991c6b12c62346d9880eec1d4227d99f c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 87beb45e0b05be76755cac53322aae4f5b426aac
 97319 fail 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f ea210c52abb6458e39f5365f7f2c3abb9c191c47
 97389 fail e23042d05035bd64c980ea8f1d9d311972b09104 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97404 pass faa35ed7c7dd74a62bb58340e0ba1819ec33e4e1 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97423 fail a2d8c514753276394d68414f563591f174ef86cb c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97393 pass 1b9dc6680de288cb47e0a3c1587ba69879b3c26f c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97397 pass 4c2b0216cdf54e81f7c0e841b5bb1116701ae25b c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97364 fail 30888a2ea001e237ae9960de877d6f4d2351d8a2 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97376 blocked 71c879eb92223676c4583e130f1b0ce26cddb891 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97413 pass 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97379 fail 6d94f01566e30c87ebd42e1175ade4f648735578 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97399 fail a2d8c514753276394d68414f563591f174ef86cb c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97417 fail a2d8c514753276394d68414f563591f174ef86cb c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97421 pass 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97377 fail 0ac0a856d986c1ab240753479f5e50fdfab82b14 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f ea210c52abb6458e39f5365f7f2c3abb9c191c47
 97428 pass 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
 97435 fail a2d8c514753276394d68414f563591f174ef86cb c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288

Searching for interesting versions
 Result found: flight 94083 (pass), for basis pass
 Result found: flight 97289 (fail), for basis failure
 Repro found: flight 97321 (pass), for basis pass
 Repro found: flight 97346 (fail), for basis failure
 0 revisions at 8f620446135b64ca6f96cf32066a76d64e79a388 c530a75c1e6a472b0eb9558310b518f0dfcd8860 6e20809727261599e8527c456eb078c0e89139a1 44a072f0de0d57c95c2212bbce02888832b7b74f 22ea8ad02e465e32cd40887c750b55c3a997a288
No revisions left to test, checking graph state.
 Result found: flight 97413 (pass), for last pass
 Result found: flight 97417 (fail), for first failure
 Repro found: flight 97421 (pass), for last pass
 Repro found: flight 97423 (fail), for first failure
 Repro found: flight 97428 (pass), for last pass
 Repro found: flight 97435 (fail), for first failure

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Bug introduced:  a2d8c514753276394d68414f563591f174ef86cb
  Bug not present: 8f620446135b64ca6f96cf32066a76d64e79a388
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/97435/

commit a2d8c514753276394d68414f563591f174ef86cb
Author: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
Date:   Fri Jun 24 14:50:01 2016 -0700

    mm/swap.c: flush lru pvecs on compound page arrival

    [ Upstream commit 8f182270dfec432e93fae14f9208a6b9af01009f ]

    Currently we can have compound pages held on per cpu pagevecs, which
    leads to a lot of memory unavailable for reclaim when needed. In the
    systems with hundreads of processors it can be GBs of memory. On of
    the way of reproducing the problem is to not call munmap explicitly
    on all mapped regions (i.e. after receiving SIGTERM). After that
    some pages (with THP enabled also huge pages) may end up on
    lru_add_pvec, example below.

        void main() {
        #pragma omp parallel
        {
        size_t size = 55 * 1000 * 1000; // smaller than MEM/CPUS
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS , -1, 0);
        if (p != MAP_FAILED)
                memset(p, 0, size);
        //munmap(p, size); // uncomment to make the problem go away
        }
        }

    When we run it with THP enabled it will leave significant amount of
    memory on lru_add_pvec. This memory will be not reclaimed if we hit
    OOM, so when we run above program in a loop:

        for i in `seq 100`; do ./a.out; done

    many processes (95% in my case) will be killed by OOM.

    The primary point of the LRU add cache is to save the zone lru_lock
    contention with a hope that more pages will belong to the same zone
    and so their addition can be batched. The huge page is already a
    form of batched addition (it will add 512 worth of memory in one
    go) so skipping the batching seems like a safer option when
    compared to a potential excess in the caching which can be quite
    large and much harder to fix because lru_add_drain_all is way to
    expensive and it is not really clear what would be a good moment to
    call it.

    Similarly we can reproduce the problem on lru_deactivate_pvec by
    adding: madvise(p, size, MADV_FREE); after memset.

    This patch flushes lru pvecs on compound page arrival making the
    problem less severe - after applying it kill rate of above example
    drops to 0%, due to reducing maximum amount of memory held on pvec
    from 28MB (with THP) to 56kB per CPU.

    Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
    Link: http://lkml.kernel.org/r/1466180198-18854-1-git-send-email-lukasz.odzioba@xxxxxxxxx
    Signed-off-by: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
    Acked-by: Michal Hocko <mhocko@xxxxxxxx>
    Cc: Kirill Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
    Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
    Cc: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
    Cc: Ming Li <mingli199x@xxxxxx>
    Cc: Minchan Kim <minchan@xxxxxxxxxx>
    Cc: <stable@xxxxxxxxxxxxxxx>
    Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sasha.levin@xxxxxxxxxx>

dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.719964 to fit
pnmtopng: 47 colors found
Revision graph left in /home/logs/results/bisect/linux-3.18/test-amd64-amd64-xl-credit2.xen-boot.{dot,ps,png,html,svg}.

----------------------------------------
97435: tolerable ALL FAIL

flight 97435 linux-3.18 real-bisect [real]
http://logs.test-lab.xenproject.org/osstest/logs/97435/

Failures :-/ but no regressions.

Tests which did not succeed, including tests which could not be run:
 test-amd64-amd64-xl-credit2 6 xen-boot fail baseline untested

jobs:
 test-amd64-amd64-xl-credit2 fail

------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html