> On Aug 1, 2019, at 2:51 AM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> On Wed, Jul 31, 2019 at 02:18:00PM -0400, Qian Cai wrote:
>> On Wed, 2019-07-31 at 12:09 -0400, Qian Cai wrote:
>>> On Wed, 2019-07-31 at 14:34 +0900, Minchan Kim wrote:
>>>> On Tue, Jul 30, 2019 at 12:25:28PM -0400, Qian Cai wrote:
>>>>> OOM workloads with swapping are unable to recover with linux-next since
>>>>> next-20190729 due to the commit "mm: account nr_isolated_xxx in
>>>>> [isolate|putback]_lru_page" [1], which breaks OOM with swap.
>>>>>
>>>>> [1] https://lore.kernel.org/linux-mm/20190726023435.214162-4-minchan@kernel.org/T/#mdcd03bcb4746f2f23e6f508c205943726aee8355
>>>>>
>>>>> For example, the LTP oom01 test case is stuck for hours, while it
>>>>> finishes in a few minutes here after reverting the above commit.
>>>>> Sometimes it prints these messages while hanging:
>>>>>
>>>>> [  509.983393][  T711] INFO: task oom01:5331 blocked for more than 122 seconds.
>>>>> [  509.983431][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
>>>>> [  509.983447][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [  509.983477][  T711] oom01           D24656  5331   5157 0x00040000
>>>>> [  509.983513][  T711] Call Trace:
>>>>> [  509.983538][  T711] [c00020037d00f880] [0000000000000008] 0x8 (unreliable)
>>>>> [  509.983583][  T711] [c00020037d00fa60] [c000000000023724] __switch_to+0x3a4/0x520
>>>>> [  509.983615][  T711] [c00020037d00fad0] [c0000000008d17bc] __schedule+0x2fc/0x950
>>>>> [  509.983647][  T711] [c00020037d00fba0] [c0000000008d1e68] schedule+0x58/0x150
>>>>> [  509.983684][  T711] [c00020037d00fbd0] [c0000000008d7614] rwsem_down_read_slowpath+0x4b4/0x630
>>>>> [  509.983727][  T711] [c00020037d00fc90] [c0000000008d7dfc] down_read+0x12c/0x240
>>>>> [  509.983758][  T711] [c00020037d00fd20] [c00000000005fb28] __do_page_fault+0x6f8/0xee0
>>>>> [  509.983801][  T711] [c00020037d00fe20] [c00000000000a364] handle_page_fault+0x18/0x38
>>>>
>>>> Thanks for the testing! It's no surprise the patch introduced some bugs,
>>>> because it's rather tricky.
>>>>
>>>> Could you test this patch?
>>>
>>> It does help the situation a bit, but recovery is still way slower than
>>> just reverting the commit "mm: account nr_isolated_xxx in
>>> [isolate|putback]_lru_page". For example, on this powerpc system, oom01
>>> used to take 4 minutes to finish, while it now still takes 13 minutes.
>>>
>>> The oom02 test (testing NUMA mempolicy) takes even longer, and I gave up
>>> after 26 minutes with several hung tasks below.
>>
>> Also, oom02 is stuck on an x86 machine.
>
> Yep, my patch above had a bug: it tested the page type after the page was
> freed. After further review I found other bugs as well, but I don't think
> they are related to your problem either. Okay then, let's revert the patch.
>
> Andrew, could you revert the below patch?
> "mm: account nr_isolated_xxx in [isolate|putback]_lru_page"
>
> It's just a clean-up patch and is no longer related to the new madvise hint
> system call, so it shouldn't be a blocker.
>
> Anyway, I want to fix the problem when I have time available.
> Qian, what are your config and system configuration on x86?
> Is it possible to reproduce it in QEMU?
> It would be really helpful if you could tell me the reproduction steps on x86.

https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

The config works in OpenStack, but I have never tried it in QEMU; it might
need a few modifications here or there.

The x86 server where this reproduced is:

HPE ProLiant DL385 Gen10
AMD EPYC 7251 8-Core Processor
Smart Storage PQI 12G SAS/PCIe 3
Memory: 32768 MB
NUMA Nodes: 8