Michal Hocko wrote:
> On Fri 14-07-17 21:30:54, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> [...]
> > > As I've said earlier, if there is no other way to make printk work
> > > without all these nasty side effects then I would be OK with adding
> > > printk-context-specific calls into the oom killer.
> > >
> > > Removing the rest because this is again getting largely tangential. The
> > > primary problem you are seeing is that we stumble over printk here.
> > > Unless I can see a sound argument that this is not the case, it doesn't
> > > make any sense to discuss allocator changes.
> >
> > You are still ignoring my point. I agree that we stumble over printk(),
> > but printk() is only one of the places where we stumble.
>
> I am not ignoring it. You just mix too many things together to have a
> meaningful conversation...
>
> > Look at schedule_timeout_killable(1) in out_of_memory(), which is called
> > with oom_lock still held. I'm reporting that even if printk() is offloaded
> > to a printk kernel thread, scheduling priority can make
> > schedule_timeout_killable(1) sleep for more than 12 minutes (when it is
> > intended to sleep for only one millisecond). (I gave up waiting and
> > pressed SysRq-i; I can't imagine how long it would have kept sleeping
> > inside schedule_timeout_killable(1) with oom_lock held.)
> >
> > Without cooperation from the other allocating threads which failed to
> > acquire oom_lock, it is dangerous to keep out_of_memory() in a
> > preemptible/schedulable context.
>
> I have already tried to explain that the whole reclaim path suffers from
> this priority inversion problem because it was never designed to handle
> it. You are poking at one particular path of the reclaim stack and
> missing the forest for the trees. How the heck is this any different from
> the reclaim path stumbling over a lock held deep inside a filesystem and
> stalling basically everybody from making reasonable progress? Is this a
> problem?
> Of course it is, theoretically. In practice it matters too little to go
> and reimplement the whole stack. At least I haven't seen any real-life
> reports complaining about this.

I'm failing to test your "mm, oom: fix oom_reaper fallouts" patches using
http://lkml.kernel.org/r/201708072228.FAJ09347.tOOVOFFQJSHMFL@xxxxxxxxxxxxxxxxxxx
because the OOM killer fails to be invoked, for an unknown reason. I analyzed
it using kmallocwd and confirmed that two dozen concurrent allocating threads
are sufficient to hit this warn_alloc() vs. printk() lockup. Since printk
offloading is not yet available, serialization is the only way we can mitigate
this problem for now. How much longer do we have to wait?

----------
[ 645.993827] MemAlloc-Info: stalling=18 dying=0 exiting=0 victim=0 oom_count=29
(...snipped...)
[ 645.996694] MemAlloc: vmtoolsd(2221) flags=0x400100 switches=5607 seq=3740 gfp=0x14200ca(GFP_HIGHUSER_MOVABLE) order=0 delay=7541
[ 645.996695] vmtoolsd        R  running task    11960  2221      1 0x00000080
[ 645.996699] Call Trace:
[ 645.996708]  ? console_unlock+0x373/0x4a0
[ 645.996709]  ? vprintk_emit+0x211/0x2f0
[ 645.996714]  ? vprintk_emit+0x21a/0x2f0
[ 645.996720]  ? vprintk_default+0x1a/0x20
[ 645.996722]  ? vprintk_func+0x22/0x60
[ 645.996724]  ? printk+0x53/0x6a
[ 645.996731]  ? dump_stack_print_info+0xab/0xb0
[ 645.996736]  ? dump_stack+0x5e/0x9e
[ 645.996739]  ? dump_header+0x9d/0x3fa
[ 645.996744]  ? trace_hardirqs_on+0xd/0x10
[ 645.996751]  ? oom_kill_process+0x226/0x650
[ 645.996757]  ? out_of_memory+0x13d/0x570
[ 645.996758]  ? out_of_memory+0x20d/0x570
[ 645.996763]  ? __alloc_pages_nodemask+0xbc8/0xed0
[ 645.996780]  ? alloc_pages_current+0x65/0xb0
[ 645.996784]  ? __page_cache_alloc+0x10b/0x140
[ 645.996789]  ? filemap_fault+0x3df/0x6a0
[ 645.996790]  ? filemap_fault+0x2ab/0x6a0
[ 645.996797]  ? xfs_filemap_fault+0x34/0x50
[ 645.996799]  ? __do_fault+0x19/0x120
[ 645.996803]  ? __handle_mm_fault+0xa99/0x1260
[ 645.996814]  ? handle_mm_fault+0x1b2/0x350
[ 645.996816]  ? handle_mm_fault+0x46/0x350
[ 645.996820]  ? __do_page_fault+0x1da/0x510
[ 645.996828]  ? do_page_fault+0x21/0x70
[ 645.996832]  ? page_fault+0x22/0x30
(...snipped...)
[ 645.998748] MemAlloc-Info: stalling=18 dying=0 exiting=0 victim=0 oom_count=29
(...snipped...)
[ 1472.484590] MemAlloc-Info: stalling=25 dying=0 exiting=0 victim=0 oom_count=29
(...snipped...)
[ 1472.487341] MemAlloc: vmtoolsd(2221) flags=0x400100 switches=5607 seq=3740 gfp=0x14200ca(GFP_HIGHUSER_MOVABLE) order=0 delay=834032
[ 1472.487342] vmtoolsd        R  running task    11960  2221      1 0x00000080
[ 1472.487346] Call Trace:
[ 1472.487353]  ? console_unlock+0x373/0x4a0
[ 1472.487355]  ? vprintk_emit+0x211/0x2f0
[ 1472.487360]  ? vprintk_emit+0x21a/0x2f0
[ 1472.487367]  ? vprintk_default+0x1a/0x20
[ 1472.487369]  ? vprintk_func+0x22/0x60
[ 1472.487370]  ? printk+0x53/0x6a
[ 1472.487377]  ? dump_stack_print_info+0xab/0xb0
[ 1472.487381]  ? dump_stack+0x5e/0x9e
[ 1472.487384]  ? dump_header+0x9d/0x3fa
[ 1472.487389]  ? trace_hardirqs_on+0xd/0x10
[ 1472.487396]  ? oom_kill_process+0x226/0x650
[ 1472.487402]  ? out_of_memory+0x13d/0x570
[ 1472.487403]  ? out_of_memory+0x20d/0x570
[ 1472.487408]  ? __alloc_pages_nodemask+0xbc8/0xed0
[ 1472.487426]  ? alloc_pages_current+0x65/0xb0
[ 1472.487429]  ? __page_cache_alloc+0x10b/0x140
[ 1472.487434]  ? filemap_fault+0x3df/0x6a0
[ 1472.487435]  ? filemap_fault+0x2ab/0x6a0
[ 1472.487441]  ? xfs_filemap_fault+0x34/0x50
[ 1472.487444]  ? __do_fault+0x19/0x120
[ 1472.487447]  ? __handle_mm_fault+0xa99/0x1260
[ 1472.487459]  ? handle_mm_fault+0x1b2/0x350
[ 1472.487460]  ? handle_mm_fault+0x46/0x350
[ 1472.487465]  ? __do_page_fault+0x1da/0x510
[ 1472.487472]  ? do_page_fault+0x21/0x70
[ 1472.487476]  ? page_fault+0x22/0x30
(...snipped...)
[ 1472.489975] MemAlloc-Info: stalling=25 dying=0 exiting=0 victim=0 oom_count=29
----------