Tetsuo Handa wrote: > I think that the oom_lock stall problem is essentially independent with > printk() from warn_alloc(). I can trigger lockups even if I use one-liner > stall report per each second like below. > > -------------------- > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 6de9440..dc7f6be 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3657,10 +3657,14 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) > > /* Make sure we know about allocations which stall for too long */ > if (time_after(jiffies, alloc_start + stall_timeout)) { > - warn_alloc(gfp_mask, > - "page allocation stalls for %ums, order:%u", > - jiffies_to_msecs(jiffies-alloc_start), order); > - stall_timeout += 10 * HZ; > + static DEFINE_RATELIMIT_STATE(stall_rs, HZ, 1); > + > + if (__ratelimit(&stall_rs)) { > + pr_warn("%s(%u): page allocation stalls for %ums, order:%u mode:%#x(%pGg)\n", > + current->comm, current->pid, jiffies_to_msecs(jiffies - alloc_start), > + order, gfp_mask, &gfp_mask); > + stall_timeout += 10 * HZ; > + } > } > > if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags, > -------------------- > > This stall lasted with only two kernel messages per a second. I wonder we > have room for tuning warn_alloc() unless the trigger is identified and fixed. I retested using netconsole for recording clock time without delays. It seems to me that once the system reaches some threshold, even ratelimiting to two kernel messages per a second does not help flushing printk buffer. CPU time for flushing printk buffer is definitely insufficient because direct reclaimers waiting for oom_lock continued almost-busy looping. The first OOM killer was on 16:14:24 and a lot of OOM messages are scheduled for printk(). As of 16:17:00, the flushing was delaying for 40 seconds (clock time elapsed is 156 seconds but printk time elapsed is only 116 seconds). I pressed SysRq-H on 16:17:05 and the message by SysRq-H was printed on 16:20:03. The delay was getting larger (clock time since first OOM killer is 219 seconds but printk time elapsed is only 161 seconds). Then, I waited for a while whether ratelimiting to two kernel messages per a second helps flushing printk buffer. But it did not help because only two kernel messages are printed per a second or two seconds. I pressed SysRq-E on 16:23:15 and flushing became as fast as possible because some threads which terminated immediately helped solving the OOM situation and helped direct reclaimers not to consume CPU time by pointless direct reclaim loop. As of 16:23:18, all printk buffer was flushed and the delay was completely solved (clock time since first OOM killer is 534 seconds and printk elapsed time is 531 seconds). Complete log is http://I-love.SAKURA.ne.jp/tmp/netconsole-20161219.txt.xz . ---------------------------------------- 2016-12-19 16:14:24 192.168.186.128:6666 [ 61.383922] kworker/0:0 invoked oom-killer: gfp_mask=0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=0, order=0, oom_score_adj=0 (...snipped...) 2016-12-19 16:17:00 192.168.186.128:6666 [ 177.731850] __alloc_pages_slowpath: 86652 callbacks suppressed 2016-12-19 16:17:08 192.168.186.128:6666 [ 177.731852] a.out(4225): page allocation stalls for 47748ms, order:0 mode:0x26042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) (...snipped...) 2016-12-19 16:20:03 192.168.186.128:6666 [ 222.241385] sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z) (...snipped...) 2016-12-19 16:23:01 192.168.186.128:6666 [ 279.848855] __alloc_pages_slowpath: 90002 callbacks suppressed 2016-12-19 16:23:02 192.168.186.128:6666 [ 279.848857] a.out(4195): page allocation stalls for 150144ms, order:0 mode:0x26042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:04 192.168.186.128:6666 [ 280.849909] __alloc_pages_slowpath: 90381 callbacks suppressed 2016-12-19 16:23:05 192.168.186.128:6666 [ 280.849913] a.out(4242): page allocation stalls for 151122ms, order:0 mode:0x26042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:05 192.168.186.128:6666 [ 281.850976] __alloc_pages_slowpath: 90292 callbacks suppressed 2016-12-19 16:23:08 192.168.186.128:6666 [ 281.850979] a.out(4329): page allocation stalls for 151818ms, order:0 mode:0x2604240(GFP_NOFS|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:10 192.168.186.128:6666 [ 282.852029] __alloc_pages_slowpath: 89988 callbacks suppressed 2016-12-19 16:23:10 192.168.186.128:6666 [ 282.852032] a.out(3981): page allocation stalls for 152873ms, order:0 mode:0x26042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:12 192.168.186.128:6666 [ 283.852585] __alloc_pages_slowpath: 90468 callbacks suppressed 2016-12-19 16:23:13 192.168.186.128:6666 [ 283.852589] a.out(3854): page allocation stalls for 154167ms, order:0 mode:0x342004a(GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE|__GFP_WRITE) 2016-12-19 16:23:15 192.168.186.128:6666 [ 284.853651] __alloc_pages_slowpath: 90091 callbacks suppressed 2016-12-19 16:23:15 192.168.186.128:6666 [ 284.853654] a.out(4011): page allocation stalls for 154625ms, order:0 mode:0x2604240(GFP_NOFS|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:15 192.168.186.128:6666 [ 285.854709] __alloc_pages_slowpath: 90481 callbacks suppressed 2016-12-19 16:23:15 192.168.186.128:6666 [ 285.854712] mysqld(2467): page allocation stalls for 155635ms, order:0 mode:0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) 2016-12-19 16:23:15 192.168.186.128:6666 [ 286.855765] __alloc_pages_slowpath: 90363 callbacks suppressed 2016-12-19 16:23:15 192.168.186.128:6666 [ 286.855768] mysqld(2467): page allocation stalls for 156636ms, order:0 mode:0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) 2016-12-19 16:23:15 192.168.186.128:6666 [ 287.856871] __alloc_pages_slowpath: 90794 callbacks suppressed (...snipped...) 2016-12-19 16:23:15 192.168.186.128:6666 [ 330.899814] __alloc_pages_slowpath: 85298 callbacks suppressed 2016-12-19 16:23:15 192.168.186.128:6666 [ 330.899817] a.out(3937): page allocation stalls for 201204ms, order:0 mode:0x342004a(GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE|__GFP_WRITE) 2016-12-19 16:23:15 192.168.186.128:6666 [ 331.900876] __alloc_pages_slowpath: 88338 callbacks suppressed 2016-12-19 16:23:15 192.168.186.128:6666 [ 331.900879] tuned(2414): page allocation stalls for 201869ms, order:0 mode:0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) 2016-12-19 16:23:15 192.168.186.128:6666 [ 332.901929] __alloc_pages_slowpath: 86052 callbacks suppressed 2016-12-19 16:23:15 192.168.186.128:6666 [ 332.901932] a.out(4252): page allocation stalls for 203197ms, order:0 mode:0x26042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:16 192.168.186.128:6666 [ 333.902994] __alloc_pages_slowpath: 88699 callbacks suppressed 2016-12-19 16:23:16 192.168.186.128:6666 [ 333.902997] mysqld(2464): page allocation stalls for 203684ms, order:0 mode:0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) 2016-12-19 16:23:16 192.168.186.128:6666 [ 334.904057] __alloc_pages_slowpath: 88382 callbacks suppressed 2016-12-19 16:23:16 192.168.186.128:6666 [ 334.904059] systemd(1): page allocation stalls for 165774ms, order:0 mode:0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) (...snipped...) 2016-12-19 16:23:16 192.168.186.128:6666 [ 425.006012] __alloc_pages_slowpath: 88655 callbacks suppressed 2016-12-19 16:23:16 192.168.186.128:6666 [ 425.006014] systemd(1): page allocation stalls for 255876ms, order:0 mode:0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) 2016-12-19 16:23:16 192.168.186.128:6666 [ 426.007042] __alloc_pages_slowpath: 86528 callbacks suppressed 2016-12-19 16:23:16 192.168.186.128:6666 [ 426.007046] a.out(4062): page allocation stalls for 296298ms, order:0 mode:0x26042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:16 192.168.186.128:6666 [ 427.007621] __alloc_pages_slowpath: 82527 callbacks suppressed 2016-12-19 16:23:17 192.168.186.128:6666 [ 427.007624] postgres(2416): page allocation stalls for 296946ms, order:0 mode:0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) 2016-12-19 16:23:17 192.168.186.128:6666 [ 428.008697] __alloc_pages_slowpath: 86985 callbacks suppressed 2016-12-19 16:23:17 192.168.186.128:6666 [ 428.008700] a.out(3841): page allocation stalls for 298293ms, order:0 mode:0x26042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) (...snipped...) 2016-12-19 16:23:17 192.168.186.128:6666 [ 522.135752] __alloc_pages_slowpath: 90737 callbacks suppressed 2016-12-19 16:23:17 192.168.186.128:6666 [ 522.135755] a.out(4563): page allocation stalls for 392427ms, order:0 mode:0x26042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:17 192.168.186.128:6666 [ 523.136816] __alloc_pages_slowpath: 90260 callbacks suppressed 2016-12-19 16:23:17 192.168.186.128:6666 [ 523.136819] a.out(4443): page allocation stalls for 393420ms, order:0 mode:0x26042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:17 192.168.186.128:6666 [ 524.137890] __alloc_pages_slowpath: 86161 callbacks suppressed 2016-12-19 16:23:18 192.168.186.128:6666 [ 524.137892] a.out(4048): page allocation stalls for 394432ms, order:0 mode:0x26042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:18 192.168.186.128:6666 [ 525.138932] __alloc_pages_slowpath: 90421 callbacks suppressed 2016-12-19 16:23:18 192.168.186.128:6666 [ 525.138934] a.out(4051): page allocation stalls for 395440ms, order:0 mode:0x342004a(GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE|__GFP_WRITE) (...snipped...) 2016-12-19 16:23:18 192.168.186.128:6666 [ 589.260793] __alloc_pages_slowpath: 90260 callbacks suppressed 2016-12-19 16:23:18 192.168.186.128:6666 [ 589.260797] a.out(3772): page allocation stalls for 459277ms, order:0 mode:0x26042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:18 192.168.186.128:6666 [ 590.261875] __alloc_pages_slowpath: 89716 callbacks suppressed 2016-12-19 16:23:18 192.168.186.128:6666 [ 590.261878] kworker/0:5(4629): page allocation stalls for 459918ms, order:0 mode:0x2604240(GFP_NOFS|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) 2016-12-19 16:23:18 192.168.186.128:6666 [ 591.262660] __alloc_pages_slowpath: 89555 callbacks suppressed 2016-12-19 16:23:18 192.168.186.128:6666 [ 591.262663] postgres(2415): page allocation stalls for 255434ms, order:0 mode:0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) 2016-12-19 16:23:18 192.168.186.128:6666 [ 592.094337] sysrq: SysRq : Terminate All Tasks 2016-12-19 16:23:18 192.168.186.128:6666 [ 592.900136] systemd-journald[377]: Received SIGTERM. ---------------------------------------- So, I'd like to check whether async printk() can prevent the system from reaching the threshold. Though, I guess async printk() won't help for preemption outside printk() (i.e. CONFIG_PREEMPT=y and/or longer sleep by schedule_timeout_killable(1) after returning from oom_kill_process()). Sergey, will you share your async printk() patches? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>