Michal Hocko wrote: > On Tue 13-12-16 21:06:57, Tetsuo Handa wrote: > > http://I-love.SAKURA.ne.jp/tmp/serial-20161213.txt.xz is a console log with > > this patch applied. Due to hung task warnings disabled, amount of messages > > are significantly reduced. > > > > Uptime > 400 are testcases where the stresser was invoked via "taskset -c 0". > > Since there are some "** XXX printk messages dropped **" messages, I can't > > tell whether the OOM killer was able to make forward progress. But guessing > > from the result that there is no corresponding "Killed process" line for > > "Out of memory: " line at uptime = 450 and the duration of PID 14622 stalled, > > I think it is OK to say that the system got stuck because the OOM killer was > > not able to make forward progress. > > The oom situation certainly didn't get resolved. I would be really > curious whether we can rule out the printk out of the picture, though. I > am still not sure we can rule out some obscure OOM killer bug at this > stage. > > What if we lower the loglevel as much as possible to only see KERN_ERR > should be sufficient to see few oom killer messages while suppressing > most of the other noise. Unfortunatelly, even messages with level > > loglevel get stored into the ringbuffer (as I've just learned) so > console_unlock() has to crawl through them just to drop them (Meh) but > at least it doesn't have to go to the serial console drivers and spend > even more time there. An alternative would be to tweak printk to not > even store those messaes. Something like the below Changing loglevel is not a option for me. Under OOM, syslog cannot work. Only messages sent to serial console / netconsole are available for understanding something went wrong. And serial consoles may be very slow. We need to try to avoid uncontrolled printk(). > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index f7a55e9ff2f7..197f2b9fb703 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1865,6 +1865,15 @@ asmlinkage int vprintk_emit(int facility, int level, > lflags |= LOG_CONT; > } > > + if (suppress_message_printing(kern_level)) { > + logbuf_cpu = UINT_MAX; > + raw_spin_unlock(&logbuf_lock); > + lockdep_on(); > + local_irq_restore(flags); > + return 0; > + } > + > + > text_len -= 2; > text += 2; > } > > So it would be really great if you could > 1) test with the fixed throttling > 2) loglevel=4 on the kernel command line > 3) try the above with the same loglevel > > ideally 1) would be sufficient and that would make the most sense from > the warn_alloc point of view. If this is 2 or 3 then we are hitting a > more generic problem and I would be quite careful to hack it around. Thus, I don't think I can do these. > > > ---------- > > [ 450.767693] Out of memory: Kill process 14642 (a.out) score 999 or sacrifice child > > [ 450.769974] Killed process 14642 (a.out) total-vm:4168kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB > > [ 450.776538] oom_reaper: reaped process 14642 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > > [ 450.781170] Out of memory: Kill process 14643 (a.out) score 999 or sacrifice child > > [ 450.783469] Killed process 14643 (a.out) total-vm:4168kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB > > [ 450.787912] oom_reaper: reaped process 14643 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > > [ 450.792630] Out of memory: Kill process 14644 (a.out) score 999 or sacrifice child > > [ 450.964031] a.out: page allocation stalls for 10014ms, order:0, mode:0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO) > > [ 450.964033] CPU: 0 PID: 14622 Comm: a.out Tainted: G W 4.9.0+ #99 > > (...snipped...) > > [ 740.984902] a.out: page allocation stalls for 300003ms, order:0, mode:0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO) > > [ 740.984905] CPU: 0 PID: 14622 Comm: a.out Tainted: G W 4.9.0+ #99 > > ---------- > > > > Although it is fine to make warn_alloc() less verbose, this is not > > a problem which can be avoided by simply reducing printk(). Unless > > we give enough CPU time to the OOM killer and OOM victims, it is > > trivial to lockup the system. > > This is simply hard if there are way too many tasks runnable... Runnable threads which do not involve page allocation do not harm. Only runnable threads which are almost-busy-looping with direct reclaim are problematic. mutex_lock_killable(&oom_lock) is the simplest approach for eliminating such threads and prevent such threads from calling warn_alloc() uncontrolledly. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>