Re: [PATCH] mm/page_alloc: Wait for oom_lock before retrying.

On Tue 13-12-16 21:06:57, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Mon 12-12-16 13:55:35, Michal Hocko wrote:
> > > On Mon 12-12-16 21:12:06, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > [...]
> > > > > > I think this warn_alloc() is too much noise. When something went
> > > > > > wrong, multiple instances of Thread-2 tend to call warn_alloc()
> > > > > > concurrently. We don't need to report similar memory information.
> > > > > 
> > > > > That is why we have ratelimiting. If it needs better tuning then
> > > > > let's just do it.
> > > > 
> > > > I think that calling show_mem() once per series of warn_alloc() threads is
> > > > sufficient. Since the amount of output from dump_stack() and from show_mem()
> > > > is nearly equal, we can save nearly 50% of the output if we manage to avoid
> > > > the repeated show_mem() calls.
> > > 
> > > I do not mind such an update. Again, that is what we have the
> > > ratelimiting for. The fact that it doesn't throttle properly means that
> > > we should tune its parameters.
> > 
> > What about the following? Does this help?
> 
> I don't think it made much difference.

Because I am an idiot. The condition is wrong.
	if (!should_suppress_show_mem() || __ratelimit(&nopage_rs))
		show_mem(filter);
	
it should read
	if (!should_suppress_show_mem() && __ratelimit(&nopage_rs))
		show_mem(filter);

so there was no throttling at all. :/ Sorry about that!
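
To make the difference between the two conditions concrete, here is a
minimal user-space sketch. It is not kernel code: toy_ratelimit,
toy_ratelimit_ok(), toy_should_suppress() and count_show_mem() are
hypothetical stand-ins, the ratelimit is reduced to a plain burst counter
with no time interval, and should_suppress_show_mem() is assumed to return
false in the common case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a ratelimit state: allow at most `burst`
 * events and refuse the rest (no time interval is modelled here). */
struct toy_ratelimit {
	int burst;
	int printed;
};

static bool toy_ratelimit_ok(struct toy_ratelimit *rs)
{
	assert(rs->burst >= 0);
	if (rs->printed >= rs->burst)
		return false;
	rs->printed++;
	return true;
}

/* Stand-in for should_suppress_show_mem(); assumed false here. */
static bool toy_should_suppress(void)
{
	return false;
}

/* Count how many of `calls` allocation warnings would reach show_mem().
 * fixed == false uses the "||" condition, fixed == true uses "&&". */
static int count_show_mem(int calls, int burst, bool fixed)
{
	struct toy_ratelimit rs = { .burst = burst, .printed = 0 };
	int shown = 0;

	for (int i = 0; i < calls; i++) {
		bool show = fixed
			? (!toy_should_suppress() && toy_ratelimit_ok(&rs))
			: (!toy_should_suppress() || toy_ratelimit_ok(&rs));
		if (show)
			shown++;
	}
	return shown;
}
```

With the "||" form the left-hand side is already true, so the ratelimit is
never consulted and every call reaches show_mem(). With "&&" the ratelimit
gates every call, so only the first `burst` calls get through.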
 
> I noticed that one of the triggers that cause a lot of
> "** XXX printk messages dropped **" is show_all_locks() added by
> commit b2d4c2edb2e4f89a ("locking/hung_task: Show all locks"). When there are
> a lot of threads being blocked on fs locks, show_all_locks() on each blocked
> thread periodically generates an incredible amount of messages. Therefore,
> I temporarily set /proc/sys/kernel/hung_task_timeout_secs to 0 to disable
> hung task warnings for testing this patch.
> 
> http://I-love.SAKURA.ne.jp/tmp/serial-20161213.txt.xz is a console log with
> this patch applied. With hung task warnings disabled, the amount of messages
> is significantly reduced.
> 
> Uptime > 400 are testcases where the stresser was invoked via "taskset -c 0".
> Since there are some "** XXX printk messages dropped **" messages, I can't
> tell whether the OOM killer was able to make forward progress. But judging
> from the fact that there is no corresponding "Killed process" line for the
> "Out of memory: " line at uptime = 450, and from how long PID 14622 stalled,
> I think it is OK to say that the system got stuck because the OOM killer was
> not able to make forward progress.

The oom situation certainly didn't get resolved. I would be really
curious whether we can rule printk out of the picture, though. I
am still not sure we can rule out some obscure OOM killer bug at this
stage.

What if we lower the loglevel as much as possible, so that only KERN_ERR
and more severe messages reach the console? That should be sufficient to
see the few oom killer messages while suppressing most of the other noise.
Unfortunately, even messages with level >= loglevel get stored into the
ringbuffer (as I've just learned), so console_unlock() has to crawl
through them just to drop them (Meh), but at least it doesn't have to go
to the serial console drivers and spend even more time there. An
alternative would be to tweak printk to not even store those messages.
Something like the below

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f7a55e9ff2f7..197f2b9fb703 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1865,6 +1865,15 @@ asmlinkage int vprintk_emit(int facility, int level,
 				lflags |= LOG_CONT;
 			}
 
+			if (suppress_message_printing(kern_level)) {
+				logbuf_cpu = UINT_MAX;
+				raw_spin_unlock(&logbuf_lock);
+				lockdep_on();
+				local_irq_restore(flags);
+				return 0;
+			}
+
+
 			text_len -= 2;
 			text += 2;
 		}
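
The trade-off between the two approaches can be sketched in user space.
This is only an illustration under stated assumptions: toy_log, toy_emit(),
toy_flush() and toy_scenario() are hypothetical names, the log holds only
message levels, and toy_suppress() assumes that a message is hidden when
its level is numerically at or above the console loglevel.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_LOG_SIZE 1024

/* Hypothetical log buffer: one stored record per entry, level only. */
struct toy_log {
	int levels[TOY_LOG_SIZE];
	int count;
};

struct toy_flush_stats {
	int walked;   /* records the console flush had to look at */
	int printed;  /* records that actually went to the console */
};

/* Assumed rule: a numerically larger level is less severe, and records
 * at or above console_loglevel are not printed. */
static bool toy_suppress(int level, int console_loglevel)
{
	return level >= console_loglevel;
}

/* Store one record. With drop_early set, a suppressed record is
 * discarded before it is ever stored. */
static void toy_emit(struct toy_log *log, int level,
		     int console_loglevel, bool drop_early)
{
	assert(log->count < TOY_LOG_SIZE);
	if (drop_early && toy_suppress(level, console_loglevel))
		return;
	log->levels[log->count++] = level;
}

/* Walk every stored record and print only the non-suppressed ones. */
static struct toy_flush_stats toy_flush(const struct toy_log *log,
					int console_loglevel)
{
	struct toy_flush_stats st = { 0, 0 };

	for (int i = 0; i < log->count; i++) {
		st.walked++;
		if (!toy_suppress(log->levels[i], console_loglevel))
			st.printed++;
	}
	return st;
}

/* Emit `noisy` level-6 records and `important` level-3 records, then
 * flush and report how much work the flush did. */
static struct toy_flush_stats toy_scenario(int noisy, int important,
					   int console_loglevel,
					   bool drop_early)
{
	struct toy_log log = { .count = 0 };

	for (int i = 0; i < noisy; i++)
		toy_emit(&log, 6, console_loglevel, drop_early);
	for (int i = 0; i < important; i++)
		toy_emit(&log, 3, console_loglevel, drop_early);
	return toy_flush(&log, console_loglevel);
}
```

In both modes the same records get printed. The difference is how many
records the flush has to walk: all of them when filtering happens only at
flush time, and only the printable ones when suppressed records are
dropped before being stored.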

So it would be really great if you could
	1) test with the fixed throttling
	2) loglevel=4 on the kernel command line
	3) try the above with the same loglevel

Ideally 1) would be sufficient, and that would make the most sense from
the warn_alloc point of view. If it takes 2) or 3) then we are hitting a
more generic problem, and I would be quite careful about hacking around it.
 
> ----------
> [  450.767693] Out of memory: Kill process 14642 (a.out) score 999 or sacrifice child
> [  450.769974] Killed process 14642 (a.out) total-vm:4168kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> [  450.776538] oom_reaper: reaped process 14642 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  450.781170] Out of memory: Kill process 14643 (a.out) score 999 or sacrifice child
> [  450.783469] Killed process 14643 (a.out) total-vm:4168kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> [  450.787912] oom_reaper: reaped process 14643 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  450.792630] Out of memory: Kill process 14644 (a.out) score 999 or sacrifice child
> [  450.964031] a.out: page allocation stalls for 10014ms, order:0, mode:0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)
> [  450.964033] CPU: 0 PID: 14622 Comm: a.out Tainted: G        W       4.9.0+ #99
> (...snipped...)
> [  740.984902] a.out: page allocation stalls for 300003ms, order:0, mode:0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)
> [  740.984905] CPU: 0 PID: 14622 Comm: a.out Tainted: G        W       4.9.0+ #99
> ----------
> 
> Although it is fine to make warn_alloc() less verbose, this is not
> a problem which can be avoided by simply reducing printk(). Unless
> we give enough CPU time to the OOM killer and OOM victims, it is
> trivial to lock up the system.

This is simply hard if there are way too many tasks runnable...

-- 
Michal Hocko
SUSE Labs
