Re: [PATCH] mm,page_alloc: Serialize warn_alloc() if schedulable.

On Thu 01-06-17 22:11:13, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Thu 01-06-17 20:43:47, Tetsuo Handa wrote:
> > > Cong Wang has reported a lockup when running LTP memcg_stress test [1].
> >
> > This seems to be on an old and not pristine kernel. Does it happen also
> > on the vanilla up-to-date kernel?
> 
> 4.9 is not an old kernel! It might be close to the kernel version which
> enterprise distributions would choose for their next long term supported
> version.
> 
> And please stop saying "can you reproduce your problem with latest
> linux-next (or at least latest linux)?" Not everybody can use the vanilla
> up-to-date kernel!

The changelog mentioned that the source of the stalls is not clear, so this
might be out-of-tree patches doing something wrong, with dump_stack
showing up just because it is called often. This wouldn't be the first
time I have seen something like that. I am not really keen on adding
heavy lifting for something that is not clearly debugged and is based on
hand waving and speculation.

> What I'm pushing via kmallocwd patch is to prepare for overlooked problems
> so that enterprise distributors can collect information and identify what
> changes are needed to be backported.
> 
> As long as you ignore problems not happened with latest linux-next (or
> at least latest linux), enterprise distribution users can do nothing.
> 
> >
> > [...]
> > > Therefore, this patch uses a mutex dedicated for warn_alloc() like
> > > suggested in [3].
> >
> > As I've said previously. We have rate limiting and if that doesn't work
> > out well, let's tune it. The lock should be the last resort to go with.
> > We already throttle show_mem, maybe we can throttle dump_stack as well,
> > although it sounds a bit strange that this adds so much to the picture.
> 
> Ratelimiting never works well. It randomly drops information which is
> useful for debugging. Uncontrolled concurrent dump_stack() causes lockups.
> And restricting dump_stack() drops more information.

If dump_stack can be a source of the stalls, which I am not so sure
about, then we should rate limit it.
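
A minimal userspace sketch of interval/burst rate limiting, for
illustration only. It is not the kernel's implementation: struct ratelimit
and ratelimit_allow() are hypothetical names, and time is modelled as a
plain tick counter supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of an interval/burst rate limit. */
struct ratelimit {
	long interval;	/* window length, in arbitrary time ticks */
	int burst;	/* maximum reports allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* reports emitted in the current window */
	int missed;	/* reports suppressed in the current window */
};

/* Return true if the caller may emit a report at tick `now`. */
bool ratelimit_allow(struct ratelimit *rl, long now)
{
	assert(rl->interval > 0 && rl->burst > 0);

	if (now - rl->begin >= rl->interval) {
		/* The window has expired: start a new one. */
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Over budget: the report is dropped, only the count survives. */
	rl->missed++;
	return false;
}
```

With interval = 10 and burst = 2, a caller that tries to report on every
tick gets two reports through per window and the rest are only counted in
`missed`. That is the trade-off under discussion: the output stays bounded,
but the suppressed reports are lost.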

> What we should do is to yield CPU time to operations which might do useful
> things (let threads not doing memory allocation; e.g. let printk kernel
> threads to flush pending buffer, let console drivers write the output to
> consoles, let watchdog kernel threads report what is happening).

Yes, we call that a preemptive kernel...

> When memory allocation request is stalling, serialization via waiting
> for a lock does help.

Which will mean that the unlucky ones which stall will stall even more,
because they will wait on a lock with potentially many others. While
this certainly is a throttling mechanism, it is also a big hammer.
-- 
Michal Hocko
SUSE Labs
