Re: How to handle TIF_MEMDIE stalls?

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Sat, 14 Mar 2015 22:03:59 +0900

Oleg Nesterov wrote:
> On 02/11, Oleg Nesterov wrote:
> >
> > On 02/11, Tetsuo Handa wrote:
> > >
> > > (Asking Oleg this time.)
> >
> > Well, sorry, I ignored the previous discussion, not sure I understand you
> > correctly.
> >
> > > > Though, more serious behavior with this reproducer is (B) where the system
> > > > stalls forever without kernel messages being saved to /var/log/messages .
> > > > out_of_memory() does not select victims until the coredump to pipe can make
> > > > progress whereas the coredump to pipe can't make progress until memory
> > > > allocation succeeds or fails.
> > >
> > > This behavior is related to commit d003f371b2701635 ("oom: don't assume
> > > that a coredumping thread will exit soon"). That commit tried to take
> > > SIGNAL_GROUP_COREDUMP into account, but actually it is failing to do so.
> >
> > Heh. Please see the changelog. This "fix" is obviously very limited, it does
> > not even try to solve all problems (even with coredump in particular).
> >
> > Note also that SIGNAL_GROUP_COREDUMP is not even set if the process (not a
> > sub-thread) shares the memory with the coredumping task. It would be better
> > to check mm->core_state != NULL instead, but this needs the locking. Plus
> > that process likely sleeps in D state in exit_mm(), so this can't help.
> >
> > And that is why we set SIGNAL_GROUP_COREDUMP in zap_threads(), not in
> > zap_process(). We probably want to make that "wait for coredump_finish()"
> > sleep in exit_mm() killable, but this is not simple.
> 
> on a cecond thought, perhaps it makes sense to set SIGNAL_GROUP_COREDUMP
> anyway, even if a CLONE_VM process participating in coredump is not killable.
> I'll recheck tomorrow.

Ping?

> 
> > Sorry for noise if the above is not relevant.
> >
> > Oleg.
> 
> 

I tried https://lkml.org/lkml/2015/3/11/707 with retry_allocation_attempts == 1
(with http://marc.info/?l=linux-mm&m=141671829611143&w=2 for debug printk() ).

Although 0x2015a (which is !__GFP_FS) allocation likely fails within a few
jiffies under TIF_MEMDIE condition, TIF_MEMDIE condition itself cannot be solved
until SIGNAL_GROUP_COREDUMP patch is proposed.

----------
XFS: possible memory allocation deadlock in xfs_buf_allocate_memory (mode:0x250)
XFS: possible memory allocation deadlock in xfs_buf_allocate_memory (mode:0x250)
warn_alloc_failed: 212565 callbacks suppressed
crond: page allocation failure: order:0, mode:0x2015a
rngd: page allocation failure: order:0, mode:0x2015a
CPU: 3 PID: 1667 Comm: rngd Not tainted 4.0.0-rc3+ #37
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
 0000000000000000 00000000ce4cec53 0000000000000000 ffffffff815f30c4
 000000000002015a ffffffff8111063e ffff88007fffdb00 0000000000000000
 0000000000000040 ffff88007c223db0 0000000000000000 00000000ce4cec53
Call Trace:
 [<ffffffff815f30c4>] ? dump_stack+0x40/0x50
 [<ffffffff8111063e>] ? warn_alloc_failed+0xee/0x150
 [<ffffffff81113b03>] ? __alloc_pages_nodemask+0x623/0xa10
 [<ffffffff81150c57>] ? alloc_pages_current+0x87/0x100
 [<ffffffff8110d30d>] ? filemap_fault+0x1bd/0x400
 [<ffffffff812e3dbc>] ? radix_tree_next_chunk+0x5c/0x240
 [<ffffffff8112f85b>] ? __do_fault+0x4b/0xe0
 [<ffffffff81134465>] ? handle_mm_fault+0xc85/0x1640
 [<ffffffff81051c9a>] ? __do_page_fault+0x16a/0x430
 [<ffffffff81051f90>] ? do_page_fault+0x30/0x70
 [<ffffffff815fb03f>] ? error_exit+0x1f/0x60
 [<ffffffff815fae18>] ? page_fault+0x28/0x30
----------

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>