On Mon, Oct 29, 2012 at 5:18 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
> On Mon, Oct 29, 2012 at 03:36:38PM -0700, Luigi Semenzato wrote:
>> On Mon, Oct 29, 2012 at 12:00 PM, David Rientjes <rientjes@xxxxxxxxxx> wrote:
>> > On Mon, 29 Oct 2012, Luigi Semenzato wrote:
>> >
>> >> I managed to get the stack trace for the process that refuses to die.
>> >> I am not sure it's due to the deadlock described in earlier messages.
>> >> I will investigate further.
>> >>
>> >> [96283.704390] chrome          x 815ecd20     0 16573   1112 0x00100104
>> >> [96283.704405]  c107fe34 00200046 f57ae000 815ecd20 815ecd20 ec0b645a 0000578f f67cfd20
>> >> [96283.704427]  d0a9a9a0 c107fdf8 81037be5 f5bdf1e8 f6021800 00000000 c107fe04 00200202
>> >> [96283.704449]  c107fe0c 00200202 f5bdf1b0 c107fe24 8117ddb1 00200202 f5bdf1b0 f5bdf1b8
>> >> [96283.704471] Call Trace:
>> >> [96283.704484]  [<81037be5>] ? queue_work_on+0x2d/0x39
>> >> [96283.704497]  [<8117ddb1>] ? put_io_context+0x52/0x6a
>> >> [96283.704510]  [<813b68f6>] schedule+0x56/0x58
>> >> [96283.704520]  [<81028525>] do_exit+0x63e/0x640
>> >
>> > Could you find out where this happens to be in the function?  If you
>> > enable CONFIG_DEBUG_INFO, you should be able to use gdb on vmlinux and
>> > find out with l *do_exit+0x63e.
>>
>> It looks like it's the final call to schedule() in do_exit():
>>
>>    0x81028520 <+1593>:  call   0x813b68a0 <schedule>
>>    0x81028525 <+1598>:  ud2a
>>
>> (gdb) l *do_exit+0x63e
>> 0x81028525 is in do_exit
>> (/home/semenzato/trunk/src/third_party/kernel/files/kernel/exit.c:1069).
>> 1064
>> 1065          /* causes final put_task_struct in finish_task_switch(). */
>> 1066          tsk->state = TASK_DEAD;
>> 1067          tsk->flags |= PF_NOFREEZE;      /* tell freezer to ignore us */
>> 1068          schedule();
>> 1069          BUG();
>> 1070          /* Avoid "noreturn function does return". */
>> 1071          for (;;)
>> 1072                  cpu_relax();    /* For when BUG is null */
>> 1073  }
>>
>> Here's a theory: the thread exits fine, but the next scheduled thread
>> tries to allocate memory before or during finish_task_switch(), so the
>> dead thread is never cleaned up completely and is still considered
>> alive by the OOM killer.
>
> If the next thread tries to allocate memory, it will enter the direct
> reclaim path, and there are scheduling points in there, so the exiting
> thread should be destroyed. :(  In your previous mail you said many
> processes are stuck in shrink_slab, which already includes cond_resched.
> I can't see any problem.  Hmm, could you post the entire debug log after
> capturing sysrq+t several times when the hang happens?

Thank you so much for your continued assistance.

I have been using preserved memory to get the log, and sysrq+T overflows
the buffer (there are a few dozen processes).  To get the trace for the
process with TIF_MEMDIE set, I had to modify the sysrq+T code so that it
prints only that process.  To get a full trace of all processes, I will
have to open the device and attach a debug header, so it will take some
time.

What are we looking for, though?  I see many processes running in
shrink_slab(), but they are not "stuck" there; they are just spending a
lot of time in there.

However, there is now something that worries me more.  The trace of the
thread with TIF_MEMDIE set shows that it has executed most of do_exit()
and appears to be waiting to be reaped.  From my reading of the code,
this implies that task->exit_state should be non-zero, which means that
select_bad_process should have skipped that thread, which means that we
cannot be in the deadlock situation, and my observations are not
consistent.
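
To make the theory above concrete, here is a small userspace sketch of the
ordering I have in mind.  This is my own toy model, not kernel code: the
task list, the single reference count, and the names do_exit_tail() and
oom_scan() are simplifications I made up for illustration.  The only point
is that the exiting thread's task_struct is dropped by whoever runs
finish_task_switch() next, so anything scanning the task list before that
still sees the dead thread.

/*
 * Toy userspace model (NOT kernel code) of the suspected ordering:
 * the exiting task marks itself TASK_DEAD and schedules away, but its
 * task_struct is only freed when the next task reaches
 * finish_task_switch().  An OOM scan before that still sees it.
 */
#include <stdio.h>
#include <stdlib.h>

#define TASK_DEAD 64            /* arbitrary value in this model */

struct task {
        const char *comm;
        long state;
        int usage;              /* stand-in for the task_struct refcount */
        struct task *next;      /* stand-in for the global task list */
};

static struct task *task_list;

static void put_task_struct(struct task *t)
{
        if (--t->usage == 0) {
                /* unlink and free: the task finally disappears */
                struct task **p = &task_list;
                while (*p != t)
                        p = &(*p)->next;
                *p = t->next;
                printf("reaped %s\n", t->comm);
                free(t);
        }
}

/* Last step of do_exit() in this model: mark ourselves dead. */
static void do_exit_tail(struct task *t)
{
        t->state = TASK_DEAD;
        /* the real code calls schedule() here and never returns */
}

/* Run on behalf of the *next* task after the context switch. */
static void finish_task_switch(struct task *prev)
{
        if (prev->state == TASK_DEAD)
                put_task_struct(prev);  /* the "final put" from exit.c */
}

/* Crude stand-in for an OOM scan over the task list. */
static void oom_scan(void)
{
        for (struct task *t = task_list; t; t = t->next)
                printf("oom scan sees: %s%s\n", t->comm,
                       t->state == TASK_DEAD ? " (dead, not yet reaped)" : "");
}

int main(void)
{
        struct task *chrome = calloc(1, sizeof(*chrome));
        chrome->comm = "chrome";
        chrome->usage = 1;
        task_list = chrome;

        do_exit_tail(chrome);       /* the task has exited...           */
        oom_scan();                 /* ...but a scan here still sees it */
        finish_task_switch(chrome); /* next task drops the last ref     */
        oom_scan();
        return 0;
}

Compiled with gcc, the first scan reports the dead chrome task and the
second reports nothing; that gap is the window I am worried about.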
I will add better instrumentation and report later.