Re: [PATCH 0/3] oom: TIF_MEMDIE/PF_EXITING fixes

KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> · Sun, 13 Mar 2011 17:53:50 +0900 (JST)

> I've spent much of the week building up to join in, but the more I
> look around, the more I find to say or investigate, and therefore
> never quite get to write the mail.  Let this be a placeholder, that I
> probably disagree (in an amicable way!) with all of you, and maybe
> I'll finally manage to collect my thoughts into mail later today.
> 
> I guess my main point will be that TIF_MEMDIE serves a number of
> slightly different, even conflicting, purposes; and one of those
> purposes, which present company seems to ignore repeatedly, is to
> serialize access to final reserves of memory - as a comment by Nick in
> select_bad_process() makes clear. (This is distinct from the
> serialization to avoid OOM-killing rampage.)
>
> We _might_ choose to abandon that, but if so, it should be a decision,
> not an oversight.  So I cannot blindly agree with just setting
> TIF_MEMDIE on more and more tasks, even if they share the same mm.  I
> wonder if use of your find_lock_task_mm() in select_bad_process()
> might bring together my wish to continue serialization, David's wish
> to avoid stupid panics, and your wish to avoid deadlocks.
> 
> Though any serialization has a risk of deadlock: we probably need to
> weigh up how realistic different cases are.  Which brings me neatly
> to your little pthread_create, ptrace proggy... I dare say you and
> Kosaki and David know exactly what it's doing and why it's a problem,
> but even after repeated skims of the ptrace manpage, I'll admit to
> not having a clue, nor the inclination to run and then debug it to
> find out the answer.  Please, Oleg, would you mind very much
> explaining it to me? I don't even know if the double pthread_create is
> a vital part of the scheme, or just a typo.  I see it doesn't even
> allocate any special memory, so I assume it leaves a PF_EXITING around
> forever, but I couldn't quite see how (with PF_EXITING being set after
> the tracehook_report_exit).  And I wonder if a similar case can be
> constructed to deadlock the for_each_process version of
> select_bad_process().
> 
> I worry more about someone holding a reference to the mm via /proc (I
> see memory allocations after getting the mm).
> 
> Thanks; until later,
> Hugh

Hi Hugh,

Andrew Vagin showed us an actual deadlock case in the "[PATCH] mm: check
zone->all_unreclaimable in all_unreclaimable()" thread, and I'm
now digging into it. Unfortunately, there is more than one cause, so I
can't yet explain exactly what sequence of events occurs or what we should do.

However, I can say that TIF_MEMDIE is at least one of the root causes of
Andrey's deadlock. I observed that the TIF_MEMDIE process never dies while
all other processes continue to wait, and then the system hangs.

I'm still investigating, and I hope to find a solution with minimal negative
impact, but I wouldn't be surprised if I finally decide to abandon both
TIF_MEMDIE and boost_dying_task_prio. The hang can be reproduced
too easily.

Unfortunately, my country is suffering a rather rare natural disaster,
and I don't think I'll be able to concentrate on debugging for a while.
I'm sorry for the slow progress.
