Re: [patch] oom: give current access to memory reserves if it has been killed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 31 Mar 2010, Oleg Nesterov wrote:

> On 03/30, David Rientjes wrote:
> >
> > On Tue, 30 Mar 2010, Oleg Nesterov wrote:
> >
> > > Note that __oom_kill_task() does force_sig(SIGKILL) which assumes that
> > > ->sighand != NULL. This is not true if out_of_memory() is called after
> > > current has already passed exit_notify().
> >
> > We have an even bigger problem if current is in the oom killer at
> > exit_notify() since it has already detached its ->mm in exit_mm() :)
> 
> Can't understand... I thought that in theory even kmalloc(1) can trigger
> oom.
> 

__oom_kill_task() cannot be called on a task without an ->mm.

> > > IOW, unless I missed something, it is very easy to hide the process
> > > from oom-kill:
> > >
> > > 	int main()
> > > 	{
> > > 		pthread_create(memory_hog_func);
> > > 		syscall(__NR_exit);
> > > 	}
> > >
> >
> > The check for !p->mm was moved in the -mm tree (and the oom killer was
> > entirely rewritten in that tree, so I encourage you to work off of it
> > instead
> 
> OK, but I guess this !p->mm check is still wrong for the same reason.
> In fact I do not understand why it is needed in select_bad_process()
> right before oom_badness() which checks ->mm too (and this check is
> equally wrong).
> 

It prevents kthreads from being killed.  We already identify tasks that 
are in the exit path with PF_EXITING in select_bad_process() and chosen to 
make the oom killer a no-op when it's not current so it can exit and free 
its memory.  If it is current, then we're ooming in the exit path and we 
need to oom kill it so that it gets access to memory reserves so its no 
longer blocking.

> > so if the oom killer finds an already exiting task,
> > it will become a no-op since it should eventually free memory and avoids a
> > needless oom kill.
> 
> No, afaics, And this reminds that I already complained about this
> PF_EXITING check.
> 
> Once again, p is the group leader. It can be dead (no ->mm, PF_EXITING
> is set) but it can have sub-threads. This means, unless I missed something,
> any user can trivially disable select_bad_process() forever.
> 

The task is in the process of exiting and will do so if its not current, 
otherwise it will get access to memory reserves since we're obviously oom 
in the exit path.  Thus, we'll be freeing that memory soon or recalling 
the oom killer to kill additional tasks once those children have been 
reparented (or one of its children was sacrificed).

> 
> Well. Looks like, -mm has a lot of changes in oom_kill.c. Perhaps it
> would be better to fix these mt bugs first...
> 
> Say, oom_forkbomb_penalty() does list_for_each_entry(tsk->children).
> Again, this is not right even if we forget about !child->mm check.
> This list_for_each_entry() can only see the processes forked by the
> main thread.
>

That's the intention.
 
> Likewise, oom_kill_process()->list_for_each_entry() is not right too.
> 

Why?

> Hmm. Why oom_forkbomb_penalty() does thread_group_cputime() under
> task_lock() ? It seems, ->alloc_lock() is only needed for get_mm_rss().
> 

Right, but we need to ensure that the check for !child->mm || child->mm == 
tsk->mm fails before adding in get_mm_rss(child->mm).  It can race and 
detach its mm prior to the dereference.  It would be possible to move the 
thread_group_cputime() out of this critical section, but I felt it was 
better to do filter all tasks with child->mm == tsk->mm first before 
unnecessarily finding the cputime for them.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]