On Tue, 8 Jun 2010, Andrew Morton wrote:

> > The oom killer presently kills current whenever there is no more memory
> > free or reclaimable on its mempolicy's nodes.  There is no guarantee that
> > current is a memory-hogging task or that killing it will free any
> > substantial amount of memory, however.
> >
> > In such situations, it is better to scan the tasklist for nodes that are
> > allowed to allocate on current's set of nodes and kill the task with the
> > highest badness() score.  This ensures that the most memory-hogging task,
> > or the one configured by the user with /proc/pid/oom_adj, is always
> > selected in such scenarios.
> >
> > ...
> >
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -27,6 +27,7 @@
> >  #include <linux/module.h>
> >  #include <linux/notifier.h>
> >  #include <linux/memcontrol.h>
> > +#include <linux/mempolicy.h>
> >  #include <linux/security.h>
> >
> >  int sysctl_panic_on_oom;
> > @@ -36,20 +37,36 @@ static DEFINE_SPINLOCK(zone_scan_lock);
> >  /* #define DEBUG */
> >
> >  /*
> > - * Is all threads of the target process nodes overlap ours?
> > + * Do all threads of the target process overlap our allowed nodes?
> > + * @tsk: task struct of which task to consider
> > + * @mask: nodemask passed to page allocator for mempolicy ooms

> The comment uses kerneldoc annotation but isn't a kerneldoc comment.

I'll fix it.

> >  */
> > -static int has_intersects_mems_allowed(struct task_struct *tsk)
> > +static bool has_intersects_mems_allowed(struct task_struct *tsk,
> > +                                        const nodemask_t *mask)
> >  {
> > -        struct task_struct *t;
> > +        struct task_struct *start = tsk;
> >
> > -        t = tsk;
> >          do {
> > -                if (cpuset_mems_allowed_intersects(current, t))
> > -                        return 1;
> > -                t = next_thread(t);
> > -        } while (t != tsk);
> > -
> > -        return 0;
> > +                if (mask) {
> > +                        /*
> > +                         * If this is a mempolicy constrained oom, tsk's
> > +                         * cpuset is irrelevant.  Only return true if its
> > +                         * mempolicy intersects current, otherwise it may be
> > +                         * needlessly killed.
> > +                         */
> > +                        if (mempolicy_nodemask_intersects(tsk, mask))
> > +                                return true;

> The comment refers to `current' but the code does not?

mempolicy_nodemask_intersects() compares tsk's mempolicy to current's, so
we don't need to pass current into the function (and we optimize for that:
task_lock(current) isn't needed since nothing else can change its
mempolicy).

> > +                } else {
> > +                        /*
> > +                         * This is not a mempolicy constrained oom, so only
> > +                         * check the mems of tsk's cpuset.
> > +                         */

> The comment doesn't refer to `current', but the code does.  Confused.

This simply compares the cpuset mems_allowed of both tasks passed to
cpuset_mems_allowed_intersects().

> > +                        if (cpuset_mems_allowed_intersects(current, tsk))
> > +                                return true;
> > +                }
> > +                tsk = next_thread(tsk);

> hm, next_thread() uses list_entry_rcu().  What are the locking rules
> here?  It's one of both of rcu_read_lock() and read_lock(&tasklist_lock),
> I think?

Oleg addressed this in his response.

> > +        } while (tsk != start);
> > +        return false;
> >  }

> This is all bloat and overhead for non-NUMA builds.  I doubt if gcc is
> able to eliminate the task_struct walk (although I didn't check).
>
> The function isn't oom-killer-specific at all - give it a better name
> then move it to mempolicy.c or similar?  If so, the text "oom"
> shouldn't appear in the comments.

It's the only place where we want to filter tasks based on whether they
share mempolicy nodes or cpuset mems, though, so I think it's
appropriately placed in mm/oom_kill.c.
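For reference, the only intended caller is the oom killer's tasklist scan,
which looks roughly like the sketch below (simplified, not the actual
select_bad_process() code; "nodemask" stands for the mask handed down from
the page allocator).  Holding tasklist_lock across the scan is also what
makes the next_thread() walk above safe:

        struct task_struct *p;

        read_lock(&tasklist_lock);      /* pins the task/thread lists for next_thread() */
        for_each_process(p) {
                /* skip tasks that cannot free memory on our allowed nodes */
                if (!has_intersects_mems_allowed(p, nodemask))
                        continue;
                /* ... compute the badness() score and remember the worst task ... */
        }
        read_unlock(&tasklist_lock);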
I agree that we can add a #ifndef CONFIG_NUMA variant and I'll do so,
thanks (a rough sketch of what I have in mind is at the end of this mail).

> >
> > ...
> >
> > @@ -676,24 +699,19 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> >           */
> >          constraint = constrained_alloc(zonelist, gfp_mask, nodemask);
> >          read_lock(&tasklist_lock);
> > -
> > -        switch (constraint) {
> > -        case CONSTRAINT_MEMORY_POLICY:
> > -                oom_kill_process(current, gfp_mask, order, 0, NULL,
> > -                                "No available memory (MPOL_BIND)");
> > -                break;
> > -
> > -        case CONSTRAINT_NONE:
> > -                if (sysctl_panic_on_oom) {
> > +        if (unlikely(sysctl_panic_on_oom)) {
> > +                /*
> > +                 * panic_on_oom only affects CONSTRAINT_NONE, the kernel
> > +                 * should not panic for cpuset or mempolicy induced memory
> > +                 * failures.
> > +                 */

> This wasn't changelogged?

It's not a functional change: sysctl_panic_on_oom == 2 is already handled
earlier in the function.  The comment was intended to elaborate on why
we're only concerned about CONSTRAINT_NONE here now that the switch
statement is removed.

> > +                if (constraint == CONSTRAINT_NONE) {
> >                          dump_header(NULL, gfp_mask, order, NULL);
> > -                        panic("out of memory. panic_on_oom is selected\n");
> > +                        read_unlock(&tasklist_lock);
> > +                        panic("Out of memory: panic_on_oom is enabled\n");
> >                  }
> > -                /* Fall-through */
> > -        case CONSTRAINT_CPUSET:
> > -                __out_of_memory(gfp_mask, order);
> > -                break;
> >          }
> > -
> > +        __out_of_memory(gfp_mask, order, constraint, nodemask);
> >          read_unlock(&tasklist_lock);
> >
> >          /*
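As mentioned above, here is a rough sketch of the !CONFIG_NUMA variant
(illustrative only; the actual follow-up may differ in detail).  The NUMA
version from the patch would be wrapped in the matching #ifdef, and without
NUMA there is only one node, so every task trivially intersects and the
thread walk can be compiled out entirely:

#ifndef CONFIG_NUMA
static bool has_intersects_mems_allowed(struct task_struct *tsk,
                                        const nodemask_t *mask)
{
        /* With a single node, every task's memory trivially overlaps ours. */
        return true;
}
#endif /* !CONFIG_NUMA */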