On Wed, 09 Sep 2009 12:31:52 -0400 Lee Schermerhorn <lee.schermerhorn@xxxxxx> wrote: > This patch derives a "nodes_allowed" node mask from the numa > mempolicy of the task modifying the number of persistent huge > pages to control the allocation, freeing and adjusting of surplus > huge pages. > > ... > > Index: linux-2.6.31-rc7-mmotm-090827-1651/mm/mempolicy.c > =================================================================== > --- linux-2.6.31-rc7-mmotm-090827-1651.orig/mm/mempolicy.c 2009-09-09 11:57:26.000000000 -0400 > +++ linux-2.6.31-rc7-mmotm-090827-1651/mm/mempolicy.c 2009-09-09 11:57:36.000000000 -0400 > @@ -1564,6 +1564,57 @@ struct zonelist *huge_zonelist(struct vm > } > return zl; > } > + > +/* > + * alloc_nodemask_of_mempolicy > + * > + * Returns a [pointer to a] nodelist based on the current task's mempolicy. > + * > + * If the task's mempolicy is "default" [NULL], return NULL for default > + * behavior. Otherwise, extract the policy nodemask for 'bind' > + * or 'interleave' policy or construct a nodemask for 'preferred' or > + * 'local' policy and return a pointer to a kmalloc()ed nodemask_t. > + * > + * N.B., it is the caller's responsibility to free a returned nodemask. > + */ > +nodemask_t *alloc_nodemask_of_mempolicy(void) > +{ > + nodemask_t *nodes_allowed = NULL; > + struct mempolicy *mempolicy; > + int nid; > + > + if (!current->mempolicy) > + return NULL; > + > + mpol_get(current->mempolicy); > + nodes_allowed = kmalloc(sizeof(*nodes_allowed), GFP_KERNEL); Ho hum. I guess a caller which didn't permit GFP_KERNEL would be pretty lame. > + if (!nodes_allowed) > + return NULL; /* silently default */ Missed an mpol_put(). > + nodes_clear(*nodes_allowed); > + mempolicy = current->mempolicy; > + switch (mempolicy->mode) { > + case MPOL_PREFERRED: > + if (mempolicy->flags & MPOL_F_LOCAL) > + nid = numa_node_id(); > + else > + nid = mempolicy->v.preferred_node; > + node_set(nid, *nodes_allowed); > + break; > + > + case MPOL_BIND: > + /* Fall through */ > + case MPOL_INTERLEAVE: > + *nodes_allowed = mempolicy->v.nodes; > + break; > + > + default: > + BUG(); > + } > + > + mpol_put(current->mempolicy); > + return nodes_allowed; > +} Do we actually need the mpol_get()/put here? Can some other process really some in and trash a process's current->mempolicy when that process isn't looking? If so, why the heck isn't the code racy? static inline void mpol_get(struct mempolicy *pol) { if (pol) atomic_inc(&pol->refcnt); } If it's possible for some other task to trash current->mempolicy then that trashing can happen between the `if' and the `atomic_inc', so we're screwed. So either we need some locking here or the mpol_get() isn't needed on current's mempolicy or the mpol_get() has some secret side-effect? Fixlets: --- a/mm/hugetlb.c~hugetlb-derive-huge-pages-nodes-allowed-from-task-mempolicy-fix +++ a/mm/hugetlb.c @@ -1253,7 +1253,7 @@ static unsigned long set_max_huge_pages( nodes_allowed = alloc_nodemask_of_mempolicy(); if (!nodes_allowed) { - printk(KERN_WARNING "%s unable to allocate nodes allowed mask " + printk(KERN_WARNING "%s: unable to allocate nodes allowed mask " "for huge page allocation. Falling back to default.\n", current->comm); nodes_allowed = &node_online_map; --- a/mm/mempolicy.c~hugetlb-derive-huge-pages-nodes-allowed-from-task-mempolicy-fix +++ a/mm/mempolicy.c @@ -1589,7 +1589,7 @@ nodemask_t *alloc_nodemask_of_mempolicy( mpol_get(current->mempolicy); nodes_allowed = kmalloc(sizeof(*nodes_allowed), GFP_KERNEL); if (!nodes_allowed) - return NULL; /* silently default */ + goto out; /* silently default */ nodes_clear(*nodes_allowed); mempolicy = current->mempolicy; @@ -1611,7 +1611,7 @@ nodemask_t *alloc_nodemask_of_mempolicy( default: BUG(); } - +out: mpol_put(current->mempolicy); return nodes_allowed; } -- To unsubscribe from this list: send the line "unsubscribe linux-numa" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html