On Thu, 2009-09-10 at 16:15 -0700, Andrew Morton wrote: > On Wed, 09 Sep 2009 12:31:52 -0400 > Lee Schermerhorn <lee.schermerhorn@xxxxxx> wrote: > > > This patch derives a "nodes_allowed" node mask from the numa > > mempolicy of the task modifying the number of persistent huge > > pages to control the allocation, freeing and adjusting of surplus > > huge pages. > > > > ... > > > > > Index: linux-2.6.31-rc7-mmotm-090827-1651/mm/mempolicy.c > > =================================================================== > > --- linux-2.6.31-rc7-mmotm-090827-1651.orig/mm/mempolicy.c 2009-09-09 11:57:26.000000000 -0400 > > +++ linux-2.6.31-rc7-mmotm-090827-1651/mm/mempolicy.c 2009-09-09 11:57:36.000000000 -0400 > > @@ -1564,6 +1564,57 @@ struct zonelist *huge_zonelist(struct vm > > } > > return zl; > > } > > + > > +/* > > + * alloc_nodemask_of_mempolicy > > + * > > + * Returns a [pointer to a] nodelist based on the current task's mempolicy. > > + * > > + * If the task's mempolicy is "default" [NULL], return NULL for default > > + * behavior. Otherwise, extract the policy nodemask for 'bind' > > + * or 'interleave' policy or construct a nodemask for 'preferred' or > > + * 'local' policy and return a pointer to a kmalloc()ed nodemask_t. > > + * > > + * N.B., it is the caller's responsibility to free a returned nodemask. > > + */ > > +nodemask_t *alloc_nodemask_of_mempolicy(void) > > +{ > > + nodemask_t *nodes_allowed = NULL; > > + struct mempolicy *mempolicy; > > + int nid; > > + > > + if (!current->mempolicy) > > + return NULL; > > + > > + mpol_get(current->mempolicy); > > + nodes_allowed = kmalloc(sizeof(*nodes_allowed), GFP_KERNEL); > > Ho hum. I guess a caller which didn't permit GFP_KERNEL would be > pretty lame. > > > + if (!nodes_allowed) > > + return NULL; /* silently default */ > > Missed an mpol_put(). Ah, yes. But, see below... > > > + nodes_clear(*nodes_allowed); > > + mempolicy = current->mempolicy; > > + switch (mempolicy->mode) { > > + case MPOL_PREFERRED: > > + if (mempolicy->flags & MPOL_F_LOCAL) > > + nid = numa_node_id(); > > + else > > + nid = mempolicy->v.preferred_node; > > + node_set(nid, *nodes_allowed); > > + break; > > + > > + case MPOL_BIND: > > + /* Fall through */ > > + case MPOL_INTERLEAVE: > > + *nodes_allowed = mempolicy->v.nodes; > > + break; > > + > > + default: > > + BUG(); > > + } > > + > > + mpol_put(current->mempolicy); > > + return nodes_allowed; > > +} > > Do we actually need the mpol_get()/put here? Can some other process > really some in and trash a process's current->mempolicy when that > process isn't looking? You're correct. In this context, I can/will eliminate the get/put. We only really need the reference count in two places: 1) for shared policies [shmem, ...]: one task could be replacing a shared policy while another task is trying to allocate using it's nodemask. 2) for show_numa_maps(), as we're looking at another task's mempolicy [when vma policies default to task mempolicy]. But, here, where the current task is examining its own mempolicy, we don't need the get/put as only the task itself can change it's mempolicy. > > If so, why the heck isn't the code racy? > > static inline void mpol_get(struct mempolicy *pol) > { > if (pol) > atomic_inc(&pol->refcnt); > } > > If it's possible for some other task to trash current->mempolicy then > that trashing can happen between the `if' and the `atomic_inc', so > we're screwed. > > So either we need some locking here or the mpol_get() isn't needed on > current's mempolicy or the mpol_get() has some secret side-effect? Good point. Need to revisit this, altho' not in the context of this series, IMO. May need an atomic_inc_if_nonzero() or such there. > > > Fixlets: > > --- a/mm/hugetlb.c~hugetlb-derive-huge-pages-nodes-allowed-from-task-mempolicy-fix > +++ a/mm/hugetlb.c > @@ -1253,7 +1253,7 @@ static unsigned long set_max_huge_pages( > > nodes_allowed = alloc_nodemask_of_mempolicy(); > if (!nodes_allowed) { > - printk(KERN_WARNING "%s unable to allocate nodes allowed mask " > + printk(KERN_WARNING "%s: unable to allocate nodes allowed mask " > "for huge page allocation. Falling back to default.\n", > current->comm); > nodes_allowed = &node_online_map; > --- a/mm/mempolicy.c~hugetlb-derive-huge-pages-nodes-allowed-from-task-mempolicy-fix > +++ a/mm/mempolicy.c > @@ -1589,7 +1589,7 @@ nodemask_t *alloc_nodemask_of_mempolicy( > mpol_get(current->mempolicy); > nodes_allowed = kmalloc(sizeof(*nodes_allowed), GFP_KERNEL); > if (!nodes_allowed) > - return NULL; /* silently default */ > + goto out; /* silently default */ > > nodes_clear(*nodes_allowed); > mempolicy = current->mempolicy; > @@ -1611,7 +1611,7 @@ nodemask_t *alloc_nodemask_of_mempolicy( > default: > BUG(); > } > - > +out: > mpol_put(current->mempolicy); > return nodes_allowed; > } > -- To unsubscribe from this list: send the line "unsubscribe linux-numa" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html