On Wed, 7 Oct 2009, Lee Schermerhorn wrote:

> > > Index: linux-2.6.31-mmotm-090925-1435/mm/hugetlb.c
> > > ===================================================================
> > > --- linux-2.6.31-mmotm-090925-1435.orig/mm/hugetlb.c	2009-09-30 12:48:45.000000000 -0400
> > > +++ linux-2.6.31-mmotm-090925-1435/mm/hugetlb.c	2009-10-02 21:22:04.000000000 -0400
> > > @@ -1334,29 +1334,71 @@ static struct hstate *kobj_to_hstate(str
> > >  	return NULL;
> > >  }
> > > 
> > > -static ssize_t nr_hugepages_show(struct kobject *kobj,
> > > +static ssize_t nr_hugepages_show_common(struct kobject *kobj,
> > >  					struct kobj_attribute *attr, char *buf)
> > >  {
> > >  	struct hstate *h = kobj_to_hstate(kobj);
> > >  	return sprintf(buf, "%lu\n", h->nr_huge_pages);
> > >  }
> > > -static ssize_t nr_hugepages_store(struct kobject *kobj,
> > > -		struct kobj_attribute *attr, const char *buf, size_t count)
> > > +static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
> > > +			struct kobject *kobj, struct kobj_attribute *attr,
> > > +			const char *buf, size_t len)
> > >  {
> > >  	int err;
> > > -	unsigned long input;
> > > +	unsigned long count;
> > >  	struct hstate *h = kobj_to_hstate(kobj);
> > > +	NODEMASK_ALLOC(nodemask_t, nodes_allowed);
> > > 
> > 
> > In the two places you do NODEMASK_ALLOC(), here and
> > hugetlb_sysctl_handler(), you'll need to check that nodes_allowed is
> > non-NULL since it's possible that kmalloc() will return NULL for
> > CONFIG_NODES_SHIFT > 8.
> > 
> > In such a case, it's probably sufficient to simply set nodes_allowed to
> > node_states[N_HIGH_MEMORY] so that we can still free hugepages when we're
> > oom, a common memory freeing tactic.
> > 
> > You could do that by simply returning false from
> > init_nodemask_of_mempolicy() if !nodes_allowed since NODEMASK_FREE() can
> > take a NULL pointer, but it may be easier to factor that logic into your
> > conditional below:
> > 
> > > -	err = strict_strtoul(buf, 10, &input);
> > > +	err = strict_strtoul(buf, 10, &count);
> > >  	if (err)
> > >  		return 0;
> > > 
> > > -	h->max_huge_pages = set_max_huge_pages(h, input, &node_online_map);
> > > +	if (!(obey_mempolicy && init_nodemask_of_mempolicy(nodes_allowed))) {
> > > +		NODEMASK_FREE(nodes_allowed);
> > > +		nodes_allowed = &node_online_map;
> > > +	}
> > > +	h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);
> > > 
> > 
> > You can get away with just testing !nodes_allowed here since the stack
> > allocation variation of NODEMASK_ALLOC() is such that nodes_allowed will
> > always be an initialized pointer pointing to _nodes_allowed so you won't
> > have an uninitialized warning.
> > 
> > Once that's done, you can get rid of the check for a NULL nodes_allowed in
> > try_to_free_low() from patch 2 since it will always be valid in
> > set_max_huge_pages().
> 
> OK.  already removed the NULL check from try_to_free_low().  And I made
> the change to init_nodemask_of_mempolicy to return false on NULL mask.
> 
> I'm not completely happy with dropping back to default behavior
> [node_online_map here, replaced with node_states[N_HIGH_MEMORY] in
> subsequent patch] on failure to allocate nodes_allowed.  We only do the
> NODEMASK_ALLOC when we've come in from either nr_hugepages_mempolicy or
> a per node attribute [subsequent patch], so I'm not sure that ignoring
> the mempolicy, if any, or the specified node id, is a good thing here.
> Not silently, at least.  I haven't addressed this, yet.  We can submit
> an incremental patch.  Thoughts?
> 
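
Just so we're looking at the same failure mode: with
init_nodemask_of_mempolicy() returning false on a NULL mask, a failed
NODEMASK_ALLOC() simply folds into your existing fallback, roughly like
this (a sketch using the identifiers from the hunk above, not new code):

	NODEMASK_ALLOC(nodemask_t, nodes_allowed);	/* may be NULL for CONFIG_NODES_SHIFT > 8 */
	...
	if (!(obey_mempolicy && init_nodemask_of_mempolicy(nodes_allowed))) {
		/* also reached when nodes_allowed == NULL */
		NODEMASK_FREE(nodes_allowed);		/* ok to pass NULL */
		nodes_allowed = &node_online_map;	/* mempolicy or node id silently ignored */
	}
	h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);

so the write still succeeds, it just targets the default online map
rather than the nodes the admin asked for.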

Hmm, it's debatable since the NODEMASK_ALLOC() slab allocation is
GFP_KERNEL which would cause direct reclaim (and perhaps even the oom
killer) to free memory.  If the oom killer were invoked, current would
probably even be killed because of how the oom killer works for
CONSTRAINT_MEMORY_POLICY.  So the end result is that the pages would
eventually be freed because current would get access to memory reserves
via TIF_MEMDIE but would die immediately after returning.  It was nice
of current to sacrifice itself like that.

Unfortunately, I think the long term solution is that NODEMASK_ALLOC()
is going to require a gfp parameter to pass to kmalloc() and in this
case we should union __GFP_NORETRY.  Then, if nodes_allowed can't be
allocated I think it would be better to simply return -ENOMEM to
userspace so it can either reduce the number of global hugepages or
free memory in another way.  (There might be a caveat where the user's
mempolicy already includes all online nodes and they use
nr_hugepages_mempolicy where they couldn't free hugepages because of
-ENOMEM but could via nr_hugepages, but I don't think you need to
address that.)

The worst case allocation is probably 512 bytes for CONFIG_NODES_SHIFT
of 12 so I don't think using __GFP_NORETRY here is going to be that
ridiculous.
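
Concretely, something along these lines is what I'm picturing (a sketch
only: the gfp argument to NODEMASK_ALLOC() doesn't exist yet and the
-ENOMEM handling is just illustrative):

	/* nodemask.h: kmalloc variant of NODEMASK_ALLOC() grows a gfp argument;
	 * the on-stack variant for small CONFIG_NODES_SHIFT would ignore it */
	#define NODEMASK_ALLOC(type, name, gfp_flags)	\
			type *name = kmalloc(sizeof(*name), gfp_flags)

	/* nr_hugepages_store_common(): don't loop forever in the allocator and
	 * don't silently ignore the mempolicy when the allocation fails */
	NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);
	if (!nodes_allowed) {
		if (obey_mempolicy)
			return -ENOMEM;
		nodes_allowed = &node_online_map;	/* plain nr_hugepages still works */
	}

That keeps the plain nr_hugepages path working when we're under memory
pressure while letting nr_hugepages_mempolicy punt the decision back to
userspace instead of quietly widening the mask.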