On 2/26/19 2:36 PM, Andrew Morton wrote:
>> ...
>>
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h,
>> nodemask_t *nodes_allowed,
>
> Please tweak that email client to prevent the wordwraps.

Sorry about that.

>> +	/*
>> +	 * Check for a node specific request.  Adjust global count, but
>> +	 * restrict alloc/free to the specified node.
>> +	 */

Better comment might be:

	/*
	 * Check for a node specific request.
	 * Changing node specific huge page count may require a corresponding
	 * change to the global count.  In any case, the passed node mask
	 * (nodes_allowed) will restrict alloc/free to the specified node.
	 */

>> +	if (nid != NUMA_NO_NODE) {
>> +		unsigned long old_count = count;
>> +		count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
>> +		/*
>> +		 * If user specified count causes overflow, set to
>> +		 * largest possible value.
>> +		 */

Updated comment:

		/*
		 * User may have specified a large count value which caused the
		 * above calculation to overflow.  In this case, they wanted
		 * to allocate as many huge pages as possible.  Set count to
		 * largest possible value to align with their intention.
		 */

>> +		if (count < old_count)
>> +			count = ULONG_MAX;
>> +	}
>
> The above two comments explain the code, but do not reveal the
> reasoning behind the policy decisions which that code implements.
>
>> ...
>>
>> +	} else {
>>  		/*
>> -		 * per node hstate attribute: adjust count to global,
>> -		 * but restrict alloc/free to the specified node.
>> +		 * Node specific request, but we could not allocate
>> +		 * node mask.  Pass in ALL nodes, and clear nid.
>>  		 */
>
> Ditto here, somewhat.

I was just going to update the comments and send you a new patch, but
your comment got me thinking about this situation.  I did not really
change the way this code operates.  As a reminder, the original code
is like:

	NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);

	if (nid == NUMA_NO_NODE) {
		/* do something */
	} else if (nodes_allowed) {
		/* do something else */
	} else {
		nodes_allowed = &node_states[N_MEMORY];
	}

So, the only way we get to that final else is if we can not allocate a
node mask (kmalloc a few words).  Right?

I wonder why we should even try to continue in this case.  Why not just
return right there?  (A rough sketch of what I mean is at the end of
this mail.)

The specified count value is either a request to increase the number of
huge pages or to decrease it.  If we can't allocate a few words, we
certainly are not going to find memory to create huge pages.  There
'might' be surplus pages which can be converted to permanent pages, but
remember this is a 'node specific' request and we can't allocate a mask
to pass down to the conversion routines.  So, chances are good we would
operate on the wrong node.  The same goes for a request to 'free' huge
pages.  Since we can't allocate a node mask, we are likely to free them
from the wrong node.

Unless my reasoning above is incorrect, I think that final else block in
__nr_hugepages_store_common() is wrong.

Any additional thoughts?
-- 
Mike Kravetz
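
To be concrete, here is a rough sketch of the early return I have in
mind.  It is only an illustration against the skeleton above, not a
tested patch; the choice of -ENOMEM as the error value is an assumption
on my part.

	/* Sketch only: replace the fallback in the final else with a return. */
	NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);

	if (nid == NUMA_NO_NODE) {
		/* do something */
	} else if (nodes_allowed) {
		/* do something else */
	} else {
		/*
		 * Node specific request, but we could not allocate a
		 * node mask.  Falling back to all nodes would likely
		 * adjust the pool on the wrong node, so give up and
		 * report the failure to the caller instead.
		 */
		return -ENOMEM;
	}

That is, treat the NODEMASK_ALLOC() failure as a hard error for node
specific requests instead of silently widening the operation to every
node.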