On Tue, 22 Sep 2009, Lee Schermerhorn wrote:

> > >  static int hstate_next_node_to_alloc(struct hstate *h)
> > >  {
> > > -	int next_nid;
> > > -	next_nid = next_node(h->next_nid_to_alloc, node_online_map);
> > > -	if (next_nid == MAX_NUMNODES)
> > > -		next_nid = first_node(node_online_map);
> > > +	int nid, next_nid;
> > > +
> > > +	nid = h->next_nid_to_alloc;
> > > +	next_nid = next_node_allowed(nid);
> > >  	h->next_nid_to_alloc = next_nid;
> > > -	return next_nid;
> > > +	return nid;
> > >  }
> > >
> > >  static int alloc_fresh_huge_page(struct hstate *h)
> >
> > I thought you had refactored this to drop next_nid entirely since gcc
> > optimizes it away?
>
> Looks like I handled that in the subsequent patch.  You probably
> commented about removing next_nid on that patch.
>

Ah, I see it in 2/11, thanks.

> > > @@ -693,7 +711,7 @@ static int free_pool_huge_page(struct hs
> > >  	int next_nid;
> > >  	int ret = 0;
> > >
> > > -	start_nid = h->next_nid_to_free;
> > > +	start_nid = hstate_next_node_to_free(h);
> > >  	next_nid = start_nid;
> > >
> > >  	do {
> > > @@ -715,9 +733,10 @@ static int free_pool_huge_page(struct hs
> > >  			}
> > >  			update_and_free_page(h, page);
> > >  			ret = 1;
> > > +			break;
> > >  		}
> > >  		next_nid = hstate_next_node_to_free(h);
> > > -	} while (!ret && next_nid != start_nid);
> > > +	} while (next_nid != start_nid);
> > >
> > >  	return ret;
> > >  }
> > > @@ -1028,10 +1047,9 @@ int __weak alloc_bootmem_huge_page(struc
> > >  	void *addr;
> > >
> > >  	addr = __alloc_bootmem_node_nopanic(
> > > -			NODE_DATA(h->next_nid_to_alloc),
> > > +			NODE_DATA(hstate_next_node_to_alloc(h)),
> > >  			huge_page_size(h), huge_page_size(h), 0);
> > >
> > > -	hstate_next_node_to_alloc(h);
> > >  	if (addr) {
> > >  		/*
> > >  		 * Use the beginning of the huge page to store the
> >
> > Shouldn't that panic if hstate_next_node_to_alloc() returns a memoryless
> > node since it uses node_online_map?
>
> Well, the code has always been like this.  And these allocs shouldn't
> panic given a memoryless node.  The run-time ones don't, anyway.  If
> '_THISNODE' is specified, it'll just fail with a NULL addr; else it'll
> walk the generic zonelist to find the first node that can provide the
> requested page size.  Of course, we don't want that fallback when
> populating the pools with persistent huge pages, so we always use the
> THISNODE flag.
>

Whether NODE_DATA() exists for a memoryless node is arch-dependent, I
think, so the panic I was referring to was a NULL pointer dereference in
bootmem.  I think you're safe with the conversion to N_HIGH_MEMORY in
patch 9/11 upon further inspection, though.

> Having said that, I've only recently started to [try to] create the
> gigabyte pages on my x86_64 [Shanghai] test system, but haven't been
> able to allocate any GB pages.  2.6.31 seems to hang early in boot with
> the command line options: "hugepagesz=1GB, hugepages=16".  I've got
> 256GB of memory on this system, so 16GB shouldn't be a problem to find
> at boot time.  Just started looking at this.
>

I can try to reproduce that on one of my systems too; I've never tried it
before.  Thanks.
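
A note for anyone following the hstate_next_node_to_alloc() hunk above:
the refactored helper hands back the node to use for the current
allocation and stores the following node for the next call, rather than
advancing first and returning the advanced value.  Below is a minimal
userspace sketch of that pattern; the names are hypothetical, and the
modulo arithmetic assumes contiguous node ids, whereas the real code
walks node_online_map with next_node()/first_node() to skip holes in the
node numbering:

#include <stdio.h>

#define NR_NODES 4

static int next_nid_to_alloc;   /* stands in for h->next_nid_to_alloc */

static int next_node_to_alloc_demo(void)
{
        int nid = next_nid_to_alloc;              /* node for this allocation */

        next_nid_to_alloc = (nid + 1) % NR_NODES; /* saved for the next call */
        return nid;
}

int main(void)
{
        int i;

        /* six allocations interleave over the four nodes: 0 1 2 3 0 1 */
        for (i = 0; i < 6; i++)
                printf("allocating on node %d\n", next_node_to_alloc_demo());
        return 0;
}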
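
Similarly for the free_pool_huge_page() hunk: once the break is added, a
successful free exits the loop directly, which is why the !ret term can
be dropped from the loop condition.  A toy model of the resulting
control flow follows; all *_demo names and the surplus_pages array are
made-up stand-ins, not kernel APIs:

#include <stdio.h>

#define NR_NODES 4

static int next_nid_to_free;
static int surplus_pages[NR_NODES] = { 0, 0, 2, 1 };    /* toy pool state */

static int next_node_to_free_demo(void)
{
        int nid = next_nid_to_free;

        next_nid_to_free = (nid + 1) % NR_NODES;
        return nid;
}

static int free_pool_huge_page_demo(void)
{
        int start_nid = next_node_to_free_demo();
        int next_nid = start_nid;
        int ret = 0;

        /* visit each node at most once, starting at the saved cursor */
        do {
                if (surplus_pages[next_nid] > 0) {
                        surplus_pages[next_nid]--;  /* "update_and_free_page()" */
                        printf("freed a page on node %d\n", next_nid);
                        ret = 1;
                        break;      /* one page per call, as in the patch */
                }
                next_nid = next_node_to_free_demo();
        } while (next_nid != start_nid);

        return ret;
}

int main(void)
{
        /* drains nodes 2 and 3, interleaving between them */
        while (free_pool_huge_page_demo())
                ;
        return 0;
}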
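
On the '_THISNODE' point, here is a kernel-style fragment (illustrative
only, not the actual hugetlb code; the function name and exact flag mix
are assumptions, though alloc_pages_node(), __GFP_THISNODE, __GFP_COMP,
and __GFP_NOWARN are real kernel identifiers) showing the two failure
modes Lee describes:

/*
 * Illustrative only.  With __GFP_THISNODE the allocation is confined
 * to 'nid': if that node cannot provide a page of the requested order,
 * this returns NULL rather than falling back through the zonelist.
 * Dropping __GFP_THISNODE would allow the zonelist fallback, which is
 * unwanted when populating per-node persistent huge page pools.
 * __GFP_NOWARN suppresses the allocation-failure warning, since a NULL
 * here is expected and handled by the caller.
 */
static struct page *alloc_huge_page_on_node(int nid, unsigned int order)
{
        return alloc_pages_node(nid,
                                GFP_HIGHUSER | __GFP_COMP |
                                __GFP_THISNODE | __GFP_NOWARN,
                                order);
}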