> > > Please don't use this email address for me anymore. Either use
> > > alexander.duyck@xxxxxxxxx or alexanderduyck@xxxxxx. I am getting
> > > bounces when I reply to this thread because of the old address.
> >
> > No problem.
> >
> > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > > index eb533995cb49..0fccd5f96954 100644
> > > > --- a/mm/hugetlb.c
> > > > +++ b/mm/hugetlb.c
> > > > @@ -2320,6 +2320,12 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
> > > >  		goto out_uncharge_cgroup_reservation;
> > > >
> > > >  	spin_lock(&hugetlb_lock);
> > > > +	while (h->free_huge_pages <= 1 && h->isolated_huge_pages) {
> > > > +		spin_unlock(&hugetlb_lock);
> > > > +		mutex_lock(&h->mtx_prezero);
> > > > +		mutex_unlock(&h->mtx_prezero);
> > > > +		spin_lock(&hugetlb_lock);
> > > > +	}
> > >
> > > This seems like a bad idea. It kind of defeats the whole point of
> > > doing the page zeroing outside of the hugetlb_lock. Also it is
> > > operating on the assumption that the only way you might get a page is
> > > from the page zeroing logic.
> > >
> > > With the page reporting code we wouldn't drop the count to zero. We
> > > had checks that were going through and monitoring the watermarks and
> > > if we started to hit the low watermark we would stop page reporting
> > > and just assume there aren't enough pages to report. You might need to
> > > look at doing something similar here so that you can avoid colliding
> > > with the allocator.
> >
> > For hugetlb, things are a little different, Just like Mike points out:
> >     "On some systems, hugetlb pages are a precious resource and
> >      the sysadmin carefully configures the number needed by
> >      applications. Removing a hugetlb page (even for a very short
> >      period of time) could cause serious application failure."
> >
> > Just keeping some pages in the freelist is not enough to prevent that from
> > happening, because these pages may be allocated while zero out is on
> > going, and application may still run into a situation for not available free
> > pages.
>
> I get what you are saying. However I don't know if it is acceptable
> for the allocating thread to be put to sleep in this situation. There
> are two scenarios where I can see this being problematic.
>
> One is a setup where you put the page allocator to sleep and while it
> is sleeping another thread is then freeing a page and your thread
> cannot respond to that newly freed page and is stuck waiting on the
> zeroed page.
>
> The second issue is that users may want a different option of just
> breaking up the request into smaller pages rather than waiting on the
> page zeroing, or to do something else while waiting on the page. So
> instead of sitting on the request and waiting it might make more sense
> to return an error pointer like EAGAIN or EBUSY to indicate that there
> is a page there, but it is momentarily tied up.

It seems that returning EAGAIN or EBUSY would still change the
application's behavior, and I am not sure whether that is acceptable.

Thanks
Liang
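
P.S. To make the non-blocking alternative from the quoted suggestion
concrete, here is a minimal userspace sketch. It is not kernel code and
not part of the patch: struct pool_model and the model_* helpers are
hypothetical names invented for illustration, and the model ignores
locking, reservations and the "<= 1" threshold used in the hunk above.
It only shows the caller-visible difference between "no pages at all"
and "a page exists but is momentarily being zeroed".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of a hugetlb pool (not struct hstate). */
struct pool_model {
	unsigned long free_pages;     /* pages currently on the free list */
	unsigned long isolated_pages; /* pages taken off the list for pre-zeroing */
};

/*
 * Try to take one page without sleeping.
 * Returns 0 on success, -EAGAIN if the only remaining pages are
 * isolated for zeroing, and -ENOMEM if the pool is truly empty.
 */
static int model_try_alloc(struct pool_model *p)
{
	assert(p != NULL);

	if (p->free_pages > 0) {
		p->free_pages--;
		return 0;
	}
	if (p->isolated_pages > 0)
		return -EAGAIN; /* a page exists, but it is tied up */
	return -ENOMEM;
}

/* Model the zeroing worker returning one isolated page to the free list. */
static void model_zeroing_done(struct pool_model *p)
{
	assert(p != NULL);

	if (p->isolated_pages > 0) {
		p->isolated_pages--;
		p->free_pages++;
	}
}

/* Model another thread freeing a page while zeroing is still in progress. */
static void model_free_page(struct pool_model *p)
{
	assert(p != NULL);

	p->free_pages++;
}
```

With this shape the caller decides what to do on -EAGAIN (retry, fall
back to smaller pages, or fail), and a retry can pick up a page freed
by another thread via model_free_page() without waiting for the zeroing
to finish. The open question remains that an application which never
saw such an error before would now have to handle it.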