Re: [PATCH v2 19/20] mm, hugetlb: retry if failed to allocate and there is concurrent user

Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> · Tue, 10 Dec 2013 17:32:44 +0900

On Mon, Dec 09, 2013 at 08:36:23AM -0800, Davidlohr Bueso wrote:
> On Mon, 2013-09-30 at 16:47 +0900, Joonsoo Kim wrote:
> > On Mon, Sep 16, 2013 at 10:09:09PM +1000, David Gibson wrote:
> > > > > 
> > > > > > +		*do_dequeue = false;
> > > > > >  		spin_unlock(&hugetlb_lock);
> > > > > >  		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
> > > > > >  		if (!page) {
> > > > > 
> > > > > I think the counter also needs to be incremented in the case where we
> > > > > call alloc_buddy_huge_page() from alloc_huge_page().  Even though it's
> > > > > new, it gets added to the hugepage pool at this point and could still
> > > > > be a contended page for the last allocation, unless I'm missing
> > > > > something.
> > > > 
> > > > Your comment has reasonable point to me, but I have a different opinion.
> > > > 
> > > > As I already mentioned, the point is that we want to avoid the race
> > > > which kill the legitimate users of hugepages by out of resources.
> > > > I increase 'h->nr_dequeue_users' when the hugepage allocated by
> > > > administrator is dequeued. It is because what the hugepage I want to
> > > > protect from the race is the one allocated by administrator via
> > > > kernel param or /proc interface. Administrator may already know how many
> > > > hugepages are needed for their application so that he may set nr_hugepage
> > > > to reasonable value. I want to guarantee that these hugepages can be used
> > > > for his application without any race, since he assume that the application
> > > > would work fine with these hugepages.
> > > > 
> > > > To protect hugepages returned from alloc_buddy_huge_page() from the race
> > > > is different for me. Although it will be added to the hugepage pool, this
> > > > doesn't guarantee certain application's success more. If certain
> > > > application's success depends on the race of this new hugepage, it's death
> > > > by the race doesn't matter, since nobody assume that it works fine.
> > > 
> > > Hrm.  I still think this path should be included.  Although I'll agree
> > > that failing in this case is less bad.
> > > 
> > > However, it can still lead to a situation where with two processes or
> > > threads, faulting on exactly the same shared page we have one succeed
> > > and the other fail.  That's a strange behaviour and I think we want to
> > > avoid it in this case too.
> > 
> > Hello, David.
> > 
> > I don't think it is a strange behaviour. Similar situation can occur
> > even though we use the mutex. Hugepage allocation can be failed when
> > the first process try to allocate the hugepage while second process is blocked
> > by the mutex. And then, second process will go into the fault handler. And
> > at this time, it can succeed. So result is that we have one succeed and
> > the other fail.
> > 
> > It is slightly different from the case you mentioned, but I think that
> > effect for user is same. We cannot avoid this kind of race completely and
> > I think that avoiding the race for administrator managed hugepage pool is
> > good enough to use.
> 
> What was the final decision on this issue? Is Joonsoo's approach to
> removing this mutex viable, or are we stuck with it?

Hello.

After rebasing on current kernel, I will repost it soon.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>