On Mon, 2013-09-30 at 16:47 +0900, Joonsoo Kim wrote: > On Mon, Sep 16, 2013 at 10:09:09PM +1000, David Gibson wrote: > > > > > > > > > + *do_dequeue = false; > > > > > spin_unlock(&hugetlb_lock); > > > > > page = alloc_buddy_huge_page(h, NUMA_NO_NODE); > > > > > if (!page) { > > > > > > > > I think the counter also needs to be incremented in the case where we > > > > call alloc_buddy_huge_page() from alloc_huge_page(). Even though it's > > > > new, it gets added to the hugepage pool at this point and could still > > > > be a contended page for the last allocation, unless I'm missing > > > > something. > > > > > > Your comment has reasonable point to me, but I have a different opinion. > > > > > > As I already mentioned, the point is that we want to avoid the race > > > which kill the legitimate users of hugepages by out of resources. > > > I increase 'h->nr_dequeue_users' when the hugepage allocated by > > > administrator is dequeued. It is because what the hugepage I want to > > > protect from the race is the one allocated by administrator via > > > kernel param or /proc interface. Administrator may already know how many > > > hugepages are needed for their application so that he may set nr_hugepage > > > to reasonable value. I want to guarantee that these hugepages can be used > > > for his application without any race, since he assume that the application > > > would work fine with these hugepages. > > > > > > To protect hugepages returned from alloc_buddy_huge_page() from the race > > > is different for me. Although it will be added to the hugepage pool, this > > > doesn't guarantee certain application's success more. If certain > > > application's success depends on the race of this new hugepage, it's death > > > by the race doesn't matter, since nobody assume that it works fine. > > > > Hrm. I still think this path should be included. Although I'll agree > > that failing in this case is less bad. > > > > However, it can still lead to a situation where with two processes or > > threads, faulting on exactly the same shared page we have one succeed > > and the other fail. That's a strange behaviour and I think we want to > > avoid it in this case too. > > Hello, David. > > I don't think it is a strange behaviour. Similar situation can occur > even though we use the mutex. Hugepage allocation can be failed when > the first process try to allocate the hugepage while second process is blocked > by the mutex. And then, second process will go into the fault handler. And > at this time, it can succeed. So result is that we have one succeed and > the other fail. > > It is slightly different from the case you mentioned, but I think that > effect for user is same. We cannot avoid this kind of race completely and > I think that avoiding the race for administrator managed hugepage pool is > good enough to use. What was the final decision on this issue? Is Joonsoo's approach to removing this mutex viable, or are we stuck with it? Thanks, Davidlohr -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>