On Thu, Dec 19, 2013 at 05:02:02PM -0800, Andrew Morton wrote: > On Wed, 18 Dec 2013 15:53:59 +0900 Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> wrote: > > > If parallel fault occur, we can fail to allocate a hugepage, > > because many threads dequeue a hugepage to handle a fault of same address. > > This makes reserved pool shortage just for a little while and this cause > > faulting thread who can get hugepages to get a SIGBUS signal. > > > > To solve this problem, we already have a nice solution, that is, > > a hugetlb_instantiation_mutex. This blocks other threads to dive into > > a fault handler. This solve the problem clearly, but it introduce > > performance degradation, because it serialize all fault handling. > > > > Now, I try to remove a hugetlb_instantiation_mutex to get rid of > > performance degradation. > > So the whole point of the patch is to improve performance, but the > changelog doesn't include any performance measurements! > > Please, run some quantitative tests and include a nice summary of the > results in the changelog. > > This is terribly important, because if the performance benefit is > infinitesimally small or negative, the patch goes into the bit bucket ;) Hello, Andrew, Davidlohr. Yes, I should include the peformance measurements. I can measure it on artificial circumstance, but I think that the best is Davidlohr who reported the issue checks the performance improvement. https://lkml.org/lkml/2013/7/12/428 Davidlohr, could you measure it on your testing environment where you originally reported the issue? > > > For achieving it, at first, we should ensure that > > no one get a SIGBUS if there are enough hugepages. > > > > For this purpose, if we fail to allocate a new hugepage when there is > > concurrent user, we return just 0, instead of VM_FAULT_SIGBUS. With this, > > these threads defer to get a SIGBUS signal until there is no > > concurrent user, and so, we can ensure that no one get a SIGBUS if there > > are enough hugepages. > > So if I'm understanding this correctly... if N threads all generate a > fault against the same address, they will all dive in and allocate a > hugepage, will then do an enormous memcpy into that page and will then > attempt to instantiate the page in pagetables. All threads except one > will lose the race and will free the page again! This sounds terribly > inefficient; it would be useful to write a microbenchmark which > triggers this scenario so we can explore the impact. Yes, you understand correctly, I think. I have an idea to prevent this overhead. It is that marking page when it is zeroed and unmarking when it is mapped to page table. If page mapping is failed due to current thread, the zeroed page will keep the marker and later we can determine if it is zeroed or not. If you want to include this functionality in this series, I can do it ;) Please let me know your decision. > I'm wondering if a better solution to all of this would be to make > hugetlb_instantiation_mutex an array of, say, 1024 mutexes and index it > with a hash of the faulting address. That will 99.9% solve the > performance issue which you believe exists without introducing this new > performance issue? Yes, that approach would solve the performance issue. IIRC, you already suggested this idea roughly 6 months ago and it is implemented by Davidlohr. I remembered that there is a race issue on COW case with this approach. See following link for more information. https://lkml.org/lkml/2013/8/7/142 And we need 1-3 patches to prevent other theorectical race issue regardless any approaches. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>