On Tue, Feb 03, 2009 at 10:29:20AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 2 Feb 2009 23:08:56 +0100
> Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
>
> > Hi Greg!
> >
> > > Thanks for the pointers, I'll go read the thread and follow up there.
> >
> > If you also run into this, the final fix is attached below. Porting to
> > mainline is a bit hard because of gup-fast... Perhaps we can use mmu
> > notifiers to fix gup-fast... need to think more about it, then I'll
> > post something.
> >
> > Please help testing the below on pre-gup-fast kernels, thanks!
> >
> I commented in FJ-Redhat Path but not forwarded from unknown reason ;)
> I comment again.
>
> 1. Why TestSetLockPage() is necessary ?
>    It seems not necessary.

To prevent the VM from removing the page from (or adding it to) the
swapcache and changing page_count/mapcount from under us. This was
almost certainly not the cause of the slowdown (the slowdown came from
the false positives generated by pagevec pinning), and removing it would
have been more intrusive than I wanted.

> 2. This patch doesn't cover HugeTLB.

There's no need to change hugetlb with my approach. I'm not touching the
cow path; I'm addressing the real source of the problem, i.e. fork
marking the child pte readonly and pointing it at the page it shares
with the parent (the same thing ksm does). The pte wrprotect + tlb flush
stops the _cpu_ from writing, but it can't stop any
get_user_pages(write=1) user. Hence we need to pre-cow the child page in
fork, instead of marking the child pte readonly, so that the parent
doesn't lose writes if, after fork, the parent cows and the child
doesn't.

> 3. Why "follow_page() successfully finds a page" case only ?
>    not necessary to insert SetPageGUP() in following path ?
>
>    - handle_mm_fault()
>        => do_anonymous/swap/wp_page()
>        or some.

No need to change that either: all we need to know is which pages have
a count vs mapcount discrepancy that could have been caused by
get_user_pages. So only follow_page has to set it.
More precisely, FOLL_GET|FOLL_WRITE is the only path we care about there.

> BTW, when you write a patch against upstream, please CC me or linux-mm.
> I'll have to add a hook for memory-cgroup.

Sure.

BTW, although I couldn't reproduce the problem here after leaving the
./dma_thread -a 512 -w 40 workload running for half a day, others
reported trouble to me, though on a different kernel codebase. At this
point I'm unsure whether any remaining trouble is caused by some
imperfection in this patch or by something else, so test results would
be very interesting. The patch is against rhel-5.2 but should be trivial
to apply to anything pre-get_user_pages_fast.