On Tue, 3 Feb 2009 03:31:47 +0100
Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:

> On Tue, Feb 03, 2009 at 10:29:20AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Mon, 2 Feb 2009 23:08:56 +0100
> > Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
> >
> > > Hi Greg!
> > >
> > > > Thanks for the pointers, I'll go read the thread and follow up there.
> > >
> > > If you also run into this final fix is attached below. Porting to
> > > mainline is a bit hard because of gup-fast... Perhaps we can use mmu
> > > notifiers to fix gup-fast... need to think more about it then I'll
> > > post something.
> > >
> > > Please help testing the below on pre-gup-fast kernels, thanks!
> > >
> > I commented in FJ-Redhat Path but not forwarded from unknown reason ;)
> > I comment again.
> >
> > 1. Why TestSetLockPage() is necessary ?
> >    It seems not necessary.
>
> To avoid the VM to remove or add the page from/to swapcache and change
> page_count/mapcount from under us. This most certainly wasn't the
> reason of the slowdown (the slowdown were the false positives
> generated by pagevec pinning) and removing it was more intrusive than
> I wanted.

My point is:
 - If TestSetLockPage() fails, force_cow=1.
 - If the count/mapcount check fails, force_cow=1.

Both outcomes lead to the same result, so taking the page lock here seems
meaningless. If you consider the page lock important, simply using
lock_page() seems better.

> > 2. This patch doesn't cover HugeTLB.
>
> There's no need to change hugetlb with my approach. I'm not touching
> the cow path, I'm addressing the real source of the problem (i.e. when
> fork pretends to mark the child pte readonly and pointing to the
> shared parent page, same as ksm: while the pte wrprotect + tlb flush
> stops the _cpu_ it can't stop any get_user_pages(write=1) user, hence
> we need to pre-cow the child page in fork instead of marking the child
> pte readonly to avoid the parent to lose writes if post-fork the
> parent cows and the child doesn't cow).
>
Is there no need to make a patch for copy_hugetlb_page_range()?
IMHO, HugeTLB pages can be write-protected at fork(), too.

> > 3. Why "follow_page() successfully finds a page" case only ?
> >    not necessary to insert SetPageGUP() in following path ?
> >
> >  - handle_mm_fault()
> >       => do_anonymos/swap/wp_page()
> >      or some.
>
> No need to change that either, all we need to know are the pages whose
> count vs mapcount has a discrepancy that could have been caused by
> get_user_pages. So only follow_page has to set it. More precisely
> FOLL_GET|FOLL_WRITE is the only path we care about there.
>
Assume 3 threads in a process.
==
  Thread1 (DIO-Read)                        Thread2               Thread3
  get_user_page()
   => handle_mm_fault().
     => map a page with no-write-protect.
                                            fork()
                                            (write-protect here)
                                                                  Copy-On-Write
  endio.

Pre-cow-at-fork will never happen because PageGUP is not set.
After the end of the READ, this process will see a broken page.

Thanks,
-Kame
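
P.S. The three-thread timeline in point 3 can be replayed as a small
userspace toy model. This is only a sketch of the scenario as described in
this mail, not kernel code: struct toy_page, gup_mark, DIO_PATTERN and
parent_sees_dio_data() are all hypothetical names made up for illustration,
and whether the real get_user_pages() path leaves the page unmarked after
handle_mm_fault() is exactly the open question.

```c
#include <stdbool.h>

#define DIO_PATTERN 0x5a  /* arbitrary value the "device" stores at endio */

/* Toy stand-in for a page; this is NOT the kernel's struct page. */
struct toy_page {
    int  dio_data;    /* bytes the direct-I/O read will fill in */
    int  other_data;  /* unrelated bytes stored by Thread3 */
    bool gup_mark;    /* models the PageGUP marker from the patch */
};

/*
 * Replays the timeline in a single thread.
 * mark_on_fault_path is a hypothetical knob: true means the page mapped
 * through the fault path also gets the marker.
 * Returns true if the parent's mapping ends up holding the DIO data.
 */
bool parent_sees_dio_data(bool mark_on_fault_path)
{
    struct toy_page orig = { 0, 0, false };
    struct toy_page copy;
    struct toy_page *parent_map = &orig;  /* page the parent pte points at */
    struct toy_page *dio_target;          /* page pinned for the I/O */
    bool write_protected;

    /* Thread1: the fault path maps a writable page; the I/O pins it. */
    orig.gup_mark = mark_on_fault_path;
    dio_target = &orig;

    /* Thread2: fork() pre-cows marked pages, write-protects the rest. */
    write_protected = !orig.gup_mark;

    /* Thread3: a store to a write-protected page triggers copy-on-write. */
    if (write_protected) {
        copy = orig;
        parent_map = &copy;
    }
    parent_map->other_data = 1;

    /* Thread1: endio, the data lands in the originally pinned page. */
    dio_target->dio_data = DIO_PATTERN;

    return parent_map->dio_data == DIO_PATTERN;
}
```

In this model parent_sees_dio_data(false) corresponds to the case above
where PageGUP is never set, and parent_sees_dio_data(true) to the case
where the fault path marks the page as well.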