On Wed, Sep 16, 2020 at 02:46:19PM -0400, Peter Xu wrote:

> My understanding is this may only work for the case when the fork()ed child
> quitted before we reach here (so we still have mapcount==1 for the
> page).

Yes.

> What if not? Then mapcount will be greater than 1, and cow will
> still trigger. Is that what we want?

That doesn't work today anyhow, so it is fine for it to continue to be
broken.

> Another problem is that, aiui, one of the major change previous patch proposed
> is to avoid using lock_page() so that we never block in this path.

I saw you mention this before, but it looks like the change was to
lift some of the atomic_read()s out of the lock and avoid the lock if
they indicate failure. Also checking page_maybe_dma_pinned() outside
the lock just means that in the rare case of FOLL_PIN we will take the
lock again.

> Maybe even more complicated, because "correctness" should be even harder
> than "best effort reuse" since it can cause data corruption if we didn't do it
> right...

The only correct way is for the application to avoid write protect on
FOLL_PIN pages. The purpose here is to allow applications that have not
yet hit "bad luck" and failed to keep working.

Another thought: should we also insert a warning print here saying that
the program is working improperly? At least it would give a transition
period to evaluate the extent of the problem.

We are thinking it is going to be a notable regression.

I botched the last version of the patch, here is something a bit
better. Does it seem like it could be OK?
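To make the intended control flow easier to check, here is a rough
userspace sketch of the reuse-versus-copy decision the patch below is
trying to make. It is not kernel code. struct page_model, enum
wp_action and wp_decide() are hypothetical names made up for
illustration, and each field only stands in for the result of the
kernel helper named in its comment. The page lock is reduced to a
flag, and the page_move_anon_rmap() step is not modeled.

```c
#include <stdbool.h>

/* Hypothetical model of the page state; NOT a kernel structure. */
struct page_model {
	bool ksm;              /* stands in for PageKsm() */
	int count;             /* stands in for page_count() */
	int mapcount;          /* stands in for page_mapcount() */
	bool maybe_dma_pinned; /* stands in for page_maybe_dma_pinned() */
	bool lock_available;   /* stands in for trylock_page() succeeding */
	bool swap_reusable;    /* stands in for reuse_swap_page() returning true */
};

enum wp_action { WP_COPY, WP_REUSE };

/* Mirrors the order of checks in the patched do_wp_page() anon path. */
static enum wp_action wp_decide(const struct page_model *p)
{
	if (p->ksm)
		return WP_COPY;

	/* An elevated count only forces a copy when no DMA pin explains it. */
	if (p->count != 1 && !p->maybe_dma_pinned)
		return WP_COPY;

	if (!p->lock_available)
		return WP_COPY;

	/* Under the lock: fall back to the slower check (cow_needed() in the patch). */
	if (p->ksm || p->mapcount != 1 || p->count != 1) {
		if (!p->swap_reusable)
			return WP_COPY;
	}

	return WP_REUSE;
}
```

In this model a pinned page with an elevated count is reused only when
the stand-in for reuse_swap_page() agrees, which matches the "forked
child already exited" case discussed above.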
I know very little about this part of the kernel.

Thanks,
Jason

diff --git a/mm/memory.c b/mm/memory.c
index 469af373ae76e1..332de777854f8b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2889,6 +2889,24 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf)
 	return ret;
 }
 
+static bool cow_needed(struct vm_fault *vmf)
+{
+	int total_map_swapcount;
+
+	if (!reuse_swap_page(vmf->page, &total_map_swapcount))
+		return true;
+
+	if (total_map_swapcount == 1) {
+		/*
+		 * The page is all ours. Move it to our anon_vma so the rmap
+		 * code will not search our parent or siblings. Protected
+		 * against the rmap code by the page lock.
+		 */
+		page_move_anon_rmap(vmf->page, vmf->vma);
+	}
+	return false;
+}
+
 /*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
@@ -2942,13 +2960,27 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		struct page *page = vmf->page;
 
 		/* PageKsm() doesn't necessarily raise the page refcount */
-		if (PageKsm(page) || page_count(page) != 1)
+		if (PageKsm(page))
 			goto copy;
+		if (page_count(page) != 1) {
+			/*
+			 * If the page is DMA pinned we can't rely on the
+			 * above to know if there are other CPU references as
+			 * page_count() will be elevated by the
+			 * pin. Needlessly copying the page will cause the DMA
+			 * pin to break, try harder to avoid that.
+			 */
+			if (!page_maybe_dma_pinned(page))
+				goto copy;
+		}
+
 		if (!trylock_page(page))
 			goto copy;
 		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
-			unlock_page(page);
-			goto copy;
+			if (cow_needed(vmf)) {
+				unlock_page(page);
+				goto copy;
+			}
 		}
 		/*
 		 * Ok, we've got the only map reference, and the only