On Tue, Sep 22, 2020 at 01:10:46PM -0300, Jason Gunthorpe wrote: > On Tue, Sep 22, 2020 at 11:17:36AM -0400, Peter Xu wrote: > > > > But it's admittedly a cosmetic point, combined with my perennial fear that > > > I'm missing something when I look at a READ_ONCE()/WRITE_ONCE() pair. :) > > > > Yeah but I hope I'm using it right.. :) I used READ_ONCE/WRITE_ONCE explicitly > > because I think they're cheaper than atomic operations, (which will, iiuc, lock > > the bus). > > It is worth thinking a bit about racing fork with > pin_user_pages(). The desired outcome is: > > If fork wins the page is write protected, and pin_user_pages_fast() > will COW it. > > If pin_user_pages_fast() wins then fork must see the READ_ONCE and > the pin. > > As get_user_pages_fast() is lockless it looks like the ordering has to > be like this: > > pin_user_pages_fast() fork() > atomic_set(has_pinned, 1); > [..] > atomic_add(page->_refcount) > ordered check write protect() > ordered set write protect() > atomic_read(page->_refcount) > atomic_read(has_pinned) > > Such that in all the degenerate racy cases the outcome is that both > sides COW, never neither. > > Thus I think it does have to be atomics purely from an ordering > perspective, observing an increased _refcount requires that has_pinned > != 0 if we are pinning. > > So, to make this 100% this ordering will need to be touched up too. Thanks for spotting this. So something like below should work, right? diff --git a/mm/memory.c b/mm/memory.c index 8f3521be80ca..6591f3f33299 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -888,8 +888,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, * Because we'll need to release the locks before doing cow, * pass this work to upper layer. */ - if (READ_ONCE(src_mm->has_pinned) && wp && - page_maybe_dma_pinned(page)) { + if (wp && page_maybe_dma_pinned(page) && + READ_ONCE(src_mm->has_pinned)) { /* We've got the page already; we're safe */ data->cow_old_page = page; data->cow_oldpte = *src_pte; I can also add some more comment to emphasize this. I think the WRITE_ONCE/READ_ONCE can actually be kept, because atomic ops should contain proper memory barriers already so the memory access orders should be guaranteed (e.g., atomic_add() will have an implicit wmb(); rmb() for the other side). However maybe it's even simpler to change has_pinned into atomic as John suggested. Thanks, -- Peter Xu