It's complicated and would take some more time for me to be certain,
but after looking for half an hour or so this morning, I agree with
Mike that such a race is possible. That is, we may back out into the
retry path, and drop mmap_lock, and leave a situation where a page is
in the cache, but we have !PageUptodate().

hugetlb_mcopy_atomic_pte clearly handles the VM_SHARED case, so I
don't see a reason why there can't be another
(non-userfaultfd-registered) mapping. If it were faulted at the right
time, it seems like such a fault would indeed zero the page, and then
the UFFDIO_COPY retry (once it acquired the lock again) would try to
reuse it.

On Fri, May 14, 2021 at 10:56 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 5/14/21 5:31 AM, Peter Xu wrote:
> > Hi, Mike,
> >
> > On Thu, May 13, 2021 at 09:02:15PM -0700, Mike Kravetz wrote:
> >
> > [...]
> >
> >> I am also concerned with the semantics of this approach and what happens
> >> when a fault races with the userfaultfd copy. Previously I asked Peter
> >> if we could/should use a page found in the cache for the copy. His
> >> answer was as follows:
> >>
> >> AFAICT that's the expected behavior, and it need to be like that so as to avoid
> >> silent data corruption (if the page cache existed, it means the page is not
> >> "missing" at all, then it does not suite for a UFFDIO_COPY as it's only used
> >> for uffd page missing case).
> >
> > I didn't follow the rest discussion in depth yet... but just to mention that
> > the above answer was for the question whether we can "update the page in the
> > page cache", rather than "use a page found in the page cache".
> >
> > I think reuse the page should be fine, however it'll definitely break existing
> > user interface (as it'll expect -EEXIST for now - we have kselftest covers
> > that), meanwhile I don't see why the -EEXIST bothers a lot: it still tells the
> > user that this page was filled in already. Normally it was filled in by
> > another UFFDIO_COPY (as we could have multiple uffd service threads) along with
> > a valid pte, then this userspace thread can simply skip this message as it
> > means the event has been handled by some other servicing thread.
> >
> > (This also reminded me that there won't be a chance of UFFDIO_COPY race on page
> > no page fault at least, since no page fault will always go into the uffd
> > missing handling rather than filling in the page cache for a VM_UFFD_MISSING
> > vma; while mmap read lock should guarantee VM_UFFD_MISSING be persistent)
>
> Perhaps I am missing something.
>
> Since this is a shared mapping, can we not have a 'regular' mapping to
> the same range that is uffd registered? And, that regular mappings could
> fault and race with the uffd copy code?
>
> --
> Mike Kravetz
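
For reference, the userspace behavior Peter describes above (a servicing
thread treating -EEXIST as "already handled" and skipping the message)
could look roughly like the sketch below. This is only an illustration:
the enum and function names are made up, and ret/err are assumed to be
the return value and errno from ioctl(uffd, UFFDIO_COPY, ...).

```c
#include <errno.h>

/* Hypothetical names, for illustration only; not from any kernel or
 * selftest source. */
enum copy_action {
    COPY_DONE,   /* ioctl succeeded; the page is now populated */
    COPY_SKIP,   /* EEXIST: page already filled in, nothing to do */
    COPY_RETRY,  /* EAGAIN: copy did not complete; reissue it */
    COPY_FAIL    /* anything else: report the error */
};

/* Decide what a uffd servicing thread should do after UFFDIO_COPY.
 * ret is the ioctl return value, err is errno captured right after. */
static enum copy_action classify_copy_result(int ret, int err)
{
    if (ret == 0)
        return COPY_DONE;

    switch (err) {
    case EEXIST:
        /* Normally another servicing thread got there first. */
        return COPY_SKIP;
    case EAGAIN:
        return COPY_RETRY;
    default:
        return COPY_FAIL;
    }
}
```

Note that this only covers what userspace sees. It says nothing about
whether the page that produced -EEXIST was filled by another
UFFDIO_COPY or by a fault through a non-registered mapping, which is
the case being discussed here.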