> On Dec 21, 2020, at 11:28 AM, Peter Xu <peterx@xxxxxxxxxx> wrote: > > On Sat, Nov 28, 2020 at 04:45:38PM -0800, Nadav Amit wrote: >> From: Nadav Amit <namit@xxxxxxxxxx> >> >> When userfaultfd copy-ioctl fails since the PTE already exists, an >> -EEXIST error is returned and the faulting thread is not woken. The >> current userfaultfd test does not wake the faulting thread in such case. >> The assumption is presumably that another thread set the PTE through >> copy/wp ioctl and would wake the faulting thread or that alternatively >> the fault handler would realize there is no need to "must_wait" and >> continue. This is not necessarily true. >> >> There is an assumption that the "must_wait" tests in handle_userfault() >> are sufficient to provide definitive answer whether the offending PTE is >> populated or not. However, userfaultfd_must_wait() test is lockless. >> Consequently, concurrent calls to ptep_modify_prot_start(), for >> instance, can clear the PTE and can cause userfaultfd_must_wait() >> to wrongly assume it is not populated and a wait is needed. > > Yes userfaultfd_must_wait() is lockless, however my understanding is that we'll > enqueue before reading the page table, which seems to me that we'll always get > notified even the race happens. Should apply to either UFFDIO_WRITEPROTECT or > UFFDIO_COPY, iiuc, as long as we follow the order of (1) modify pgtable (2) > wake sleeping threads. Then it also means that when must_wait() returned true, > it should always get waked up when fault resolved. > > Taking UFFDIO_COPY as example, even if UFFDIO_COPY happen right before > must_wait() calls: > > worker thread uffd thread > ------------- ----------- > > handle_userfault > spin_lock(fault_pending_wqh) > enqueue() > set_current_state(INTERRUPTIBLE) > spin_unlock(fault_pending_wqh) > must_wait() > lockless walk page table > UFFDIO_COPY > fill in the hole > wake up threads > (this will wake up worker thread too?) > schedule() > (which may return immediately?) > > While here fault_pending_wqh is lock protected. I just feel like there's some > other reason to cause the thread to stall. Or did I miss something? But what happens if the copy completed before the enqueuing? Assume the page is write-protected during UFFDIO_COPY: cpu0 cpu1 ---- ---- handle_userfault UFFDIO_COPY [ write-protected ] fill in the hole wake up threads [nothing to wake] UFFD_WP (unprotect) logically marks as unprotected [nothing to wake] spin_lock(fault_pending_wqh) enqueue() set_current_state(INTERRUPTIBLE) spin_unlock(fault_pending_wqh) must_wait() [ #PF on the same PTE due to write-protection ] ... wp_page_copy() ptep_clear_flush_notify() [ PTE is clear ] lockless walk page table pte_none(*pte) -> must wait Note that additional scenarios are possible. For instance, instead of wp_page_copy(), we can have other change_pte_range() (due to worker’s mprotect() or NUMA balancing), calling ptep_modify_prot_start() and clearing the PTE. Am I missing something?