On Mon, Dec 21, 2020 at 07:51:52PM +0000, Nadav Amit wrote: > > On Dec 21, 2020, at 11:28 AM, Peter Xu <peterx@xxxxxxxxxx> wrote: > > > > On Sat, Nov 28, 2020 at 04:45:38PM -0800, Nadav Amit wrote: > >> From: Nadav Amit <namit@xxxxxxxxxx> > >> > >> When userfaultfd copy-ioctl fails since the PTE already exists, an > >> -EEXIST error is returned and the faulting thread is not woken. The > >> current userfaultfd test does not wake the faulting thread in such case. > >> The assumption is presumably that another thread set the PTE through > >> copy/wp ioctl and would wake the faulting thread or that alternatively > >> the fault handler would realize there is no need to "must_wait" and > >> continue. This is not necessarily true. > >> > >> There is an assumption that the "must_wait" tests in handle_userfault() > >> are sufficient to provide definitive answer whether the offending PTE is > >> populated or not. However, userfaultfd_must_wait() test is lockless. > >> Consequently, concurrent calls to ptep_modify_prot_start(), for > >> instance, can clear the PTE and can cause userfaultfd_must_wait() > >> to wrongly assume it is not populated and a wait is needed. > > > > Yes userfaultfd_must_wait() is lockless, however my understanding is that we'll > > enqueue before reading the page table, which seems to me that we'll always get > > notified even the race happens. Should apply to either UFFDIO_WRITEPROTECT or > > UFFDIO_COPY, iiuc, as long as we follow the order of (1) modify pgtable (2) > > wake sleeping threads. Then it also means that when must_wait() returned true, > > it should always get waked up when fault resolved. > > > > Taking UFFDIO_COPY as example, even if UFFDIO_COPY happen right before > > must_wait() calls: > > > > worker thread uffd thread > > ------------- ----------- > > > > handle_userfault > > spin_lock(fault_pending_wqh) > > enqueue() > > set_current_state(INTERRUPTIBLE) > > spin_unlock(fault_pending_wqh) > > must_wait() > > lockless walk page table > > UFFDIO_COPY > > fill in the hole > > wake up threads > > (this will wake up worker thread too?) > > schedule() > > (which may return immediately?) > > > > While here fault_pending_wqh is lock protected. I just feel like there's some > > other reason to cause the thread to stall. Or did I miss something? > > But what happens if the copy completed before the enqueuing? Assume > the page is write-protected during UFFDIO_COPY: > > > cpu0 cpu1 > ---- ---- > handle_userfault > UFFDIO_COPY > [ write-protected ] > fill in the hole > wake up threads > [nothing to wake] > > UFFD_WP (unprotect) > logically marks as unprotected > [nothing to wake] > > spin_lock(fault_pending_wqh) > enqueue() > set_current_state(INTERRUPTIBLE) > spin_unlock(fault_pending_wqh) > must_wait() > > [ #PF on the same PTE > due to write-protection ] > > ... > wp_page_copy() > ptep_clear_flush_notify() > [ PTE is clear ] > > lockless walk page table > pte_none(*pte) -> must wait > > Note that additional scenarios are possible. For instance, instead of > wp_page_copy(), we can have other change_pte_range() (due to worker’s > mprotect() or NUMA balancing), calling ptep_modify_prot_start() and clearing > the PTE. > > Am I missing something? Ah I see your point, thanks. I think you're right: Reviewed-by: Peter Xu <peterx@xxxxxxxxxx> Would you mind adding something like above into the commit message if you're going to repost? IMHO it would even be nicer to mention why UFFDIO_WRITEPROTECT does not need this extra wakeup (I think it's because it'll do the wakeup unconditionally anyway). -- Peter Xu