On Wed, Jul 19, 2023 at 2:16 PM Axel Rasmussen <axelrasmussen@xxxxxxxxxx> wrote: > > Thanks for the detailed report Dimitris! I've CCed the MM mailing list and some > folks who work on userfaultfd. Apologies, I should have quoted the original message for the others I added to CC: https://lore.kernel.org/lkml/79375b71-db2e-3e66-346b-254c90d915e2@xxxxxxxxxxxxxxxxx/T/#u > > I took a look at this today, but I haven't quite come up with a solution. > > I thought it might be as easy as changing userfaultfd_release() to set released > *after* taking the lock. But no such luck, the ordering is what it is to deal > with another subtle case: > > > WRITE_ONCE(ctx->released, true); > > if (!mmget_not_zero(mm)) > goto wakeup; > > /* > * Flush page faults out of all CPUs. NOTE: all page faults > * must be retried without returning VM_FAULT_SIGBUS if > * userfaultfd_ctx_get() succeeds but vma->vma_userfault_ctx > * changes while handle_userfault released the mmap_lock. So > * it's critical that released is set to true (above), before > * taking the mmap_lock for writing. > */ > mmap_write_lock(mm); > > I think perhaps the right thing to do is to have handle_userfault() release > mmap_lock when it returns VM_FAULT_NOPAGE, and to have GUP deal with that > appropriately? But, some investigation is required to be sure that's okay to do > in the other non-GUP ways we can end up in handle_userfault().