On Wed, Dec 07, 2022 at 03:19:55PM -0800, John Hubbard wrote: > On 12/7/22 12:30, Peter Xu wrote: > > We can take the hugetlb walker lock, here taking vma lock directly. > > > > Reviewed-by: David Hildenbrand <david@xxxxxxxxxx> > > Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> > > Signed-off-by: Peter Xu <peterx@xxxxxxxxxx> > > --- > > fs/userfaultfd.c | 18 ++++++++++++++---- > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > index 07c81ab3fd4d..a602f008dde5 100644 > > --- a/fs/userfaultfd.c > > +++ b/fs/userfaultfd.c > > @@ -376,7 +376,8 @@ static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags) > > */ > > vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) > > { > > - struct mm_struct *mm = vmf->vma->vm_mm; > > + struct vm_area_struct *vma = vmf->vma; > > + struct mm_struct *mm = vma->vm_mm; > > struct userfaultfd_ctx *ctx; > > struct userfaultfd_wait_queue uwq; > > vm_fault_t ret = VM_FAULT_SIGBUS; > > @@ -403,7 +404,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) > > */ > > mmap_assert_locked(mm); > > - ctx = vmf->vma->vm_userfaultfd_ctx.ctx; > > + ctx = vma->vm_userfaultfd_ctx.ctx; > > if (!ctx) > > goto out; > > @@ -493,6 +494,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) > > blocking_state = userfaultfd_get_blocking_state(vmf->flags); > > + /* > > + * This stablizes pgtable for hugetlb on e.g. pmd unsharing. Need > > + * to be before setting current state. > > + */ > > Looking at this code, I am not able to come up with a reason for why the > vma lock/unlock placement is exactly where it is. It looks quite arbitrary. > > Why not, for example, take and drop the vma lock within > userfaultfd_huge_must_wait()? That makes more sense to me, but I'm not familiar > with userfaultfd so of course I'm missing something. > > But the comment above certainly doesn't supply that something. The part that matters in the comment is "need to be before setting current state". blocking_state = userfaultfd_get_blocking_state(vmf->flags); if (is_vm_hugetlb_page(vma)) hugetlb_vma_lock_read(vma); set_current_state(blocking_state); down_read() can sleep and also modify the task state, we cannot take the lock after that point because otherwise the task state will be messed up. -- Peter Xu