On Wed, Jan 18, 2023 at 4:51 AM Jann Horn <jannh@xxxxxxxxxx> wrote: > > On Mon, Jan 9, 2023 at 9:54 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote: > > Page fault handlers might need to fire MMU notifications while a new > > notifier is being registered. Modify mm_take_all_locks to write-lock all > > VMAs and prevent this race with fault handlers that would hold VMA locks. > > VMAs are locked before i_mmap_rwsem and anon_vma to keep the same > > locking order as in page fault handlers. > > > > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx> > > --- > > mm/mmap.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > index 30c7d1c5206e..a256deca0bc0 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -3566,6 +3566,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping) > > * of mm/rmap.c: > > * - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for > > * hugetlb mapping); > > + * - all vmas marked locked > > The existing comment above says that this is an *ordered* listing of > which locks are taken. > > > * - all i_mmap_rwsem locks; > > * - all anon_vma->rwseml > > * > > @@ -3591,6 +3592,7 @@ int mm_take_all_locks(struct mm_struct *mm) > > mas_for_each(&mas, vma, ULONG_MAX) { > > if (signal_pending(current)) > > goto out_unlock; > > + vma_write_lock(vma); > > if (vma->vm_file && vma->vm_file->f_mapping && > > is_vm_hugetlb_page(vma)) > > vm_lock_mapping(mm, vma->vm_file->f_mapping); > > Note that multiple VMAs can have the same ->f_mapping, so with this, > the lock ordering between VMA locks and the mapping locks of hugetlb > VMAs is mixed: If you have two adjacent hugetlb VMAs with the same > ->f_mapping, then the following operations happen: > > 1. lock VMA 1 > 2. lock mapping of VMAs 1 and 2 > 3. lock VMA 2 > 4. [second vm_lock_mapping() is a no-op] > > So for VMA 1, we ended up taking the VMA lock first, but for VMA 2, we > took the mapping lock first. > > The existing code has one loop per lock type to ensure that the locks > really are taken in the specified order, even when some of the locks > are associated with multiple VMAs. > > If we don't care about the ordering between these two, maybe that's > fine and you just have to adjust the comment; but it would be clearer > to add a separate loop for the VMA locks. Oh, thanks for pointing out this detail. A separate loop is definitely needed here. Will do that in the next respin. > > > @@ -3677,6 +3679,7 @@ void mm_drop_all_locks(struct mm_struct *mm) > > if (vma->vm_file && vma->vm_file->f_mapping) > > vm_unlock_mapping(vma->vm_file->f_mapping); > > } > > + vma_write_unlock_mm(mm); > > > > mutex_unlock(&mm_all_locks_mutex); > > } > > -- > > 2.39.0 > >