On Wed, Jul 19, 2023 at 6:33 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap
> lock is held write-locked by the caller, and the caller is responsible for
> dropping the mmap lock at a later point (which will also release the VMA
> locks).
> Calling vma_end_write_all() here is dangerous because the caller might have
> write-locked a VMA with the expectation that it will stay write-locked
> until the mmap_lock is released, as usual.
>
> This _almost_ becomes a problem in the following scenario:
>
> An anonymous VMA A and an SGX VMA B are mapped adjacent to each other.
> Userspace calls munmap() on a range starting at the start address of A and
> ending in the middle of B.
>
> Hypothetical call graph with additional notes in brackets:
>
> do_vmi_align_munmap
>   [begin first for_each_vma_range loop]
>   vma_start_write [on VMA A]
>   vma_mark_detached [on VMA A]
>   __split_vma [on VMA B]
>     sgx_vma_open [== new->vm_ops->open]
>       sgx_encl_mm_add
>         __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN]
>           mm_take_all_locks
>           mm_drop_all_locks
>             vma_end_write_all [drops VMA lock taken on VMA A before]
>   vma_start_write [on VMA B]
>   vma_mark_detached [on VMA B]
>   [end first for_each_vma_range loop]
>   vma_iter_clear_gfp [removes VMAs from maple tree]
>   mmap_write_downgrade
>   unmap_region
>   mmap_read_unlock
>
> In this hypothetical scenario, while do_vmi_align_munmap() thinks it still
> holds a VMA write lock on VMA A, the VMA write lock has actually been
> invalidated inside __split_vma().
>
> The call from sgx_encl_mm_add() to __mmu_notifier_register() can't
> actually happen here, as far as I understand, because we are duplicating an
> existing SGX VMA, but sgx_encl_mm_add() only calls
> __mmu_notifier_register() for the first SGX VMA created in a given process.
> So this could only happen in fork(), not on munmap().
> But in my view it is just pure luck that this can't happen.
>
> Also, we wouldn't actually have any bad consequences from this in
> do_vmi_align_munmap(), because by the time the bug drops the lock on VMA A,
> we've already marked VMA A as detached, which makes it completely
> ineligible for any VMA-locked page faults.
> But again, that's just pure luck.
>
> So remove the vma_end_write_all(), so that VMA write locks are only ever
> released on mmap_write_unlock() or mmap_write_downgrade().

Your logic makes sense to me. mm_drop_all_locks() unlocking all VMAs,
even the ones which were locked before mm_take_all_locks(), seems
dangerous.

One concern I have is that mm_take_all_locks() and mm_drop_all_locks()
become asymmetric with this change: mm_take_all_locks() locks all VMAs
but mm_drop_all_locks() does not release them. I think there should be
an additional comment explaining this asymmetry; I've sketched one
below.

Another side effect which would be nice to document in a comment is
that when mm_take_all_locks() fails after it has locked some VMAs,
those VMAs will stay locked until mmap_write_unlock() or
mmap_write_downgrade(). This happens because, on failure,
mm_take_all_locks() jumps to mm_drop_all_locks(), which will not unlock
the already locked VMAs.

Other than that LGTM. Thanks!
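FWIW, for the mm_drop_all_locks() side I was thinking of something
along these lines, placed where the removed vma_end_write_all() call
used to be (just a rough sketch, wording is up to you):

	/*
	 * Intentionally asymmetric with mm_take_all_locks(): the VMA
	 * write locks taken there are NOT dropped here. They are only
	 * released when the caller drops the mmap lock via
	 * mmap_write_unlock() or mmap_write_downgrade().
	 */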
>
> Fixes: eeff9a5d47f8 ("mm/mmap: prevent pagefault handler from racing with mmu_notifier registration")
> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> Signed-off-by: Jann Horn <jannh@xxxxxxxxxx>

Reviewed-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>

> ---
>  mm/mmap.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3eda23c9ebe7..1ff354b1e23c 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3758,7 +3758,6 @@ void mm_drop_all_locks(struct mm_struct *mm)
>  		if (vma->vm_file && vma->vm_file->f_mapping)
>  			vm_unlock_mapping(vma->vm_file->f_mapping);
>  	}
> -	vma_end_write_all(mm);
>
>  	mutex_unlock(&mm_all_locks_mutex);
>  }
>
> base-commit: bfa3037d828050896ae52f6467b6ca2489ae6fb1
> --
> 2.41.0.255.g8b1d071c50-goog
>
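Also, for the mm_take_all_locks() failure case mentioned above, maybe
something like this next to the mm_drop_all_locks() call on its error
path (sketching from memory here, so the exact context lines may
differ):

out_unlock:
	/*
	 * Note that this does not undo the vma_start_write() calls made
	 * above; any VMAs locked so far stay write-locked until the
	 * caller does mmap_write_unlock()/mmap_write_downgrade().
	 */
	mm_drop_all_locks(mm);
	return -EINTR;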