On Thu, Jul 20, 2023 at 10:14 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> On Thu, Jul 20, 2023 at 6:52 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > On Wed, Jul 19, 2023 at 6:33 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > >
> > > Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap
> > > lock is held write-locked by the caller, and the caller is responsible for
> > > dropping the mmap lock at a later point (which will also release the VMA
> > > locks).
> > > Calling vma_end_write_all() here is dangerous because the caller might have
> > > write-locked a VMA with the expectation that it will stay write-locked
> > > until the mmap_lock is released, as usual.
> > >
> > > This _almost_ becomes a problem in the following scenario:
> > >
> > > An anonymous VMA A and an SGX VMA B are mapped adjacent to each other.
> > > Userspace calls munmap() on a range starting at the start address of A and
> > > ending in the middle of B.
> > >
> > > Hypothetical call graph with additional notes in brackets:
> > >
> > > do_vmi_align_munmap
> > >   [begin first for_each_vma_range loop]
> > >   vma_start_write [on VMA A]
> > >   vma_mark_detached [on VMA A]
> > >   __split_vma [on VMA B]
> > >     sgx_vma_open [== new->vm_ops->open]
> > >       sgx_encl_mm_add
> > >         __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN]
> > >           mm_take_all_locks
> > >           mm_drop_all_locks
> > >             vma_end_write_all [drops VMA lock taken on VMA A before]
> > >   vma_start_write [on VMA B]
> > >   vma_mark_detached [on VMA B]
> > >   [end first for_each_vma_range loop]
> > > vma_iter_clear_gfp [removes VMAs from maple tree]
> > > mmap_write_downgrade
> > > unmap_region
> > > mmap_read_unlock
> > >
> > > In this hypothetical scenario, while do_vmi_align_munmap() thinks it still
> > > holds a VMA write lock on VMA A, the VMA write lock has actually been
> > > invalidated inside __split_vma().
> > >
> > > The call from sgx_encl_mm_add() to __mmu_notifier_register() can't
> > > actually happen here, as far as I understand, because we are duplicating an
> > > existing SGX VMA, but sgx_encl_mm_add() only calls
> > > __mmu_notifier_register() for the first SGX VMA created in a given process.
> > > So this could only happen in fork(), not on munmap().
> > > But in my view it is just pure luck that this can't happen.
> > >
> > > Also, we wouldn't actually have any bad consequences from this in
> > > do_vmi_align_munmap(), because by the time the bug drops the lock on VMA A,
> > > we've already marked VMA A as detached, which makes it completely
> > > ineligible for any VMA-locked page faults.
> > > But again, that's just pure luck.
> > >
> > > So remove the vma_end_write_all(), so that VMA write locks are only ever
> > > released on mmap_write_unlock() or mmap_write_downgrade().
> >
> > Your logic makes sense to me. mm_drop_all_locks() unlocking all VMAs,
> > even the ones which were locked before mm_take_all_locks(), seems
> > dangerous.
> > One concern I have is that mm_take_all_locks() and mm_drop_all_locks()
> > become asymmetric with this change: mm_take_all_locks() locks all VMAs
> > but mm_drop_all_locks() does not release them. I think there should be
> > an additional comment explaining this asymmetry.
> > Another side-effect which would be nice to document in a comment is
> > that when mm_take_all_locks() fails after it locked the VMAs, those
> > VMAs will stay locked until mmap_write_unlock/mmap_write_downgrade.
> > This happens because on failure mm_take_all_locks() jumps to perform
> > mm_drop_all_locks(), and this will not unlock the already-locked VMAs.
> > Other than that LGTM. Thanks!
>
> But this is not specific to mm_drop_all_locks() at all, right? It's just
> fundamentally how per-VMA locks are used everywhere. Somewhere deep
> down in some call path, while the mmap lock is held in write mode, a
> VMA is marked as being written to, and then this marking persists
> until the mmap lock is dropped.

Yes, but for all other locks mm_take_all_locks()/mm_drop_all_locks() is
perfectly symmetrical AFAICT. I see a pretty descriptive comment for
mm_take_all_locks() explaining the locking rules, so mentioning this
asymmetry in there seems appropriate to me.

> If we want to clarify this, I guess some comments on
> vma_end_write_all() and vma_start_write() might help, but I think
> that's independent of this patch.

Yes and no. This patch changes vma_end_write_all() to be called
exclusively from mmap_write_unlock()/mmap_write_downgrade(), so it
introduces the notion that once a VMA is write-locked it is never
released until mmap_lock is released/downgraded. So, I agree that a
comment for vma_start_write() is appropriate, but I also think it does
belong in this patch.

Thanks,
Suren.
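
P.S. For readers following along: the reason a single vma_end_write_all()
can release every VMA write lock at once is the sequence-count scheme
behind per-VMA locks. Below is a minimal userspace model of that scheme
(an illustrative sketch of my reading of it, not the kernel code; struct
mm / struct vma and the helpers here are simplified stand-ins for the
kernel's mm_lock_seq / vm_lock_seq machinery):

	#include <assert.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Simplified stand-ins for mm_struct / vm_area_struct. */
	struct mm  { int mm_lock_seq; };  /* bumped by vma_end_write_all()       */
	struct vma { int vm_lock_seq; };  /* snapshot taken by vma_start_write() */

	/* Write-locking a VMA records the current mmap_lock sequence number. */
	static void vma_start_write(struct mm *mm, struct vma *vma)
	{
		vma->vm_lock_seq = mm->mm_lock_seq;
	}

	/* A VMA counts as write-locked while its snapshot is still current. */
	static bool vma_is_write_locked(struct mm *mm, struct vma *vma)
	{
		return vma->vm_lock_seq == mm->mm_lock_seq;
	}

	/* Bumping the sequence number releases ALL VMA write locks at once. */
	static void vma_end_write_all(struct mm *mm)
	{
		mm->mm_lock_seq++;
	}

	int main(void)
	{
		struct mm mm = { .mm_lock_seq = 0 };
		struct vma a = { .vm_lock_seq = -1 };

		vma_start_write(&mm, &a);          /* caller write-locks VMA A */
		assert(vma_is_write_locked(&mm, &a));

		/*
		 * Before this patch, mm_drop_all_locks() called
		 * vma_end_write_all() here, silently invalidating a lock
		 * the caller still relies on:
		 */
		vma_end_write_all(&mm);
		printf("VMA A still write-locked? %s\n",
		       vma_is_write_locked(&mm, &a) ? "yes" : "no"); /* "no" */
		return 0;
	}

This is also why, after the patch, the failure path of mm_take_all_locks()
leaves the already-locked VMAs locked: nothing short of
mmap_write_unlock()/mmap_write_downgrade() bumps the sequence number again.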