On Wed, May 22, 2024 at 11:50:06AM -0600, Alex Williamson wrote: > I'm not sure if there are any outstanding blockers on Peter's side, but > this seems like a good route from the vfio side. If we're seeing this > now without lockdep, we might need to bite the bullet and take the hit > with vmf_insert_pfn() while the pmd/pud path learn about pfnmaps. There is another alternative... Ideally we wouldn't use the fault handler. Instead when the MMIO becomes available again we'd iterate over all the VMA's and do remap_pfn_range(). When the MMIO becomes unavailable we do unmap_mapping_range() and remove it. This whole thing is synchronous and the fault handler should simply trigger SIGBUS if userspace races things. unmap_mapping_range() is easy, but the remap side doesn't have a helper today.. Something grotesque like this perhaps? while (1) { struct mm_struct *cur_mm = NULL; i_mmap_lock_read(mapping); vma_interval_tree_foreach(vma, mapping->i_mmap, 0, ULONG_MAX) { if (vma_populated(vma)) continue; cur_mm = vm->mm_struct; mmgrab(cur_mm); } i_mmap_unlock_read(mapping); if (!cur_mm) return; mmap_write_lock(cur_mm); i_mmap_lock_read(mapping); vma_interval_tree_foreach(vma, mapping->i_mmap, 0, ULONG_MAX) { if (vma->mm_struct == mm) vfio_remap_vma(vma); } i_mmap_unlock_read(mapping); mmap_write_unlock(cur_mm); mmdrop(cur_mm); } I'm pretty sure we need to hold the mmap_write lock to do the remap_pfn.. vma_populated() would have to do a RCU read of the page table to check if the page 0 is present. Also there is a race in mmap if you call remap_pfn_range() from the mmap fops and also use unmap_mapping_range(). mmap_region() does call_mmap() before it does vma_link_file() which gives a window where the VMA is populated but invisible to unmap_mapping_range(). We'd need to add another fop to call after vma_link_file() to populate the mmap or rely exclusively on the fault handler to populate. Jason