On Wed, 17 Jul 2024 15:24:26 -0700 Axel Rasmussen <axelrasmussen@xxxxxxxxxx> wrote: > 35e351780fa9 ("fork: defer linking file vma until vma is fully initialized") > switched the ordering of vm_ops->open() and copy_page_range() on fork. This is a > bug for VFIO, because it causes two problems: > > 1. Because open() is called before copy_page_range(), the range can conceivably > have unmapped 'holes' in it. This causes the code underneath untrack_pfn() to > WARN. > > 2. More seriously, open() is trying to guarantee that the entire range is > zapped, so any future accesses in the child will result in the VFIO fault > handler being called. Because we copy_page_range() *after* open() (and > therefore after zapping), this guarantee is violated. > > We can't revert 35e351780fa9, because it fixes a real bug for hugetlbfs. The fix > is also not as simple as just reodering open() and copy_page_range(), as Miaohe > points out in [1]. So, although these patches are kind of large for stable, just > backport this refactoring which completely sidesteps the issue. > > Note that patch 2 is the key one here which fixes the issue. Patch 1 is a > prerequisite required for patch 2 to build / work. This would almost be enough, > but we might see significantly regressed performance. Patch 3 fixes that up, > putting performance back on par with what it was before. > > Note [1] also has a more full discussion justifying taking these backports. > > I proposed the same backport for 6.9 [2], and now for 6.6. 6.6 is the oldest > kernel which needs the change: 35e351780fa9 was reverted for unrelated reasons > in 6.1, and was never backported to 5.15 or earlier. AFAICT 35e351780fa9 was reverted in linux-6.6.y as well, so why isn't this one a 4-part series concluding with a new backport of that commit? I think without that, we don't need these in 6.6 either. Thanks, Alex > > [1]: https://lore.kernel.org/all/20240702042948.2629267-1-leah.rumancik@xxxxxxxxx/T/ > [2]: https://lore.kernel.org/r/20240717213339.1921530-1-axelrasmussen@xxxxxxxxxx > > Alex Williamson (3): > vfio: Create vfio_fs_type with inode per device > vfio/pci: Use unmap_mapping_range() > vfio/pci: Insert full vma on mmap'd MMIO fault > > drivers/vfio/device_cdev.c | 7 + > drivers/vfio/group.c | 7 + > drivers/vfio/pci/vfio_pci_core.c | 271 ++++++++----------------------- > drivers/vfio/vfio_main.c | 44 +++++ > include/linux/vfio.h | 1 + > include/linux/vfio_pci_core.h | 2 - > 6 files changed, 125 insertions(+), 207 deletions(-) > > -- > 2.45.2.993.g49e7a77208-goog >