Re: [PATCH 6.6] fork: defer linking file vma until vma is fully initialized

On Mon, Jul 15, 2024 at 3:21 PM Alex Williamson
<alex.williamson@xxxxxxxxxx> wrote:
>
> On Mon, 15 Jul 2024 13:35:41 -0700
> Axel Rasmussen <axelrasmussen@xxxxxxxxxx> wrote:
>
> > I tried out Sasha's suggestion. Note that *just* taking
> > aac6db75a9 ("vfio/pci: Use unmap_mapping_range()") is not sufficient, we also
> > need b7c5e64fec ("vfio: Create vfio_fs_type with inode per device").
> >
> > But, the good news is both of those apply more or less cleanly to 6.6. And, at
> > least under a very basic test which exercises VFIO memory mapping, things seem
> > to work properly with that change.
> >
> > I would agree with Leah that these seem a bit big to be stable fixes. But, I'm
> > encouraged by the fact that Sasha suggested taking them. If there are no big
> > objections (Alex? :) ) I can send the backport patches this week.
> >
>
> If you were to take those, I think you'd also want:
>
> d71a989cf5d9 ("vfio/pci: Insert full vma on mmap'd MMIO fault")
>
> which helps avoid a potential regression in VM startup latency vs
> faulting each page of the VMA.  Ideally we'd have had huge_fault
> working for pfnmaps before this conversion to avoid the latter commit.
>
> I'm a bit confused by the lineage here though, 35e351780fa9 ("fork:
> defer linking file vma until vma is fully initialized") entered v6.9
> whereas these vfio changes all came in v6.10, so why does the v6.6
> backport end up with dependencies on these newer commits?  Is there
> something that needs to be fixed in v6.9-stable as well?

Right, I believe 35e351780fa9 introduced a bug for VFIO by calling
vm_ops->open() *before* copy_page_range() in dup_mmap(). So I think
this bug affects not just 6.6 (where 35e351780fa9 was backported to
stable) but also 6.9, as you suspected.
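To make the ordering problem concrete, here is a toy userspace model in
C. It is not kernel code: struct toy_vma, toy_driver_open(),
toy_copy_page_range() and toy_fork() are all hypothetical names. As a
simplification, it assumes a driver whose open() callback zaps the new
VMA so that later faults let the driver start tracking the mapping.
Under that assumption, calling open() before the PTE copy leaves the
child with PTEs the driver never saw.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model, NOT kernel code; every name here is hypothetical.
 * A "VMA" is a few PTE slots plus a flag saying whether the assumed
 * driver is tracking it.
 */
#define TOY_NPAGES 4

struct toy_vma {
	bool pte_present[TOY_NPAGES]; /* is a PTE installed for page i? */
	bool tracked;                 /* does the driver know about this VMA? */
};

/* A parent VMA that is fully mapped and tracked by the driver. */
static struct toy_vma toy_parent_fully_mapped(void)
{
	struct toy_vma v;

	for (size_t i = 0; i < TOY_NPAGES; i++)
		v.pte_present[i] = true;
	v.tracked = true;
	return v;
}

/*
 * Assumed driver open(): zap every PTE so the next access faults, at
 * which point the driver would start tracking the VMA.
 */
static void toy_driver_open(struct toy_vma *vma)
{
	memset(vma->pte_present, 0, sizeof(vma->pte_present));
	vma->tracked = false;
}

/* Stand-in for copy_page_range(): duplicate the parent's PTEs. */
static void toy_copy_page_range(struct toy_vma *dst, const struct toy_vma *src)
{
	memcpy(dst->pte_present, src->pte_present, sizeof(dst->pte_present));
}

/* Count PTEs installed in a VMA that the driver is not tracking. */
static size_t toy_untracked_ptes(const struct toy_vma *vma)
{
	size_t n = 0;

	if (vma->tracked)
		return 0;
	for (size_t i = 0; i < TOY_NPAGES; i++)
		n += vma->pte_present[i];
	return n;
}

/* Model one fork of @parent with either ordering of open() vs the copy. */
static size_t toy_fork(const struct toy_vma *parent, bool open_before_copy)
{
	struct toy_vma child = { .tracked = false }; /* new VMA: no PTEs yet */

	if (open_before_copy) {
		toy_driver_open(&child);             /* zaps an empty range */
		toy_copy_page_range(&child, parent); /* repopulates behind the driver */
	} else {
		toy_copy_page_range(&child, parent);
		toy_driver_open(&child);             /* zaps what was just copied */
	}
	return toy_untracked_ptes(&child);
}
```

With a fully mapped parent, toy_fork(&parent, false) leaves no
untracked PTEs in the child, while toy_fork(&parent, true) leaves all
TOY_NPAGES of them untracked.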

The reason we're bringing up all these newer commits is that it's
unclear how to fix the bug otherwise. :) We thought we had a simple
solution: just reorder when vm_ops->open() is called. But Miaohe
pointed out an issue with doing that elsewhere in this thread.

Assuming the reordering is unworkable, the only other idea I have for
fixing the bug without the larger refactor is:

1. Mark VFIO VMAs VM_WIPEONFORK so that fork doesn't call
copy_page_range() after vm_ops->open() is called
2. Remove the WARN_ON_ONCE(1) in get_pat_info() so that we don't get a
warning when VFIO zaps a not-fully-populated range (expected if we
never call copy_page_range()!)
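The two steps above can be illustrated with a second toy C model (again
not kernel code; toy_fork(), struct toy_result and both boolean knobs
are hypothetical). It keeps the problematic order (open() before the
PTE copy), models VM_WIPEONFORK as "skip the PTE copy", and models the
get_pat_info() warning as firing whenever a not-fully-populated range
is zapped. It assumes, as a simplification, that VFIO's open() zaps the
whole VMA.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model, NOT kernel code; every name here is hypothetical.
 * Order is fixed to open() first, then the (optional) PTE copy.
 */
#define TOY_NPAGES 4

struct toy_vma {
	bool pte_present[TOY_NPAGES];
	bool wipe_on_fork;    /* models VM_WIPEONFORK: skip the PTE copy */
};

struct toy_result {
	size_t leaked_ptes;   /* child PTEs installed behind the driver's back */
	int warnings;         /* models WARN_ON_ONCE() hits during the zap */
};

static bool toy_fully_populated(const struct toy_vma *vma)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		if (!vma->pte_present[i])
			return false;
	return true;
}

/* Assumed driver open(): zap the whole range, warning if not fully populated. */
static void toy_driver_open(struct toy_vma *vma, bool warn_enabled,
			    struct toy_result *res)
{
	if (warn_enabled && !toy_fully_populated(vma))
		res->warnings++;
	memset(vma->pte_present, 0, sizeof(vma->pte_present));
}

static struct toy_result toy_fork(bool wipe_on_fork, bool warn_enabled)
{
	struct toy_vma parent = { .wipe_on_fork = wipe_on_fork };
	struct toy_vma child;
	struct toy_result res = { 0, 0 };

	for (size_t i = 0; i < TOY_NPAGES; i++)
		parent.pte_present[i] = true;      /* parent is fully mapped */

	child = parent;                            /* VMA flags are duplicated... */
	memset(child.pte_present, 0, sizeof(child.pte_present)); /* ...PTEs are not */

	toy_driver_open(&child, warn_enabled, &res); /* open() runs first */
	if (!child.wipe_on_fork)                     /* copy_page_range() stand-in */
		memcpy(child.pte_present, parent.pte_present,
		       sizeof(child.pte_present));

	/* Anything present now was installed after the driver's open(). */
	for (size_t i = 0; i < TOY_NPAGES; i++)
		res.leaked_ptes += child.pte_present[i];
	return res;
}
```

With both steps applied, toy_fork(true, false) reports no leaked PTEs
and no warning. The toy_fork(false, false) case shows why dropping the
warning on its own is risky: the leak is still there, but nothing
flags it anymore.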

This fix has downsides, though. It repurposes (arguably abuses)
VM_WIPEONFORK for something it wasn't designed for. It removes a
warning that may catch other legitimate problems. And, as Sasha
points out, it makes stable kernels diverge from upstream.

Simply backporting the refactors fixes (well, totally avoids) the bug,
and it doesn't require any stable-only hackery.

>
> Aside from the size of aac6db75a9 in particular, I'm not aware of any
> outstanding issues that would otherwise dissuade backport to
> v6.6-stable.  Thanks,
>
> Alex
>