Re: [RFC PATCH 12/21] KVM: IOMMUFD: MEMFD: Map private pages

On Wed, Sep 25, 2024 at 10:44:12AM +0200, Vishal Annapurve wrote:
> On Tue, Sep 24, 2024 at 2:07 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> >
> > On Mon, Sep 23, 2024 at 11:52:19PM +0000, Tian, Kevin wrote:
> > > > IMHO we should try to do the best we can here, and the ideal interface
> > > > would be a notifier to switch the shared/private pages in some portion
> > > > of the guestmemfd, with the idea that iommufd could perhaps do it
> > > > atomically.
> > >
> > > yes, atomic replacement is necessary here, as there might be in-flight
> > > DMAs to pages adjacent to the one being converted in the same
> > > 1G chunk. Unmap/remap could potentially break it.
> >
> > Yeah.. This integration is going to be much more complicated than I
> > originally thought. It will need the generic pt stuff, as the
> > hitless page table manipulations we are contemplating here are pretty
> > complex.
> >
> > Jason
> 
>  To ensure that I understand your concern properly: the complexity of
> handling hitless page manipulations arises because guests can convert
> memory at a smaller granularity than the physical page size used by
> the host software.

Yes

You want to, say, break up a 1G private page into 2M chunks and then
hitlessly replace one 2M chunk with a shared one. Unlike the MM side,
you don't really want to just make the whole thing non-present and
fault it back in. So it is more complex.

We already plan to build the 1G -> 2M transformation for dirty
tracking; the atomic replace will be a further operation.
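
To make that concrete, here is a rough sketch of the kind of split +
atomic-replace primitive the generic pt work would have to provide.
None of these helpers (pt_iommu_split_entry(), pt_iommu_replace_entry())
exist today; the names and signatures are invented for illustration:

/*
 * Hypothetical sketch only: a hitless 1G -> 2M split followed by an
 * atomic replace of a single 2M entry, without ever making the range
 * non-present.
 */
static int split_and_replace_2m(struct pt_iommu *ptbl, unsigned long iova,
				phys_addr_t new_paddr)
{
	int ret;

	/*
	 * Break the covering 1G IOPTE into 512 2M IOPTEs in a spare
	 * table, then publish the new table with a single 64-bit
	 * store so DMA to the untouched 2M chunks never faults.
	 */
	ret = pt_iommu_split_entry(ptbl, iova & ~(SZ_1G - 1), SZ_2M);
	if (ret)
		return ret;

	/*
	 * Atomically point the one affected 2M IOPTE at the new
	 * physical (the shared page) and flush the IOTLB for that
	 * 2M range only.
	 */
	return pt_iommu_replace_entry(ptbl, iova & ~(SZ_2M - 1), SZ_2M,
				      new_paddr);
}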

In the short term you could experiment with this using unmap/remap, but
that isn't really going to work well as a solution. You really can't
unmap an entire 1G page just to poke a 2M hole into it without
disrupting the guest's DMA.

Fortunately the work needed to resolve this is well in progress. I had
not realized there was a guest memfd connection, but this is good to
know. It means more people will be interested in helping :) :)

> Complexity remains the same irrespective of whether kvm/guest_memfd
> is notifying the iommu driver to unmap converted ranges or whether
> it's userspace notifying the iommu driver.

You don't want to use the verb 'unmap'.

What you want is a verb more like 'refresh', which can only make sense
in the kernel. 'refresh' would cause the iommu's copy of the physical
addresses to be updated to match the current data in the guestmemfd.
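
Something along these lines, purely as an interface sketch; the
guest_memfd_notifier structure and its ->refresh op are invented here,
nothing like them is merged:

struct guest_memfd_notifier;

struct guest_memfd_notifier_ops {
	/*
	 * Re-read the physical addresses currently backing [offset,
	 * offset + len) in the guestmemfd and update the IOPTEs to
	 * match, atomically where the hardware allows. On return no
	 * DMA can be targeting pages that have been removed from the
	 * fd's mapping.
	 */
	int (*refresh)(struct guest_memfd_notifier *n,
		       unsigned long offset, unsigned long len);
};

struct guest_memfd_notifier {
	const struct guest_memfd_notifier_ops *ops;
	struct list_head list;
};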

So the private/shared sequence would be something like the following
(with a sketch in code after the list):

1) Guest asks for private -> shared
2) Guestmemfd figures out what the new physicals should be for the
   shared
3) Guestmemfd does 'refresh' on all of its notifiers. This will pick
   up the new shared physical and remove the old private physical from
   the iommus
4) Guestmemfd can be sure nothing in the iommu is touching the old
   memory.
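
Hypothetically, that sequence driven from guestmemfd might look like
this. gmem_prepare_shared_pages(), guest_memfd_for_each_notifier() and
gmem_free_private_pages() are all made up for the sketch:

/* Illustrative only: guestmemfd driving a private -> shared convert. */
static int guest_memfd_convert_to_shared(struct gmem_file *gmem,
					 unsigned long offset,
					 unsigned long len)
{
	struct guest_memfd_notifier *n;
	int ret;

	/* Steps 1-2: pick the new shared physical pages */
	ret = gmem_prepare_shared_pages(gmem, offset, len);
	if (ret)
		return ret;

	/* Step 3: each attached iommu re-reads the physicals */
	guest_memfd_for_each_notifier(gmem, n) {
		ret = n->ops->refresh(n, offset, len);
		if (ret)
			return ret;
	}

	/* Step 4: no iommu references to the old private pages remain */
	gmem_free_private_pages(gmem, offset, len);
	return 0;
}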

There are some other small considerations that increase complexity.
For instance, AMD needs an IOPTE boundary at any transition between
shared and private. This is a current active bug in the AMD stuff;
fixing it automatically, and preserving huge pages, via special
guestmemfd support sounds very appealing to me.
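
As a sketch of that constraint, the mapping path would have to clamp
the IOPTE size so a huge entry never straddles a shared/private
transition. gmem_next_attr_change() is invented for illustration:

/* Illustrative only: never let a huge IOPTE cross a conversion edge. */
static unsigned long amd_clamp_iopte_size(struct gmem_file *gmem,
					  unsigned long offset,
					  unsigned long max_size)
{
	/* Offset of the next shared<->private transition after 'offset' */
	unsigned long boundary = gmem_next_attr_change(gmem, offset);

	/* Halve the entry until it fits entirely on one side */
	while (max_size > PAGE_SIZE && offset + max_size > boundary)
		max_size /= 2;

	return max_size;
}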

Jason



