RE: iommufd dirty page logging overview

> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Monday, March 21, 2022 9:31 PM
> 
> On Sun, Mar 20, 2022 at 03:34:30AM +0000, Tian, Kevin wrote:
> 
> > Thinking more the real problem is not related to *before* vs. *after*
> > thing. :/ If Qemu itself doesn't maintain a virtual iotlb
> 
> It has to be after because only unmap guarantees that DMA is
> completely stopped.

In concept, yes.

In reality there is probably no sw-visible difference. A sane driver doing an
unmap is expected to first stop the source (i.e. the device) from using the
unmapped buffer and then clear the iommu mapping. Once the former is complete,
the dirty bitmap of the given range won't change across the unmap.

In case of a driver bug which fails to stop the device use in the first place,
losing the dirty bits across the unmap doesn't sound like a problem, as the
user cannot expect deterministic behavior in such a scenario anyway.

But I didn't intend to advocate 'before', as there is no value in doing so
and 'after' is conceptually correct per your explanation.

> 
> qemu must ensure it doesn't change the user VA to GPA mapping between
> unmap and device fetch dirty, or install something else into that
> IOVA.
> 
> Yes the physical PFNs can be shuffled around by the kernel due to the
> lost page pin, but the logical dirty is really attached to qemu's
> process VA (HVA), not the physical PFN.
> 
> It has to do this in all cases regardless of device or not - when it
> unmaps the IOVA it must know what HVA it put there and translate the
> dirties to that bitmap.
> 
> > given guest mappings for dirtied GIOVAs in the unmapped range
> > already disappear at that point thus the path to find GIOVA->GPA->HVA
> > is just broken.
> 
> qemu has to keep track of how IOVAs translate to HVAs - maybe we could
> have the kernel return the HVA during unmap as well, it already stores
> it, but this has some complications..

Qemu has such information. The key, as you said, is that Qemu shouldn't
destroy that information before the dirty bitmap is translated.
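
To make that ordering concrete, here is a rough sketch of folding the dirty
bits reported for an unmapped IOVA range into an HVA-indexed bitmap while the
IOVA->HVA record still exists. This is not Qemu or iommufd code: struct
iova_map, fold_dirty(), the 4 KiB page size and the flat bitmap layout are all
assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical record of one mapping that userspace installed. */
struct iova_map {
	uint64_t iova;   /* start of the IOVA range, page aligned */
	uint64_t hva;    /* process VA backing it, page aligned */
	uint64_t npages; /* length of the range in pages */
};

static inline int bm_test(const uint64_t *bm, uint64_t n)
{
	return (bm[n / 64] >> (n % 64)) & 1;
}

static inline void bm_set(uint64_t *bm, uint64_t n)
{
	bm[n / 64] |= UINT64_C(1) << (n % 64);
}

/*
 * Fold the per-page dirty bits of an unmapped IOVA range (bit 0 of
 * iova_dirty is the first page of the range) into a bitmap indexed by
 * HVA page, where bit 0 of hva_bitmap corresponds to hva_base.
 * Returns the number of dirty pages transferred.
 */
uint64_t fold_dirty(const struct iova_map *m, const uint64_t *iova_dirty,
		    uint64_t *hva_bitmap, uint64_t hva_base)
{
	uint64_t first, i, count = 0;

	assert(m->hva >= hva_base);
	first = (m->hva - hva_base) >> PAGE_SHIFT;

	for (i = 0; i < m->npages; i++) {
		if (bm_test(iova_dirty, i)) {
			bm_set(hva_bitmap, first + i);
			count++;
		}
	}
	return count;
}
```

The caller would drop the iova_map record only after fold_dirty() returns,
i.e. after the dirties fetched post-unmap have been attributed to HVAs.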

> 
> Fundamentally from a qemu perspective it is translating everything to
> UVA because UVA is what the live migration machinery uses.
> 
> But this is all qemu problems and doesn't really help inform the
> kernel API..
> 

Yes, and this is the merit of hw nesting and IOMMU dirty bits. Otherwise
Qemu has to bear the burden of maintaining a copy of the guest page table
in addition to the shadow one maintained in the kernel.

Thanks
Kevin



