RE: [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap.

> From: Alex Williamson
> Sent: Friday, November 15, 2019 11:22 AM
> 
> On Thu, 14 Nov 2019 21:40:35 -0500
> Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> 
> > On Fri, Nov 15, 2019 at 05:06:25AM +0800, Alex Williamson wrote:
> > > On Fri, 15 Nov 2019 00:26:07 +0530
> > > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> > >
> > > > On 11/14/2019 1:37 AM, Alex Williamson wrote:
> > > > > On Thu, 14 Nov 2019 01:07:21 +0530
> > > > > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> > > > >
> > > > >> On 11/13/2019 4:00 AM, Alex Williamson wrote:
> > > > >>> On Tue, 12 Nov 2019 22:33:37 +0530
> > > > >>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> > > > >>>
> > > > >>>> All pages pinned by vendor driver through vfio_pin_pages API
> > > > >>>> should be considered as dirty during migration. IOMMU container
> > > > >>>> maintains a list of all such pinned pages. Added an ioctl
> > > > >>>> defination to get bitmap of such
> > > > >>>
> > > > >>> definition
> > > > >>>
> > > > >>>> pinned pages for requested IO virtual address range.
> > > > >>>
> > > > >>> Additionally, all mapped pages are considered dirty when
> > > > >>> physically mapped through to an IOMMU, modulo we discussed
> > > > >>> devices opting in to per page pinning to indicate finer
> > > > >>> granularity with a TBD mechanism to figure out if any
> > > > >>> non-opt-in devices remain.
> > > > >>>
> > > > >>
> > > > >> You mean, in case of device direct assignment (device pass
> > > > >> through)?
> > > > >
> > > > > Yes, or IOMMU backed mdevs.  If vfio_dmas in the container are
> > > > > fully pinned and mapped, then the correct dirty page set is all
> > > > > mapped pages.  We discussed using the vpfn list as a mechanism
> > > > > for vendor drivers to reduce their migration footprint, but we
> > > > > also discussed that we would need a way to determine that all
> > > > > participants in the container have explicitly pinned their
> > > > > working pages or else we must consider the entire potential
> > > > > working set as dirty.
> > > > >
> > > >
> > > > How can the vendor driver advertise this capability to the iommu
> > > > module?  Any suggestions?
> > >
> > > I think it does so by pinning pages.  Is it acceptable that if the
> > > vendor driver pins any pages, then from that point forward we
> > > consider the IOMMU group dirty page scope to be limited to pinned
> > > pages?  There are complications around non-singleton IOMMU groups,
> > > but I think we're already leaning towards that being a
> > > non-worthwhile problem to solve.  So if we require that only
> > > singleton IOMMU groups can pin pages and we pass the IOMMU group as
> > > a parameter to vfio_iommu_driver_ops.pin_pages(), then the type1
> > > backend can set a flag on its local vfio_group struct to indicate
> > > that dirty page scope is limited to pinned pages.  We might want to
> > > keep a flag on the vfio_iommu struct to indicate whether all of the
> > > vfio_groups for each vfio_domain in the vfio_iommu.domain_list have
> > > dirty page scope limited to pinned pages, as an optimization to
> > > avoid walking lists too often.  Then we could test: if
> > > vfio_iommu.domain_list is not empty and this new flag does not
> > > limit the dirty page scope, then everything within each vfio_dma is
> > > considered dirty.
> > >
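For concreteness, the bookkeeping described above might look roughly
like the sketch below; all field and helper names are illustrative,
not from any actual patch:

struct vfio_group {
        struct iommu_group      *iommu_group;
        struct list_head        next;
        /* set once this group's vendor driver pins pages */
        bool                    pinned_page_dirty_scope;
};

/* recompute the container-wide flag whenever a group pins pages */
static void vfio_update_pinned_page_dirty_scope(struct vfio_iommu *iommu)
{
        struct vfio_domain *d;
        struct vfio_group *g;

        iommu->pinned_page_dirty_scope = true;
        list_for_each_entry(d, &iommu->domain_list, next) {
                list_for_each_entry(g, &d->group_list, next) {
                        if (!g->pinned_page_dirty_scope) {
                                iommu->pinned_page_dirty_scope = false;
                                return;
                        }
                }
        }
}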
> >
> > hi Alex,
> > could you help clarify whether my understandings below are right?
> > In future,
> > 1. for mdev, and for a passthrough device without the hardware
> > ability to track dirty pages, the vendor driver has to explicitly
> > call vfio_pin_pages()/vfio_unpin_pages() + a flag to tell vfio its
> > dirty page set.
> 
> For non-IOMMU backed mdevs without hardware dirty page tracking,
> there's no change to the vendor driver currently.  Pages pinned by the
> vendor driver are marked as dirty.

What if the vendor driver can figure out, by software means, which
pinned pages are actually dirty? In that case, would a separate
mark_dirty interface make more sense? Or should we introduce
read/write flags to the pin_pages interface, similar to the DMA API?
Existing drivers always set both r/w flags, but a specific driver may
set read-only or write-only...
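For instance, vfio_pin_pages() already takes a prot argument
(IOMMU_READ/IOMMU_WRITE), so type1 could record a pin as dirty only
when write permission is requested. A rough sketch, where dirty_bitmap
is a hypothetical per-vfio_dma field rather than an existing one:

static void vfio_dma_mark_pinned_dirty(struct vfio_dma *dma,
                                       dma_addr_t iova, int prot)
{
        /* read-only pins cannot dirty memory; skip them */
        if (prot & IOMMU_WRITE)
                bitmap_set(dma->dirty_bitmap,
                           (iova - dma->iova) >> PAGE_SHIFT, 1);
}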

> 
> For any IOMMU backed device, mdev or direct assignment, all mapped
> memory would be considered dirty unless there are explicit calls to pin
> pages on top of the IOMMU page pinning and mapping.  These would likely
> be enabled only when the device is in the _SAVING device_state.
> 
> > 2. for those devices with the hardware ability to track dirty
> > pages, vfio will still provide a callback for the vendor driver to
> > report dirty pages (as for those devices, it is hard to explicitly
> > call vfio_pin_pages()/vfio_unpin_pages()).
> >
> > 3. for devices relying on dirty bit info in the physical IOMMU,
> > there will be a callback into the physical IOMMU driver for vfio to
> > get the dirty page set.
> 
> The proposal here does not cover exactly how these would be
> implemented, it only establishes the container as the point of user
> interaction with the dirty bitmap and hopefully allows us to maintain
> that interface regardless of whether we have dirty tracking at the
> device or the system IOMMU.  Ideally devices with dirty tracking would
> make use of page pinning and we'd extend the interface to allow vendor
> drivers the ability to indicate the clean/dirty state of those pinned

I don't think "dirty tracking" == "page pinning". It's possible that a
device supports tracking/logging dirty page info into a
driver-registered buffer, in which case the host vendor driver doesn't
need to mediate fast-path operations. The entire guest memory is then
always pinned, and we just need a log-sync-like interface for the
vendor driver to fill in the dirty bitmap.
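To illustrate the kind of interface I mean, e.g. a new optional
callback in mdev_parent_ops; the name and signature below are made up:

struct mdev_parent_ops {
        /* ... existing callbacks ... */

        /*
         * Optional: set one bit per page in @bitmap for pages the
         * device dirtied in [iova, iova + npages * PAGE_SIZE), then
         * clear the device's internal log.
         */
        int (*dirty_log_sync)(struct mdev_device *mdev,
                              unsigned long iova, unsigned long npages,
                              unsigned long *bitmap);
};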

> pages.  For system IOMMU dirty page tracking, that potentially might
> mean that we support IOMMU page faults and the container manages those
> faults such that the container is the central record of dirty pages.

The IOMMU dirty bit is not equivalent to IOMMU page faults. The latter
is much more complex and requires support both in the IOMMU and in the
device. Here, similar to the device-dirty-tracking case above, we just
need a log-sync interface calling into the iommu driver to get the
dirty info filled in for the requested address range.
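Something like the sketch below, where sync_dirty_log would be a new,
purely hypothetical iommu_ops member that walks the IOMMU page table
and reports hardware dirty bits for the given range:

static int vfio_iommu_sync_dirty_log(struct vfio_iommu *iommu,
                                     struct vfio_dma *dma,
                                     unsigned long *bitmap)
{
        struct vfio_domain *d;
        int ret;

        /* merge dirty info from every domain mapping this range */
        list_for_each_entry(d, &iommu->domain_list, next) {
                ret = d->domain->ops->sync_dirty_log(d->domain,
                                                     dma->iova,
                                                     dma->size, bitmap);
                if (ret)
                        return ret;
        }
        return 0;
}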

> Until these interfaces are designed, we can only speculate, but the
> goal is to design a user interface compatible with how those features
> might evolve.  If you identify something that can't work, please raise
> the issue.  Thanks,
> 
> Alex

Here is the desired scheme in my mind. Feel free to correct me. :-)

1. The iommu log-buf callback is preferred if the underlying IOMMU
reports such a capability. The iommu driver walks the IOMMU page
table to find dirty pages in the requested address range;
2. otherwise, the vendor driver log-buf callback is preferred if the
vendor driver reports such a capability when registering mdev types.
The vendor driver calls a device-specific interface to fill in the
dirty info;
3. otherwise, pages pinned by vfio_pin_pages() (with the WRITE flag)
are considered dirty. This covers normal mediated devices, and the
use of fast-path mediation for migrating a passthrough device;
4. otherwise, all mapped pages are considered dirty.

Currently we're working on 1) based on VT-d rev 3.0. I know some
vendors implement 2) in their own code base. 3) has real usage
already. 4) is the fall-back. (A sketch of this selection order is
included below.)
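For example, the fall-back order could be captured like this; every
helper below is hypothetical, and only the ordering matters:

static void vfio_dma_populate_dirty_bitmap(struct vfio_iommu *iommu,
                                           struct vfio_dma *dma,
                                           unsigned long *bitmap)
{
        if (vfio_iommu_hw_dirty_log_supported(iommu))
                /* 1. IOMMU hardware dirty logging, e.g. VT-d rev 3.0 */
                vfio_iommu_sync_dirty_log(iommu, dma, bitmap);
        else if (vfio_vendor_dirty_log_supported(iommu))
                /* 2. vendor driver logs into a driver-registered buffer */
                vfio_vendor_sync_dirty_log(iommu, dma, bitmap);
        else if (iommu->pinned_page_dirty_scope)
                /* 3. pages pinned writable via vfio_pin_pages() */
                vfio_dma_bitmap_from_vpfns(dma, bitmap);
        else
                /* 4. fall-back: every mapped page is dirty */
                bitmap_fill(bitmap, dma->size >> PAGE_SHIFT);
}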

Alex, are you willing to have all the interfaces ready in one batch,
or to support them as usages become available? I'm fine with either
way, but even if this series only does 3) and 4), I'd prefer to have
the above scheme included in a code comment, to give the whole
picture of all possible situations. :-)

Thanks
Kevin


