> From: Alex Williamson
> Sent: Friday, November 15, 2019 11:22 AM
> 
> On Thu, 14 Nov 2019 21:40:35 -0500
> Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> 
> > On Fri, Nov 15, 2019 at 05:06:25AM +0800, Alex Williamson wrote:
> > > On Fri, 15 Nov 2019 00:26:07 +0530
> > > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> > > 
> > > > On 11/14/2019 1:37 AM, Alex Williamson wrote:
> > > > > On Thu, 14 Nov 2019 01:07:21 +0530
> > > > > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> > > > > 
> > > > >> On 11/13/2019 4:00 AM, Alex Williamson wrote:
> > > > >>> On Tue, 12 Nov 2019 22:33:37 +0530
> > > > >>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> > > > >>> 
> > > > >>>> All pages pinned by vendor driver through vfio_pin_pages API should be
> > > > >>>> considered as dirty during migration. IOMMU container maintains a list of
> > > > >>>> all such pinned pages. Added an ioctl defination to get bitmap of such
> > > > >>> 
> > > > >>> definition
> > > > >>> 
> > > > >>>> pinned pages for requested IO virtual address range.
> > > > >>> 
> > > > >>> Additionally, all mapped pages are considered dirty when physically
> > > > >>> mapped through to an IOMMU, modulo we discussed devices opting in to
> > > > >>> per page pinning to indicate finer granularity with a TBD mechanism to
> > > > >>> figure out if any non-opt-in devices remain.
> > > > >> 
> > > > >> You mean, in case of device direct assignment (device pass through)?
> > > > > 
> > > > > Yes, or IOMMU backed mdevs.  If vfio_dmas in the container are fully
> > > > > pinned and mapped, then the correct dirty page set is all mapped pages.
> > > > > We discussed using the vpfn list as a mechanism for vendor drivers to
> > > > > reduce their migration footprint, but we also discussed that we would
> > > > > need a way to determine that all participants in the container have
> > > > > explicitly pinned their working pages or else we must consider the
> > > > > entire potential working set as dirty.
> > > > 
> > > > How can vendor driver tell this capability to iommu module? Any
> > > > suggestions?
> > > 
> > > I think it does so by pinning pages.  Is it acceptable that if the
> > > vendor driver pins any pages, then from that point forward we consider
> > > the IOMMU group dirty page scope to be limited to pinned pages?  There
> > > are complications around non-singleton IOMMU groups, but I think we're
> > > already leaning towards that being a non-worthwhile problem to solve.
> > > So if we require that only singleton IOMMU groups can pin pages and we
> > > pass the IOMMU group as a parameter to
> > > vfio_iommu_driver_ops.pin_pages(), then the type1 backend can set a
> > > flag on its local vfio_group struct to indicate dirty page scope is
> > > limited to pinned pages.  We might want to keep a flag on the
> > > vfio_iommu struct to indicate whether all of the vfio_groups for each
> > > vfio_domain in the vfio_iommu.domain_list have dirty page scope limited
> > > to pinned pages, as an optimization to avoid walking lists too often.
> > > Then, if vfio_iommu.domain_list is not empty and this new flag does not
> > > limit the dirty page scope, everything within each vfio_dma is
> > > considered dirty.
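
If I'm reading the flag idea right, it would look roughly like below.
This is only a sketch of my understanding -- the field and function
names here are invented for illustration (including the assumed
pinned_page_dirty_scope flag on both structs), not actual type1 code:

struct vfio_group {
	struct iommu_group	*iommu_group;
	struct list_head	next;
	/* set the first time this (singleton) group pins pages */
	bool			pinned_page_dirty_scope;
};

/* recompute the container-level summary flag after any group pins */
static void vfio_update_dirty_scope(struct vfio_iommu *iommu)
{
	struct vfio_domain *domain;
	struct vfio_group *group;

	iommu->pinned_page_dirty_scope = true;
	list_for_each_entry(domain, &iommu->domain_list, next) {
		list_for_each_entry(group, &domain->group_list, next) {
			if (!group->pinned_page_dirty_scope) {
				iommu->pinned_page_dirty_scope = false;
				return;
			}
		}
	}
}

and when reporting the bitmap:

	if (!list_empty(&iommu->domain_list) &&
	    !iommu->pinned_page_dirty_scope) {
		/* every page of every vfio_dma is reported dirty */
	} else {
		/* only pages on the pinned (vpfn) lists are reported */
	}
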
> > 
> > hi Alex
> > could you help clarify whether my understandings below are right?
> > In future,
> > 1. for mdev and for passthrough device without hardware ability to track
> > dirty pages, the vendor driver has to explicitly call
> > vfio_pin_pages()/vfio_unpin_pages() + a flag to tell vfio its dirty page set.
> 
> For non-IOMMU backed mdevs without hardware dirty page tracking,
> there's no change to the vendor driver currently.  Pages pinned by the
> vendor driver are marked as dirty.

What about the case where the vendor driver can figure out, through
software means, which pinned pages are actually dirty?  Would a
separate mark_dirty interface make more sense there?  Or should we
introduce a read/write flag to the pin_pages interface, similar to the
DMA API?  Existing drivers would always set both r/w flags, but a
specific driver might want to pin read-only or write-only...

> For any IOMMU backed device, mdev or direct assignment, all mapped
> memory would be considered dirty unless there are explicit calls to pin
> pages on top of the IOMMU page pinning and mapping.  These would likely
> be enabled only when the device is in the _SAVING device_state.
> 
> > 2. for those devices with hardware ability to track dirty pages, will still
> > provide a callback to vendor driver to get dirty pages. (as for those devices,
> > it is hard to explicitly call vfio_pin_pages()/vfio_unpin_pages())
> > 
> > 3. for devices relying on dirty bit info in physical IOMMU, there
> > will be a callback to physical IOMMU driver to get dirty page set from
> > vfio.
> 
> The proposal here does not cover exactly how these would be
> implemented, it only establishes the container as the point of user
> interaction with the dirty bitmap and hopefully allows us to maintain
> that interface regardless of whether we have dirty tracking at the
> device or the system IOMMU.  Ideally devices with dirty tracking would
> make use of page pinning and we'd extend the interface to allow vendor
> drivers the ability to indicate the clean/dirty state of those pinned

I don't think "dirty tracking" == "page pinning".  It's possible that a
device supports tracking/logging dirty page info into a
driver-registered buffer, in which case the host vendor driver doesn't
need to mediate fast-path operations.  In such a case the entire guest
memory is always pinned, and we just need a log-sync like interface for
the vendor driver to fill the dirty bitmap.

> pages.  For system IOMMU dirty page tracking, that potentially might
> mean that we support IOMMU page faults and the container manages those
> faults such that the container is the central record of dirty pages.

The IOMMU dirty bit is not equivalent to IOMMU page faults.  The latter
is much more complex, requiring support both in the IOMMU and in the
device.  Here, similar to the device-dirty-tracking case above, we just
need a log-sync interface calling into the iommu driver to get dirty
info filled in for the requested address range.
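
Something like below is what I have in mind.  Purely illustrative --
no such callback exists today and the names are made up:

/*
 * A log-sync style hook, filling one bit per page of @bitmap for the
 * range [iova, iova + size).  It could be implemented either by the
 * vendor driver (device-side dirty logging) or by the iommu driver
 * (harvesting the IOMMU dirty bit).
 */
struct vfio_dirty_log_ops {
	int (*log_sync)(void *opaque, unsigned long iova,
			unsigned long size, unsigned long *bitmap);
};

An IOMMU-backed implementation would walk the second-level page table
for the range, test-and-clear the dirty bit of each present PTE, and
set the corresponding bit in the bitmap; a device-backed one would
drain the driver-registered log buffer instead.
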
> Until these interfaces are designed, we can only speculate, but the
> goal is to design a user interface compatible with how those features
> might evolve.  If you identify something that can't work, please raise
> the issue.  Thanks,
> 
> Alex

Here is the desired scheme in my mind. Feel free to correct me. :-)

1. an iommu log-buf callback is preferred if the underlying IOMMU
reports such capability. The iommu driver walks the IOMMU page table to
find dirty pages for the requested address range;

2. otherwise a vendor driver log-buf callback is preferred, if the
vendor driver reports such capability when registering mdev types. The
vendor driver calls a device-specific interface to fill in dirty info;

3. otherwise pages pinned by vfio_pin_pages (with the WRITE flag) are
considered dirty. This covers normal mediated devices, as well as using
fast-path mediation to migrate a passthrough device;

4. otherwise all mapped pages are considered dirty.

Currently we're working on 1) based on VT-d rev3.0. I know some vendors
implement 2) in their own code base. 3) has real usages already. 4) is
the fall-back.

Alex, are you willing to have all the interfaces ready in one batch, or
to support them based on available usages? I'm fine with either way,
but even if we just do 3/4 in this series, I'd prefer to have the above
scheme included in a code comment, to give the whole picture of all
possible situations. :-)

Thanks
Kevin
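
P.S. a rough sketch of what such a comment plus the fallback order
could look like.  Every helper named below is hypothetical, just to
illustrate the dispatch (only cases 3/4 would exist in this series,
and pinned_page_dirty_scope is the assumed flag from earlier):

/*
 * Dirty page tracking, in order of preference:
 * 1) IOMMU log-buf callback, if the IOMMU reports the capability --
 *    the iommu driver walks the IOMMU page table for the range;
 * 2) vendor driver log-buf callback, if reported when registering
 *    mdev types -- the vendor driver fills dirty info from the device;
 * 3) pages pinned through vfio_pin_pages() (with WRITE) are dirty;
 * 4) otherwise, every mapped page is considered dirty.
 */
static int vfio_iommu_dirty_bitmap(struct vfio_iommu *iommu,
				   unsigned long iova, unsigned long size,
				   unsigned long *bitmap)
{
	if (iommu_dirty_log_supported(iommu))		/* case 1 */
		return iommu_dirty_log_sync(iommu, iova, size, bitmap);

	if (vendor_dirty_log_supported(iommu))		/* case 2 */
		return vendor_dirty_log_sync(iommu, iova, size, bitmap);

	if (iommu->pinned_page_dirty_scope)		/* case 3 */
		return vfio_pinned_bitmap(iommu, iova, size, bitmap);

	return vfio_all_mapped_bitmap(iommu, iova, size, bitmap); /* 4 */
}
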