On Thu, 5 Dec 2019 11:49:00 +0530
Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:

> On 12/5/2019 11:26 AM, Alex Williamson wrote:
> > On Thu, 5 Dec 2019 11:12:23 +0530
> > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> >
> >> On 12/5/2019 6:58 AM, Yan Zhao wrote:
> >>> On Thu, Dec 05, 2019 at 02:34:57AM +0800, Alex Williamson wrote:
> >>>> On Wed, 4 Dec 2019 23:40:25 +0530
> >>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> >>>>
> >>>>> On 12/3/2019 11:34 PM, Alex Williamson wrote:
> >>>>>> On Mon, 25 Nov 2019 19:57:39 -0500
> >>>>>> Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>>> On Fri, Nov 15, 2019 at 05:06:25AM +0800, Alex Williamson wrote:
> >>>>>>>> On Fri, 15 Nov 2019 00:26:07 +0530
> >>>>>>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>>> On 11/14/2019 1:37 AM, Alex Williamson wrote:
> >>>>>>>>>> On Thu, 14 Nov 2019 01:07:21 +0530
> >>>>>>>>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On 11/13/2019 4:00 AM, Alex Williamson wrote:
> >>>>>>>>>>>> On Tue, 12 Nov 2019 22:33:37 +0530
> >>>>>>>>>>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> All pages pinned by vendor driver through vfio_pin_pages API should be
> >>>>>>>>>>>>> considered as dirty during migration. IOMMU container maintains a list of
> >>>>>>>>>>>>> all such pinned pages. Added an ioctl defination to get bitmap of such
> >>>>>>>>>>>>
> >>>>>>>>>>>> definition
> >>>>>>>>>>>>
> >>>>>>>>>>>>> pinned pages for requested IO virtual address range.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Additionally, all mapped pages are considered dirty when physically
> >>>>>>>>>>>> mapped through to an IOMMU, modulo we discussed devices opting in to
> >>>>>>>>>>>> per page pinning to indicate finer granularity with a TBD mechanism to
> >>>>>>>>>>>> figure out if any non-opt-in devices remain.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> You mean, in case of device direct assignment (device pass through)?
> >>>>>>>>>>
> >>>>>>>>>> Yes, or IOMMU backed mdevs.  If vfio_dmas in the container are fully
> >>>>>>>>>> pinned and mapped, then the correct dirty page set is all mapped pages.
> >>>>>>>>>> We discussed using the vpfn list as a mechanism for vendor drivers to
> >>>>>>>>>> reduce their migration footprint, but we also discussed that we would
> >>>>>>>>>> need a way to determine that all participants in the container have
> >>>>>>>>>> explicitly pinned their working pages or else we must consider the
> >>>>>>>>>> entire potential working set as dirty.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> How can the vendor driver convey this capability to the iommu module?
> >>>>>>>>> Any suggestions?
> >>>>>>>>
> >>>>>>>> I think it does so by pinning pages.  Is it acceptable that if the
> >>>>>>>> vendor driver pins any pages, then from that point forward we consider
> >>>>>>>> the IOMMU group dirty page scope to be limited to pinned pages?  There
> >>>>>>> we should also be aware that the dirty page scope is pinned pages +
> >>>>>>> unpinned pages, which means that once a page is pinned it should be
> >>>>>>> regarded as dirty no matter whether it's unpinned later.  Only after
> >>>>>>> log_sync is called and the dirty info retrieved should its dirty state
> >>>>>>> be cleared.
> >>>>>>
> >>>>>> Yes, good point.  We can't just remove a vpfn when a page is unpinned
> >>>>>> or else we'd lose information that the page potentially had been
> >>>>>> dirtied while it was pinned.
> >>>>>> Maybe that vpfn needs to move to a dirty
> >>>>>> list, and both the currently pinned vpfns and the dirty vpfns are
> >>>>>> walked on a log_sync.  The dirty vpfns list would be cleared after a
> >>>>>> log_sync.  The container would need to know that dirty tracking is
> >>>>>> enabled and only manage the dirty vpfns list when necessary.  Thanks,
> >>>>>>
> >>>>>
> >>>>> If a page is unpinned, then that page is available in the free page
> >>>>> pool for others to use, so how can we say that the unpinned page has
> >>>>> valid data?
> >>>>>
> >>>>> Suppose driver A unpins a page, and driver B of some other device then
> >>>>> gets that page, pins it, uses it, and unpins it; how can we say that
> >>>>> page has valid data for driver A?
> >>>>>
> >>>>> Can you give one example where unpinned page data is considered
> >>>>> reliable and valid?
> >>>>
> >>>> We can only pin pages that the user has already allocated* and mapped
> >>>> through the vfio DMA API.  Pinning a page simply locks the page for the
> >>>> vendor driver to access it, and unpinning that page only indicates that
> >>>> access is complete.  Pages are not freed when a vendor driver unpins
> >>>> them; they still exist, and at this point we're now assuming the device
> >>>> dirtied the page while it was pinned.  Thanks,
> >>>>
> >>>> Alex
> >>>>
> >>>> * An exception here is that the page might be demand allocated and the
> >>>> act of pinning the page could actually allocate the backing page for
> >>>> the user if they have not faulted the page to trigger that allocation
> >>>> previously.  That page remains mapped for the user's virtual address
> >>>> space even after the unpinning though.
> >>>>
> >>>
> >>> Yes, I can give an example in GVT.
> >>> When a gem_object is allocated in the guest, before submitting it to
> >>> the guest vGPU, the gfx cmds in its ring buffer need to be pinned into
> >>> the GGTT to get a global graphics address for hardware access.  At that
> >>> time, we shadow those cmds and pin the pages through vfio pin_pages(),
> >>> and submit the shadow gem_object to the physical hardware.
> >>> After the guest driver thinks the submitted gem_object has completed
> >>> hardware DMA, it unpins those pinned GGTT graphics memory addresses.
> >>> Then, in the host, we unpin the shadow pages through vfio unpin_pages().
> >>> But at this point the guest driver is still free to access the
> >>> gem_object through vCPUs, and guest user space is probably still
> >>> mapping an object into the gem_object in the guest driver.
> >>> So missing the dirty page tracking for unpinned pages would cause data
> >>> inconsistency.
> >>>
> >>
> >> If pages are accessed by the guest through vCPUs, then the RAM module in
> >> QEMU will take care of tracking those pages as dirty.
> >>
> >> Not all unpinned pages might be used, so tracking all unpinned pages
> >> during the VM's or application's lifetime would also mean tracking lots
> >> of stale pages, even though they are not being used.  An increasing
> >> number of unneeded pages would also increase the migration data, leading
> >> to an increase in migration downtime.
> >
> > We can't rely on the vCPU also dirtying a page; the overhead is
> > unavoidable.  It doesn't matter if the migration is fast if it's
> > incorrect.  We only need to track unpinned dirty pages while the
> > migration is active, and the tracking is flushed on each log_sync
> > callback.  Thanks,
> >
>
> From Yan's comment, pasted below, I thought we needed to track all
> unpinned pages during the application's or VM's lifetime.
>
>  > There we should also be aware that the dirty page scope is pinned
>  > pages + unpinned pages, which means that once a page is pinned it
>  > should be regarded as dirty no matter whether it's unpinned later.
>
> But if it's about tracking pages which are unpinned "while the migration
> is active", then that set would be smaller; I will do this change.

When we first start a pre-copy, all RAM (including anything previously
pinned) is dirty anyway.  I believe the log_sync callback is only
intended to report pages dirtied since the migration was started, or
since the last log_sync callback.  We then assume that currently pinned
pages are constantly dirty and previously pinned pages are dirty until
we've reported them through log_sync.  Thanks,

Alex
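
As a rough illustration of the scheme discussed above, where unpinned-but-dirty
vpfns are parked on a separate list that log_sync reports and then flushes,
here is a minimal, self-contained C sketch.  All names in it (toy_dma,
toy_vpfn, toy_pin_page, toy_unpin_page, toy_log_sync) are invented for this
example; they are not the actual vfio_iommu_type1 code or any real VFIO API.

/*
 * Toy model of the bookkeeping described above, NOT the real
 * vfio_iommu_type1 implementation.  While dirty tracking is enabled,
 * unpinning a page does not drop its vpfn record; the record moves to
 * a "dirty" list instead.  log_sync reports both the currently pinned
 * vpfns and the dirty list, then clears the dirty list.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_vpfn {
	unsigned long iova_pfn;		/* IOVA >> PAGE_SHIFT */
	struct toy_vpfn *next;
};

struct toy_dma {
	struct toy_vpfn *pinned;	/* pages currently pinned by a vendor driver */
	struct toy_vpfn *dirty;		/* pages unpinned since the last log_sync */
	bool dirty_tracking;		/* set while migration dirty logging is active */
};

static void push(struct toy_vpfn **list, struct toy_vpfn *v)
{
	v->next = *list;
	*list = v;
}

/* Pin: record the vpfn so the page is reported as dirty while pinned. */
static void toy_pin_page(struct toy_dma *dma, unsigned long iova_pfn)
{
	struct toy_vpfn *v = calloc(1, sizeof(*v));

	v->iova_pfn = iova_pfn;
	push(&dma->pinned, v);
}

/* Unpin: keep the record on the dirty list while tracking is enabled. */
static void toy_unpin_page(struct toy_dma *dma, unsigned long iova_pfn)
{
	struct toy_vpfn **pp, *v;

	for (pp = &dma->pinned; (v = *pp) != NULL; pp = &v->next) {
		if (v->iova_pfn != iova_pfn)
			continue;
		*pp = v->next;
		if (dma->dirty_tracking)
			push(&dma->dirty, v);	/* assumed dirtied while pinned */
		else
			free(v);
		return;
	}
}

/* log_sync: report pinned + previously pinned pages, then flush the latter. */
static void toy_log_sync(struct toy_dma *dma, uint64_t *bitmap)
{
	struct toy_vpfn *v, *tmp;

	for (v = dma->pinned; v; v = v->next)
		bitmap[v->iova_pfn / 64] |= UINT64_C(1) << (v->iova_pfn % 64);

	for (v = dma->dirty; v; v = tmp) {
		tmp = v->next;
		bitmap[v->iova_pfn / 64] |= UINT64_C(1) << (v->iova_pfn % 64);
		free(v);
	}
	dma->dirty = NULL;
}

int main(void)
{
	struct toy_dma dma = { .dirty_tracking = true };
	uint64_t bitmap[2] = { 0 };

	toy_pin_page(&dma, 3);
	toy_pin_page(&dma, 70);
	toy_unpin_page(&dma, 3);	/* stays dirty until the next log_sync */
	toy_log_sync(&dma, bitmap);

	/* Prints 0x8 and 0x40: bit 3 (unpinned) and bit 70 (still pinned). */
	printf("bitmap: %#llx %#llx\n",
	       (unsigned long long)bitmap[0], (unsigned long long)bitmap[1]);
	return 0;
}

In this toy model a page that stays pinned is reported on every log_sync,
while a page that was pinned and later unpinned is reported exactly once
after the unpin and then forgotten, matching the semantics described above:
currently pinned pages are treated as constantly dirty, and previously
pinned pages remain dirty until they have been reported through log_sync.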