Re: [PATCH v15 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Alex Williamson (alex.williamson@xxxxxxxxxx) wrote:
> On Mon, 23 Mar 2020 23:24:37 +0530
> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> 
> > On 3/21/2020 12:29 AM, Alex Williamson wrote:
> > > On Sat, 21 Mar 2020 00:12:04 +0530
> > > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> > >   
> > >> On 3/20/2020 11:31 PM, Alex Williamson wrote:  
> > >>> On Fri, 20 Mar 2020 23:19:14 +0530
> > >>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> > >>>      
> > >>>> On 3/20/2020 4:27 AM, Alex Williamson wrote:  
> > >>>>> On Fri, 20 Mar 2020 01:46:41 +0530
> > >>>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> > >>>>>         
> > >>
> > >> <snip>
> > >>  
> > >>>>>> +static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
> > >>>>>> +				  size_t size, uint64_t pgsize,
> > >>>>>> +				  u64 __user *bitmap)
> > >>>>>> +{
> > >>>>>> +	struct vfio_dma *dma;
> > >>>>>> +	unsigned long pgshift = __ffs(pgsize);
> > >>>>>> +	unsigned int npages, bitmap_size;
> > >>>>>> +
> > >>>>>> +	dma = vfio_find_dma(iommu, iova, 1);
> > >>>>>> +
> > >>>>>> +	if (!dma)
> > >>>>>> +		return -EINVAL;
> > >>>>>> +
> > >>>>>> +	if (dma->iova != iova || dma->size != size)
> > >>>>>> +		return -EINVAL;
> > >>>>>> +
> > >>>>>> +	npages = dma->size >> pgshift;
> > >>>>>> +	bitmap_size = DIRTY_BITMAP_BYTES(npages);
> > >>>>>> +
> > >>>>>> +	/* mark all pages dirty if all pages are pinned and mapped. */
> > >>>>>> +	if (dma->iommu_mapped)
> > >>>>>> +		bitmap_set(dma->bitmap, 0, npages);
> > >>>>>> +
> > >>>>>> +	if (copy_to_user((void __user *)bitmap, dma->bitmap, bitmap_size))
> > >>>>>> +		return -EFAULT;  
> > >>>>>
> > >>>>> We still need to reset the bitmap here, clearing and re-adding the
> > >>>>> pages that are still pinned.
> > >>>>>
> > >>>>> https://lore.kernel.org/kvm/20200319070635.2ff5db56@xxxxxxx/
> > >>>>>         
> > >>>>
> > >>>> I thought you agreed on my reply to it
> > >>>> https://lore.kernel.org/kvm/31621b70-02a9-2ea5-045f-f72b671fe703@xxxxxxxxxx/
> > >>>>     
> > >>>>    > Why re-populate when there will be no change since
> > >>>>    > vfio_iova_dirty_bitmap() is called holding iommu->lock? If there is any
> > >>>>    > pin request while vfio_iova_dirty_bitmap() is still working, it will
> > >>>>    > wait till iommu->lock is released. Bitmap will be populated when page is
> > >>>>    > pinned.  
> > >>>
> > >>> As coded, dirty bits are only ever set in the bitmap, never cleared.
> > >>> If a page is unpinned between iterations of the user recording the
> > >>> dirty bitmap, it should be marked dirty in the iteration immediately
> > >>> after the unpinning and not marked dirty in the following iteration.
> > >>> That doesn't happen here.  We're reporting cumulative dirty pages since
> > >>> logging was enabled, we need to be reporting dirty pages since the user
> > >>> last retrieved the dirty bitmap.  The bitmap should be cleared and
> > >>> currently pinned pages re-added after copying to the user.  Thanks,
> > >>>      
> > >>
> > >> Does that mean, we have to track every iteration? do we really need that
> > >> tracking?
> > >>
> > >> Generally the flow is:
> > >> - vendor driver pin x pages
> > >> - Enter pre-copy-phase where vCPUs are running - user starts dirty pages
> > >> tracking, then user asks dirty bitmap, x pages reported dirty by
> > >> VFIO_IOMMU_DIRTY_PAGES ioctl with _GET flag
> > >> - In pre-copy phase, vendor driver pins y more pages, now bitmap
> > >> consists of x+y bits set
> > >> - In pre-copy phase, vendor driver unpins z pages, but bitmap is not
> > >> updated, so again bitmap consists of x+y bits set.
> > >> - Enter in stop-and-copy phase, vCPUs are stopped, mdev devices are stopped
> > >> - user asks dirty bitmap - Since here vCPU and mdev devices are stopped,
> > >> pages should not get dirty by guest driver or the physical device.
> > >> Hence, x+y dirty pages would be reported.
> > >>
> > >> I don't think we need to track every iteration of bitmap reporting.  
> > > 
> > > Yes, once a bitmap is read, it's reset.  In your example, after
> > > unpinning z pages the user should still see a bitmap with x+y pages,
> > > but once they've read that bitmap, the next bitmap should be x+y-z.
> > > Userspace can make decisions about when to switch from pre-copy to
> > > stop-and-copy based on convergence, ie. the slope of the line recording
> > > dirty pages per iteration.  The implementation here never allows an
> > > inflection point, dirty pages reported through vfio would always either
> > > be flat or climbing.  There might also be a case that an iommu backed
> > > device could start pinning pages during the course of a migration, how
> > > would the bitmap ever revert from fully populated to only tracking the
> > > pinned pages?  Thanks,
> > >   
> > 
> > At KVM forum we discussed this - if guest driver pins say 1024 pages 
> > before migration starts, during pre-copy phase device can dirty 0 pages 
> > in best case and 1024 pages in worst case. In that case, user will 
> > transfer content of 1024 pages during pre-copy phase and in 
> > stop-and-copy phase also, that will be pages will be copied twice. So we 
> > decided to only get dirty pages bitmap at stop-and-copy phase. If user 
> > is going to get dirty pages in stop-and-copy phase only, then that will 
> > be single iteration.
> > There aren't any devices yet that can track sys memory dirty pages. So 
> > we can go ahead with this patch and support for dirty pages tracking 
> > during pre-copy phase can be added later when there will be consumers of 
> > that functionality.
> 
> So if I understand this right, you're expecting the dirty bitmap to
> accumulate dirty bits, in perpetuity, so that the user can only
> retrieve them once at the end of migration?  But if that's the case,
> the user could simply choose to not retrieve the bitmap until the end
> of migration, the result would be the same.  What we have here is that
> dirty bits are never cleared, regardless of whether the user has seen
> them, which is wrong.  Sorry, we had a lot of discussions at KVM forum,
> I don't recall this specific one 5 months later and maybe we weren't
> considering all aspects.  I see the behavior we have here as incorrect,
> but it also seems relatively trivial to make correct.  I hope the QEMU
> code isn't making us go through all this trouble to report a dirty
> bitmap that gets thrown away because it expects the final one to be
> cumulative since the beginning of dirty logging.  Thanks,

I remember the discussion that we couldn't track the system memory
dirtying with current hardware; so the question then is just to track
what has been pinned and then ideally put that memory off until the end.
(Which is interesting because I don't think we currently have  a way
to delay RAM pages till the end in qemu).

[I still worry whether migration will be usable with any
significant amount of system ram that's pinned in this way; the
downside will very easily get above the threshold that people like]

Dave

> Alex
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux