On Fri, 15 Nov 2019 00:26:26 +0530 Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: > On 11/14/2019 1:52 AM, Alex Williamson wrote: > > On Thu, 14 Nov 2019 01:22:39 +0530 > > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: > > > >> On 11/13/2019 4:00 AM, Alex Williamson wrote: > >>> On Tue, 12 Nov 2019 22:33:38 +0530 > >>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: > >>> > >>>> With vIOMMU, during pre-copy phase of migration, while CPUs are still > >>>> running, IO virtual address unmap can happen while device still keeping > >>>> reference of guest pfns. Those pages should be reported as dirty before > >>>> unmap, so that VFIO user space application can copy content of those pages > >>>> from source to destination. > >>>> > >>>> IOCTL defination added here add bitmap pointer, size and flag. If flag > >>> > >>> definition, adds > >>> > >>>> VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is set and bitmap memory is allocated > >>>> and bitmap_size of set, then ioctl will create bitmap of pinned pages and > >>> > >>> s/of/is/ > >>> > >>>> then unmap those. > >>>> > >>>> Signed-off-by: Kirti Wankhede <kwankhede@xxxxxxxxxx> > >>>> Reviewed-by: Neo Jia <cjia@xxxxxxxxxx> > >>>> --- > >>>> include/uapi/linux/vfio.h | 33 +++++++++++++++++++++++++++++++++ > >>>> 1 file changed, 33 insertions(+) > >>>> > >>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > >>>> index 6fd3822aa610..72fd297baf52 100644 > >>>> --- a/include/uapi/linux/vfio.h > >>>> +++ b/include/uapi/linux/vfio.h > >>>> @@ -925,6 +925,39 @@ struct vfio_iommu_type1_dirty_bitmap { > >>>> > >>>> #define VFIO_IOMMU_GET_DIRTY_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 17) > >>>> > >>>> +/** > >>>> + * VFIO_IOMMU_UNMAP_DMA_GET_BITMAP - _IOWR(VFIO_TYPE, VFIO_BASE + 18, > >>>> + * struct vfio_iommu_type1_dma_unmap_bitmap) > >>>> + * > >>>> + * Unmap IO virtual addresses using the provided struct > >>>> + * vfio_iommu_type1_dma_unmap_bitmap. Caller sets argsz. > >>>> + * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get dirty bitmap > >>>> + * before unmapping IO virtual addresses. If this flag is not set, only IO > >>>> + * virtual address are unmapped without creating pinned pages bitmap, that > >>>> + * is, behave same as VFIO_IOMMU_UNMAP_DMA ioctl. > >>>> + * User should allocate memory to get bitmap and should set size of allocated > >>>> + * memory in bitmap_size field. One bit in bitmap is used to represent per page > >>>> + * consecutively starting from iova offset. Bit set indicates page at that > >>>> + * offset from iova is dirty. > >>>> + * The actual unmapped size is returned in the size field and bitmap of pages > >>>> + * in the range of unmapped size is returned in bitmap if flag > >>>> + * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is set. > >>>> + * > >>>> + * No guarantee is made to the user that arbitrary unmaps of iova or size > >>>> + * different from those used in the original mapping call will succeed. > >>>> + */ > >>>> +struct vfio_iommu_type1_dma_unmap_bitmap { > >>>> + __u32 argsz; > >>>> + __u32 flags; > >>>> +#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0) > >>>> + __u64 iova; /* IO virtual address */ > >>>> + __u64 size; /* Size of mapping (bytes) */ > >>>> + __u64 bitmap_size; /* in bytes */ > >>>> + void __user *bitmap; /* one bit per page */ > >>>> +}; > >>>> + > >>>> +#define VFIO_IOMMU_UNMAP_DMA_GET_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 18) > >>>> + > >>> > >>> Why not extend VFIO_IOMMU_UNMAP_DMA to support this rather than add an > >>> ioctl that duplicates the functionality and extends it?? > >> > >> We do want old userspace applications to work with new kernel and > >> vice-versa, right? > >> > >> If I try to change existing VFIO_IOMMU_UNMAP_DMA ioctl structure, say if > >> add 'bitmap_size' and 'bitmap' after 'size', with below code in old > >> kernel, old kernel & new userspace will work. > >> > >> minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size); > >> > >> if (copy_from_user(&unmap, (void __user *)arg, minsz)) > >> return -EFAULT; > >> > >> if (unmap.argsz < minsz || unmap.flags) > >> return -EINVAL; > >> > >> > >> With new kernel it would change to: > >> minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, bitmap); > > > > No, the minimum structure size still ends at size, we interpret flags > > and argsz to learn if the user understands those fields and optionally > > include them. Therefore old userspace on new kernel continues to work. > > > >> if (copy_from_user(&unmap, (void __user *)arg, minsz)) > >> return -EFAULT; > >> > >> if (unmap.argsz < minsz || unmap.flags) > >> return -EINVAL; > >> > >> Then old userspace app will fail because unmap.argsz < minsz and might > >> be copy_from_user would cause seg fault because userspace sdk doesn't > >> contain new member variables. > >> We can't change the sequence to keep 'size' as last member, because then > >> new userspace app on old kernel will interpret it wrong. > > > > If we have new userspace on old kernel, that userspace needs to be able > > to learn that this feature exists (new flag in the > > vfio_iommu_type1_info struct as suggested below) and only make use of it > > when available. This is why the old kernel checks argsz against minsz. > > So long as the user passes something at least minsz in size, we have > > compatibility. The old kernel doesn't understand the GET_DIRTY_BITMAP > > flag and will return an error if the user attempts to use it. Thanks, > > > > Ok. So then VFIO_IOMMU_UNMAP_DMA_GET_BITMAP ioctl is not needed. I'll do > the change. Again bitmap will be created considering smallest page size > of iova_pgsizes > > But VFIO_IOMMU_GET_DIRTY_BITMAP ioctl will still required, right? Yes, I'm not willing to suggest a flag on an unmap ioctl that eliminates the unmap just so we can re-use it for retrieving a dirty page bitmap. That'd be ugly. Thanks, Alex