On Sun, Feb 4, 2018 at 7:46 PM, Haozhong Zhang <haozhong.zhang@xxxxxxxxx> wrote:
> On 02/04/18 15:05 -0800, Dan Williams wrote:
>> Filesystem-DAX is incompatible with 'longterm' page pinning. Without
>> page cache indirection a DAX mapping maps filesystem blocks directly.
>> This means that the filesystem must not modify a file's block map while
>> any page in a mapping is pinned. In order to prevent the situation of
>> userspace holding off filesystem operations indefinitely, disallow
>> 'longterm' Filesystem-DAX mappings.
>>
>> RDMA has the same conflict and the plan there is to add a 'with lease'
>> mechanism to allow the kernel to notify userspace that the mapping is
>> being torn down for block-map maintenance. Perhaps something similar can
>> be put in place for vfio.
>>
>> Note that xfs and ext4 still report:
>>
>>    "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"
>>
>> ...at mount time, and resolving the dax-dma-vs-truncate problem is one
>> of the last hurdles to remove that designation.
>>
>> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
>> Cc: Michal Hocko <mhocko@xxxxxxxx>
>> Cc: Christoph Hellwig <hch@xxxxxx>
>> Cc: kvm@xxxxxxxxxxxxxxx
>> Cc: <stable@xxxxxxxxxxxxxxx>
>> Reported-by: Haozhong Zhang <haozhong.zhang@xxxxxxxxx>
>> Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
>> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 18 +++++++++++++++---
>>  1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index e30e29ae4819..45657e2b1ff7 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -338,11 +338,12 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>>  {
>>         struct page *page[1];
>>         struct vm_area_struct *vma;
>> +       struct vm_area_struct *vmas[1];
>>         int ret;
>>
>>         if (mm == current->mm) {
>> -               ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE),
>> -                                         page);
>> +               ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE),
>> +                                             page, vmas);
>
> vmas is not used subsequently if this branch is taken, so can we use
> NULL here?

I'd rather go the other way and refactor this a bit further to skip
the find_vma_intersection() below since get_user_pages() already does
that work.
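
To make the idea concrete, here is a toy userspace model of a pin
routine that does the vma lookup itself and refuses filesystem-DAX
mappings, so the caller never needs a second lookup. It is only a
sketch and not kernel code: toy_vma, toy_page, toy_find_vma() and
toy_pin_longterm() are made-up names, the is_fsdax flag merely stands
in for whatever check the kernel applies, and the error codes are
assumptions chosen for illustration.

```c
/* Toy userspace model, NOT kernel code. Every name here is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_vma {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	int is_fsdax;		/* assumed stand-in for a filesystem-DAX check */
};

struct toy_page {
	int pin_count;		/* number of outstanding pins */
};

/* Return the vma covering addr, or NULL if the address is unmapped. */
static const struct toy_vma *toy_find_vma(const struct toy_vma *vmas,
					  size_t nr_vmas, unsigned long addr)
{
	size_t i;

	for (i = 0; i < nr_vmas; i++)
		if (addr >= vmas[i].start && addr < vmas[i].end)
			return &vmas[i];
	return NULL;
}

/*
 * Pin one page for long-term use. The lookup happens in here, so the
 * caller does not have to find the vma again afterwards.
 * Returns 1 (pages pinned) on success, -EFAULT for an unmapped address,
 * or -EOPNOTSUPP when the mapping is filesystem-DAX (pin is dropped).
 */
static int toy_pin_longterm(const struct toy_vma *vmas, size_t nr_vmas,
			    unsigned long addr, struct toy_page *page)
{
	const struct toy_vma *vma = toy_find_vma(vmas, nr_vmas, addr);

	assert(page != NULL);
	if (!vma)
		return -EFAULT;

	page->pin_count++;		/* take the pin first */
	if (vma->is_fsdax) {
		page->pin_count--;	/* undo it: no longterm DAX pins */
		return -EOPNOTSUPP;
	}
	return 1;
}
```

In this model a caller only looks at the return value: a negative
errno means nothing stays pinned, and a filesystem-DAX range can
never end up held indefinitely.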