Hi Alex,

After days of intensive testing, the bug has not shown up any more, so I
think the new patch does solve this problem.

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> Sent: Friday, April 20, 2018 3:55 AM
> To: xuyandong <xuyandong2@xxxxxxxxxx>
> Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Zhanghailiang
> <zhang.zhanghailiang@xxxxxxxxxx>; wangxin (U) <wangxinxin.wang@xxxxxxxxxx>;
> Kirti Wankhede <kwankhede@xxxxxxxxxx>
> Subject: Re: [PATCH] vfio iommu type1: no need to check task->mm if task
> has been destroyed
>
> On Thu, 19 Apr 2018 10:19:26 -0600
> Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
>
> > [cc +Kirti]
> >
> > On Wed, 18 Apr 2018 18:55:45 +0800
> > Xu Yandong <xuyandong2@xxxxxxxxxx> wrote:
> >
> > > The task structure in the vfio_dma struct is used to verify that the
> > > task unmapping a region is either the task that mapped it or another
> > > task sharing the same address space. But if the task that mapped it
> > > has exited, its mm has been set to NULL, so we should unmap the
> > > vfio_dma directly.
> > >
> > > Signed-off-by: Xu Yandong <xuyandong2@xxxxxxxxxx>
> > > ---
> > > Hi all,
> > > When I unplug a vCPU from a VM launched with a VFIO hostdev device, I
> > > found that the *vfio_dma* mapped by this vCPU task could not be
> > > unmapped in the future, so I am sending this patch to unmap the
> > > vfio_dma directly if the task that mapped it has exited.
> > >
> > > However, this patch may introduce a new security risk, because any
> > > task can unmap the *vfio_dma* once the mapper task has exited.
> >
> > Well, that's unexpected, but after adding some debugging code I can
> > clearly see that the map and unmap ioctls are typically called by the
> > various processor threads, which all share the same mm_struct (so
> > accounting is correct regardless of which CPU does the unmap). I don't
> > think the fix below is correct though; it's not a security risk, but an
> > accounting and correctness issue. The pages are mapped and accounted
> > against the user's locked memory limit; if we simply bail out, both the
> > IOMMU mappings and the limit accounting are wrong. Perhaps rather than
> > referencing the calling task_struct in the vfio_dma on mapping, we
> > should traverse to the highest parent task sharing the same mm_struct.
> > Kirti, any thoughts, since this code originated for mdev support?
> > Thanks,
>
> I think something like below is a better solution. More research is
> required on group_leader, and whether it needs to be sanity tested or
> whether testing mm_struct is redundant, but I think it should resolve the
> failing test case: all mappings reference the same task regardless of
> which vCPU triggers them. Thanks,
>
> Alex
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5c212bf29640..3a1d3695c3fb 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1093,6 +1093,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> 	int ret = 0, prot = 0;
> 	uint64_t mask;
> 	struct vfio_dma *dma;
> +	struct task_struct *task;
>
> 	/* Verify that none of our __u64 fields overflow */
> 	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
> @@ -1131,8 +1132,12 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> 	dma->iova = iova;
> 	dma->vaddr = vaddr;
> 	dma->prot = prot;
> -	get_task_struct(current);
> -	dma->task = current;
> +
> +	task = (current->mm == current->group_leader->mm ?
> +			current->group_leader : current);
> +	get_task_struct(task);
> +	dma->task = task;
> +
> 	dma->pfn_list = RB_ROOT;
>
> 	/* Insert zero-sized and grow as we map chunks of it */
>
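As a side note, here is a minimal userspace sketch (purely illustrative, not
part of Alex's patch) of why anchoring the reference on the thread group
leader is stable: all threads of a process, e.g. all vCPU threads of a QEMU
process, share one mm and report the same tgid via getpid(), while gettid()
differs per thread, so a reference keyed on the group leader is the same no
matter which thread issues the map or unmap ioctl. Build with
"gcc -pthread demo.c".

/* Hypothetical demo: every thread prints its own tid but the shared tgid. */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static void *worker(void *arg)
{
	(void)arg;
	/* gettid() differs per thread, getpid() (the tgid) does not. */
	printf("thread: tid=%ld tgid=%d\n", syscall(SYS_gettid), (int)getpid());
	return NULL;
}

int main(void)
{
	pthread_t threads[4];
	int i;

	printf("leader: tid=%ld tgid=%d\n", syscall(SYS_gettid), (int)getpid());

	for (i = 0; i < 4; i++)
		pthread_create(&threads[i], NULL, worker, NULL);
	for (i = 0; i < 4; i++)
		pthread_join(threads[i], NULL);

	return 0;
}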