On Thu, 20 Oct 2016 14:03:49 +1100 Alexey Kardashevskiy <aik@xxxxxxxxx> wrote: > In some situations the userspace memory context may live longer than > the userspace process itself so if we need to do proper memory context > cleanup, we better cache @mm and use it later when the process is gone > (@current or @current->mm is NULL). > > This references mm and stores the pointer in the container; this is done > when a container is just created so checking for !current->mm in other > places becomes pointless. > > This replaces current->mm with container->mm everywhere except debug > prints. > > This adds a check that current->mm is the same as the one stored in > the container to prevent userspace from registering memory in other > processes. > > Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> > --- > drivers/vfio/vfio_iommu_spapr_tce.c | 127 ++++++++++++++++++++---------------- > 1 file changed, 71 insertions(+), 56 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c > index d0c38b2..6b0b121 100644 > --- a/drivers/vfio/vfio_iommu_spapr_tce.c > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c > @@ -31,49 +31,46 @@ Does it make sense to move the rest of these hunks into patch 2? I think they're similarly just moving the mm reference into callers. > static void tce_iommu_detach_group(void *iommu_data, > struct iommu_group *iommu_group); > > -static long try_increment_locked_vm(long npages) > +static long try_increment_locked_vm(struct mm_struct *mm, long npages) > { > long ret = 0, locked, lock_limit; > > - if (!current || !current->mm) > - return -ESRCH; /* process exited */ > - > if (!npages) > return 0; > > - down_write(¤t->mm->mmap_sem); > - locked = current->mm->locked_vm + npages; > + down_write(&mm->mmap_sem); > + locked = mm->locked_vm + npages; > lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; > if (locked > lock_limit && !capable(CAP_IPC_LOCK)) > ret = -ENOMEM; > else > - current->mm->locked_vm += npages; > + mm->locked_vm += npages; > > pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid, > npages << PAGE_SHIFT, > - current->mm->locked_vm << PAGE_SHIFT, > + mm->locked_vm << PAGE_SHIFT, > rlimit(RLIMIT_MEMLOCK), > ret ? " - exceeded" : ""); > > - up_write(¤t->mm->mmap_sem); > + up_write(&mm->mmap_sem); > > return ret; > } > > -static void decrement_locked_vm(long npages) > +static void decrement_locked_vm(struct mm_struct *mm, long npages) > { > - if (!current || !current->mm || !npages) > + if (!mm || !npages) > return; /* process exited */ I know you're trying to be defensive and change as little logic as possible, but some cases should be an error, and I think some of the "process exited" comments were wrong anyway. Maybe pull the !mm test into the caller and make it WARN_ON? > @@ -317,6 +311,9 @@ static void *tce_iommu_open(unsigned long arg) > return ERR_PTR(-EINVAL); > } > > + if (!current->mm) > + return ERR_PTR(-ESRCH); /* process exited */ A userspace thread in the kernel can't have its mm disappear, unless you are actually in the exit code. !current->mm is more like a test for a kernel thread. > + > container = kzalloc(sizeof(*container), GFP_KERNEL); > if (!container) > return ERR_PTR(-ENOMEM); > @@ -326,13 +323,17 @@ static void *tce_iommu_open(unsigned long arg) > > container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU; > > + container->mm = current->mm; > + atomic_inc(&container->mm->mm_count); > + > return container; It's a nitpick if you respin the patch, but I guess it would better be described as a reference than a cache of the object. "have tce_container take a reference to mm_struct". > @@ -515,13 +526,16 @@ static long tce_iommu_build_v2(struct tce_container *container, > unsigned long hpa; > enum dma_data_direction dirtmp; > > + if (container->mm != current->mm) > + return -ESRCH; Good, is this condition now enforced on all entrypoints that use container->mm (except the final teardown)? (The mlock/rlimit stuff, as we talked about before, doesn't make sense if not). Thanks, Nick -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html