On Tue, Nov 22, 2016 at 06:34:25PM +1100, Alexey Kardashevskiy wrote: > On 22/11/16 14:49, Alexey Kardashevskiy wrote: > > On 22/11/16 13:38, David Gibson wrote: > >> On Thu, Nov 17, 2016 at 06:39:41PM +1100, Alexey Kardashevskiy wrote: > >>> On 11/11/16 23:32, Alexey Kardashevskiy wrote: > >>>> In some situations the userspace memory context may live longer than > >>>> the userspace process itself so if we need to do proper memory context > >>>> cleanup, we better have tce_container take a reference to mm_struct and > >>>> use it later when the process is gone (@current or @current->mm is NULL). > >>>> > >>>> This references mm and stores the pointer in the container; this is done > >>>> in a new helper - tce_iommu_mm_set() - when one of the following happens: > >>>> - a container is enabled (IOMMU v1); > >>>> - a first attempt to pre-register memory is made (IOMMU v2); > >>>> - a DMA window is created (IOMMU v2). > >>>> The @mm stays referenced till the container is destroyed. > >>>> > >>>> This replaces current->mm with container->mm everywhere except debug > >>>> prints. > >>>> > >>>> This adds a check that current->mm is the same as the one stored in > >>>> the container to prevent userspace from making changes to a memory > >>>> context of other processes. > >>>> > >>>> DMA map/unmap ioctls() do not check for @mm as they already check > >>>> for @enabled which is set after tce_iommu_mm_set() is called. > >>>> > >>>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> > >>>> --- > >>>> Changes: > >>>> v5: > >>>> * postpone referencing of mm > >>>> > >>>> v4: > >>>> * added check for container->mm!=current->mm in tce_iommu_ioctl() > >>>> for all ioctls and removed other redundand checks > >>>> --- > >>>> drivers/vfio/vfio_iommu_spapr_tce.c | 159 ++++++++++++++++++++++-------------- > >>>> 1 file changed, 99 insertions(+), 60 deletions(-) > >>>> > >>>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c > >>>> index 1c02498..9a81a7e 100644 > >>>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c > >>>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c > >>>> @@ -31,49 +31,49 @@ > >>>> static void tce_iommu_detach_group(void *iommu_data, > >>>> struct iommu_group *iommu_group); > >>>> > >>>> -static long try_increment_locked_vm(long npages) > >>>> +static long try_increment_locked_vm(struct mm_struct *mm, long npages) > >>>> { > >>>> long ret = 0, locked, lock_limit; > >>>> > >>>> - if (!current || !current->mm) > >>>> - return -ESRCH; /* process exited */ > >>>> + if (!mm) > >>>> + return -EPERM; > >>>> > >>>> if (!npages) > >>>> return 0; > >>>> > >>>> - down_write(¤t->mm->mmap_sem); > >>>> - locked = current->mm->locked_vm + npages; > >>>> + down_write(&mm->mmap_sem); > >>>> + locked = mm->locked_vm + npages; > >>>> lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; > >>>> if (locked > lock_limit && !capable(CAP_IPC_LOCK)) > >>> > >>> > >>> > >>> Oh boy. Now it seems I have to reference a task, not just mm (which I may > >>> not have to reference at all after all as the task reference should keep mm > >>> alive) as I missed the fact capable() and rlimit() are working with > >>> @current. > >> > >> Sorry, what? I'm not seeing how a task reference comes into this. > > > > I reference @mm to make sure that just one mm uses a container. If mm > > changes, I return an error, sanity check. > > > > The code also increments locked_vm in mm. But it looks at the current task > > if there is room for increments and for CAP_IPC_LOCK. > > > > So, the options are: > > 1. I do not reference the current task, and if mm changes, then the mm > > sanity check won't let me proceed to the code which tries using current OR > > 2. reference a task when I reference mm and do that sanity check not just > > for mm but also for current task. > > > > Makes sense? > > I had a chat with Nick and now I think that having mm referenced and > checked should be enough and I do not need to reference the task as > multiple threads within the same mm are allowed to ioctl() to vfio and > supposedly they will have same limits and capabilities and if they do not, > we'll just fail and that's it, I cannot see any harm from this approach, > can you? Yeah, that sounds sane to me. If the different threads in the mm somehow have different caps / limits, we could get some weird results depending on which thread attempts to do the mapping, but it shouldn't actually be harmful. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Attachment:
signature.asc
Description: PGP signature