On 22/11/16 13:38, David Gibson wrote: > On Thu, Nov 17, 2016 at 06:39:41PM +1100, Alexey Kardashevskiy wrote: >> On 11/11/16 23:32, Alexey Kardashevskiy wrote: >>> In some situations the userspace memory context may live longer than >>> the userspace process itself so if we need to do proper memory context >>> cleanup, we better have tce_container take a reference to mm_struct and >>> use it later when the process is gone (@current or @current->mm is NULL). >>> >>> This references mm and stores the pointer in the container; this is done >>> in a new helper - tce_iommu_mm_set() - when one of the following happens: >>> - a container is enabled (IOMMU v1); >>> - a first attempt to pre-register memory is made (IOMMU v2); >>> - a DMA window is created (IOMMU v2). >>> The @mm stays referenced till the container is destroyed. >>> >>> This replaces current->mm with container->mm everywhere except debug >>> prints. >>> >>> This adds a check that current->mm is the same as the one stored in >>> the container to prevent userspace from making changes to a memory >>> context of other processes. >>> >>> DMA map/unmap ioctls() do not check for @mm as they already check >>> for @enabled which is set after tce_iommu_mm_set() is called. >>> >>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> >>> --- >>> Changes: >>> v5: >>> * postpone referencing of mm >>> >>> v4: >>> * added check for container->mm!=current->mm in tce_iommu_ioctl() >>> for all ioctls and removed other redundand checks >>> --- >>> drivers/vfio/vfio_iommu_spapr_tce.c | 159 ++++++++++++++++++++++-------------- >>> 1 file changed, 99 insertions(+), 60 deletions(-) >>> >>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c >>> index 1c02498..9a81a7e 100644 >>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c >>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c >>> @@ -31,49 +31,49 @@ >>> static void tce_iommu_detach_group(void *iommu_data, >>> struct iommu_group *iommu_group); >>> >>> -static long try_increment_locked_vm(long npages) >>> +static long try_increment_locked_vm(struct mm_struct *mm, long npages) >>> { >>> long ret = 0, locked, lock_limit; >>> >>> - if (!current || !current->mm) >>> - return -ESRCH; /* process exited */ >>> + if (!mm) >>> + return -EPERM; >>> >>> if (!npages) >>> return 0; >>> >>> - down_write(¤t->mm->mmap_sem); >>> - locked = current->mm->locked_vm + npages; >>> + down_write(&mm->mmap_sem); >>> + locked = mm->locked_vm + npages; >>> lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; >>> if (locked > lock_limit && !capable(CAP_IPC_LOCK)) >> >> >> >> Oh boy. Now it seems I have to reference a task, not just mm (which I may >> not have to reference at all after all as the task reference should keep mm >> alive) as I missed the fact capable() and rlimit() are working with >> @current. > > Sorry, what? I'm not seeing how a task reference comes into this. I reference @mm to make sure that just one mm uses a container. If mm changes, I return an error, sanity check. The code also increments locked_vm in mm. But it looks at the current task if there is room for increments and for CAP_IPC_LOCK. So, the options are: 1. I do not reference the current task, and if mm changes, then the mm sanity check won't let me proceed to the code which tries using current OR 2. reference a task when I reference mm and do that sanity check not just for mm but also for current task. Makes sense? > >> >> >> Alex, >> >> Is there anything else I should fix before posting v6? Thanks. >> >> >> > -- Alexey
Attachment:
signature.asc
Description: OpenPGP digital signature