On Wed, Jun 13, 2018 at 12:36 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote: > On Tue, May 08, 2018 at 04:50:16PM +0800, Lidong Chen wrote: >> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads. >> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has >> exited, get_pid_task will return NULL, ib_umem_release does not decrease >> mm->pinned_vm. This patch fixes it by use tgid in ib_ucontext struct. >> >> Signed-off-by: Lidong Chen <lidongchen@xxxxxxxxxxx> >> --- >> [v2] >> - use ib_ucontext tgid instread of tgid in ib_umem structure > > I'm looking at this again, and it doesn't seem quite right.. > >> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c >> index 9a4e899..2b6c9b5 100644 >> --- a/drivers/infiniband/core/umem.c >> +++ b/drivers/infiniband/core/umem.c >> @@ -119,7 +119,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, >> umem->length = size; >> umem->address = addr; >> umem->page_shift = PAGE_SHIFT; >> - umem->pid = get_task_pid(current, PIDTYPE_PID); >> /* >> * We ask for writable memory if any of the following >> * access flags are set. "Local write" and "remote write" >> @@ -132,7 +131,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, >> IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND)); >> >> if (access & IB_ACCESS_ON_DEMAND) { >> - put_pid(umem->pid); >> ret = ib_umem_odp_get(context, umem, access); >> if (ret) { >> kfree(umem); >> @@ -148,7 +146,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, >> >> page_list = (struct page **) __get_free_page(GFP_KERNEL); >> if (!page_list) { >> - put_pid(umem->pid); >> kfree(umem); >> return ERR_PTR(-ENOMEM); >> } > > in ib_umem_get we are doing this: > > down_write(¤t->mm->mmap_sem); > locked = npages + current->mm->pinned_vm; > > And then in release we now do: > > task = get_pid_task(umem->context->tgid, PIDTYPE_PID); > if (!task) > goto out; > mm = get_task_mm(task); > mm->pinned_vm -= diff; > > But there is no guarantee that context->tgid and 'current' are the > same thing during ib_umem_get.. context->tgid and current maybe different. but different threads in one process should point to one mm structure. so it should works for multithread. > > So in the dysfunctional case where someone forks and keeps the context > FD open on both sides of the fork they can cause the pinned_vm > counter to become wrong in the processes. Sounds bad.. I am not sure about fork support, I will check this problem. > > Thus, I think we need to go back to storing the tgid in the ib_umem > and just fix it to store the group leader not the thread PID? > > And then even more we need the ib_get_mr_mm() helper to make sense of > this, because all the drivers are doing the wrong thing by using the > context->tgid too. > > Is that all right? > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html