On Wed, Jul 17, 2019 at 08:53:54AM -0300, Jason Gunthorpe wrote: > On Tue, Jul 16, 2019 at 09:11:43PM +0300, Shamir Rabinovitch wrote: > > From: Shamir Rabinovitch <shamir.rabinovitch@xxxxxxxxxx> > > > > ufile (&ucontext) with the process who own them must not be released > > when there are other ufile (&ucontext) that depens at them. > > We already have a kref, why do we need more? Especially wrongly done > refcounts with atomics? Yes. Will fix in v2. > > Trying to sequence the destroy of the ucontext seems inherently wrong > to me. If the driver has to link the PD/MR to data in the ucontext it > can't support sharing. The issue we try to solve here is this: [process 1] [process 2] - alloc mr & point mr to - context 1 - share context - - - import mr - exit - -- ufile_destroy_ucontext - --- ib_mr is not destroyed - --- context 1 is destroyed - - - exit - -- ufile_destroy_ucontext - --- driver dereg_mr is called - ---- ib_umem_release on umem from previously destroyed context 1 If I recall correctly, you suggested the shere and shree concept. We also talked with Mellanox architecture team and they suggested that the shrere will be bullet proof process that *only* create and share objects. The whole thing directly links to the next step we talked about which is sharing objects via file system rathen then via FD. > > > Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@xxxxxxxxxx> > > Signed-off-by: Shamir Rabinovitch <srabinov7@xxxxxxxxx> > > drivers/infiniband/core/rdma_core.c | 29 +++++++++++++++++++++++++++ > > drivers/infiniband/core/uverbs.h | 22 ++++++++++++++++++++ > > drivers/infiniband/core/uverbs_cmd.c | 16 +++++++++++++++ > > drivers/infiniband/core/uverbs_main.c | 4 ++++ > > 4 files changed, 71 insertions(+) > > > > diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c > > index 651625f632d7..c81ff8e28fc6 100644 > > +++ b/drivers/infiniband/core/rdma_core.c > > @@ -841,6 +841,33 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile, > > ufile->ucontext = NULL; > > } > > > > +static void __uverbs_ufile_refcount(struct ib_uverbs_file *ufile) > > +{ > > + int wait; > > + > > + if (ufile->parent) { > > + pr_debug("%s: release parent ufile. ufile %p parent %p\n", > > + __func__, ufile, ufile->parent); > > + if (atomic_dec_and_test(&ufile->parent->refcount)) > > + complete(&ufile->parent->context_released); > > + } > > + > > + if (!atomic_dec_and_test(&ufile->refcount)) { > > +wait: > > + wait = wait_for_completion_interruptible_timeout( > > + &ufile->context_released, 3*HZ); > > + if (wait == -ERESTARTSYS) { > > + WARN_ONCE(1, > > + "signal while waiting for context release! ufile %p\n", > > + ufile); > > ???? > > Jason I copied the behaviour I saw in the rest of the kernel as for what to do when wait_for_completion_interruptible_timeout exit due to interrupt. >From the above reason I think we need to delay the shrere process exit so it will not close the context prematurely *unless* it receive signal. In that case I'd expect that process to soot some clear warning and do whatevere id need to do for the given signal.