On Wed, Jul 17, 2019 at 04:33:13PM -0300, Jason Gunthorpe wrote:
> On Wed, Jul 17, 2019 at 10:25:25PM +0300, Shamir Rabinovitch wrote:
> > On Wed, Jul 17, 2019 at 08:53:54AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Jul 16, 2019 at 09:11:43PM +0300, Shamir Rabinovitch wrote:
> > > > From: Shamir Rabinovitch <shamir.rabinovitch@xxxxxxxxxx>
> > > >
> > > > ufile (&ucontext) with the process that owns them must not be released
> > > > when there are other ufile (&ucontext) that depend on them.
> > >
> > > We already have a kref, why do we need more? Especially wrongly done
> > > refcounts with atomics?
> >
> > Yes. Will fix in v2.
> >
> > >
> > > Trying to sequence the destroy of the ucontext seems inherently wrong
> > > to me. If the driver has to link the PD/MR to data in the ucontext it
> > > can't support sharing.
> >
> > The issue we try to solve here is this:
> >
> > [process 1]                   [process 2]
> > - alloc mr & point mr to      -
> >   context 1
> > - share context               -
> > -                             - import mr
> > - exit                        -
> > -                             - exit
> > -                             -- ufile_destroy_ucontext
> > -                             --- driver dereg_mr is called
> > -                             ---- ib_umem_release on umem from
> >                                    previously destroyed context 1
>
> Like I said, drivers that require the creating ucontext as part of the
> PD and MR cannot support sharing.

Even if we can make sure the process that creates the MR stays alive
until all references to this MR are released?

>
> > > > +    int wait;
> > > > +
> > > > +    if (ufile->parent) {
> > > > +        pr_debug("%s: release parent ufile. ufile %p parent %p\n",
> > > > +                 __func__, ufile, ufile->parent);
> > > > +        if (atomic_dec_and_test(&ufile->parent->refcount))
> > > > +            complete(&ufile->parent->context_released);
> > > > +    }
> > > > +
> > > > +    if (!atomic_dec_and_test(&ufile->refcount)) {
> > > > +wait:
> > > > +        wait = wait_for_completion_interruptible_timeout(
> > > > +            &ufile->context_released, 3*HZ);
> > > > +        if (wait == -ERESTARTSYS) {
> > > > +            WARN_ONCE(1,
> > > > +                "signal while waiting for context release! ufile %p\n",
> > > > +                ufile);
> > >
> > > ????
> > >
> > > Jason
> >
> > I copied the behaviour I saw in the rest of the kernel for what to do
> > when wait_for_completion_interruptible_timeout exits due to an interrupt.
>
> It doesn't really make sense here, we can't block release() like this
>
> Jason
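
For illustration only, the alternative to waiting in release() that the
kref comment points at can be sketched in userspace C: the importing
context takes a counted reference on the exporting one, and teardown
runs on the last put, so nobody sleeps. Everything here is an
assumption made for the sketch: struct ufile_sketch, ufile_sketch_new(),
ufile_sketch_put() and demo_teardown_order() are hypothetical names, and
a C11 atomic_int stands in for the kernel's kref. It is not the uverbs
code and not the planned v2, and it says nothing about whether drivers
can tolerate the creating process being gone.

```c
/* Userspace sketch only: hypothetical stand-ins, not the uverbs code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct ufile_sketch {
    atomic_int refcount;          /* plays the role of a kref */
    struct ufile_sketch *parent;  /* exporting context, or NULL */
    int id;
    int *log;                     /* records teardown order for the demo */
    int *pos;
};

/* Create a context; an importer pins its parent with one reference. */
static struct ufile_sketch *ufile_sketch_new(int id,
                                             struct ufile_sketch *parent,
                                             int *log, int *pos)
{
    struct ufile_sketch *f = malloc(sizeof(*f));

    if (!f)
        return NULL;
    atomic_init(&f->refcount, 1);
    f->parent = parent;
    f->id = id;
    f->log = log;
    f->pos = pos;
    if (parent)
        atomic_fetch_add(&parent->refcount, 1);
    return f;
}

/* Drop one reference. Never blocks: whoever drops the last reference
 * performs the teardown, then releases the reference held on the parent. */
static void ufile_sketch_put(struct ufile_sketch *f)
{
    while (f && atomic_fetch_sub(&f->refcount, 1) == 1) {
        struct ufile_sketch *parent = f->parent;

        f->log[(*f->pos)++] = f->id;  /* "destroy ucontext" happens here */
        free(f);
        f = parent;
    }
}

/* Replays the scenario quoted above: process 1 exits first, process 2 last.
 * Returns 1 if context 2 is torn down before context 1, 0 if not,
 * and -1 on allocation failure. */
static int demo_teardown_order(void)
{
    int log[2] = { 0, 0 };
    int pos = 0;
    struct ufile_sketch *p1, *p2;

    p1 = ufile_sketch_new(1, NULL, log, &pos);
    if (!p1)
        return -1;
    p2 = ufile_sketch_new(2, p1, log, &pos);
    if (!p2) {
        ufile_sketch_put(p1);
        return -1;
    }

    ufile_sketch_put(p1);  /* process 1 exits: returns at once, nothing freed */
    if (pos != 0)
        return 0;
    ufile_sketch_put(p2);  /* process 2 exits: frees context 2, then context 1 */
    return pos == 2 && log[0] == 2 && log[1] == 1;
}
```

In the sketch, context 1 outlives its process until the importer drops
it, so the dereg_mr step in the sequence above would still see valid
data. In the kernel the same pattern is what kref_get()/kref_put() with
a release callback provide.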