On Mon, Sep 14, 2020 at 08:59:56AM -0300, Jason Gunthorpe wrote: > ucma_destroy_id() assumes that all things accessing the ctx will do so via > the xarray. This assumption violated only in the case the FD is being > closed, then the ctx is reached via the ctx_list. Normally this is OK > since ucma_destroy_id() cannot run concurrenty with release(), however > with ucma_migrate_id() is involved this can violated as the close of the > 2nd FD can run concurrently with destroy on the first: > > CPU0 CPU1 > ucma_destroy_id(fda) > ucma_migrate_id(fda -> fdb) > ucma_get_ctx() > xa_lock() > _ucma_find_context() > xa_erase() > xa_unlock() > xa_lock() > ctx->file = new_file > list_move() > xa_unlock() > ucma_put_ctx() > > ucma_close(fdb) > _destroy_id() > kfree(ctx) > > _destroy_id() > wait_for_completion() > // boom, ctx was freed > > The ctx->file must be modified under the handler and xa_lock, and prior to > modification the ID must be rechecked that it is still reachable from > cur_file, ie there is no parallel destroy or migrate. > > To make this work remove the double locking and streamline the control > flow. The double locking was obsoleted by the handler lock now directly > preventing new uevents from being created, and the ctx_list cannot be read > while holding fgets on both files. Removing the double locking also > removes the need to check for the same file. > > Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()") > Reported-and-tested-by: syzbot+cc6fc752b3819e082d0c@xxxxxxxxxxxxxxxxxxxxxxxxx > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > --- > drivers/infiniband/core/ucma.c | 79 +++++++++++++--------------------- > 1 file changed, 29 insertions(+), 50 deletions(-) Applied to for-next Jason