On Fri, Sep 11, 2020 at 11:20:17PM +0800, Hillf Danton wrote:
> 
> On Fri, 11 Sep 2020 08:57:50 -0300 Jason Gunthorpe wrote:
> > On Fri, Sep 11, 2020 at 12:16:40PM +0800, Hillf Danton wrote:
> > > Detect race destroying ctx in order to avoid UAF.
> > >
> > > +++ b/drivers/infiniband/core/ucma.c
> > > @@ -625,6 +625,10 @@ static ssize_t ucma_destroy_id(struct uc
> > >  		return PTR_ERR(ctx);
> > >
> > >  	mutex_lock(&ctx->file->mut);
> > > +	if (ctx->destroying == 1) {
> > > +		mutex_unlock(&ctx->file->mut);
> > > +		return -ENXIO;
> > > +	}
> > >  	ctx->destroying = 1;
> > >  	mutex_unlock(&ctx->file->mut);
> > >
> > > @@ -1826,6 +1830,8 @@ static int ucma_close(struct inod
> > >
> > >  	mutex_lock(&file->mut);
> > >  	list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
> > > +		if (ctx->destroying == 1)
> > > +			continue;
> > >  		ctx->destroying = 1;
> > >  		mutex_unlock(&file->mut);
> >
> > ucma_destroy_id() is called from write() and ucma_close() is release(),
> > so there is no way these can race?
> 
> Sounds good, but what's reported is a UAF in the close path, which is
> impossible without another thread releasing the ctx a step ahead of
> the closer.
> Can we call it a race if that's true?

Migrate is the cause, very tricky:

        CPU0                               CPU1
   ucma_destroy_id()                  ucma_migrate_id()
      ucma_get_ctx()
                                         xa_lock()
                                          _ucma_find_context()
                                          xa_erase()
      xa_lock()                           ctx->file = new_file
                                          list_move()
                                         xa_unlock()
      ucma_put_ctx()
                                      ucma_close()
      _destroy_id()                      _destroy_id()
                                            wait_for_completion()  // boom

i.e. the destroy_id() on the initial FD captures the ctx right before
migrate moves it, then the new FD closes, calling destroy while the
other destroy is still running.

Sigh, I will rewrite migrate too..

Jason