On 03/02/2021 14:57, Hao Xu wrote: > This is caused by calling io_run_task_work_sig() to do work under > uring_lock while the caller io_sqe_files_unregister() already held > uring_lock. > we need to check if uring_lock is held by us when doing unlock around > io_run_task_work_sig() since there are code paths down to that place > without uring_lock held. 1. we don't want to allow parallel io_sqe_files_unregister()s happening, it's synchronised by uring_lock atm. Otherwise it's buggy. 2. probably same with unregister and submit. > > Reported-by: Abaci <abaci@xxxxxxxxxxxxxxxxx> > Fixes: 1ffc54220c44 ("io_uring: fix io_sqe_files_unregister() hangs") > Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx> > --- > fs/io_uring.c | 19 +++++++++++++------ > 1 file changed, 13 insertions(+), 6 deletions(-) > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index efb6d02fea6f..b093977713ee 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -7362,18 +7362,25 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx, bool locked) > > /* wait for all refs nodes to complete */ > flush_delayed_work(&ctx->file_put_work); > + if (locked) > + mutex_unlock(&ctx->uring_lock); > do { > ret = wait_for_completion_interruptible(&data->done); > if (!ret) > break; > ret = io_run_task_work_sig(); > - if (ret < 0) { > - percpu_ref_resurrect(&data->refs); > - reinit_completion(&data->done); > - io_sqe_files_set_node(data, backup_node); > - return ret; > - } > + if (ret < 0) > + break; > } while (1); > + if (locked) > + mutex_lock(&ctx->uring_lock); > + > + if (ret < 0) { > + percpu_ref_resurrect(&data->refs); > + reinit_completion(&data->done); > + io_sqe_files_set_node(data, backup_node); > + return ret; > + } > > __io_sqe_files_unregister(ctx); > nr_tables = DIV_ROUND_UP(ctx->nr_user_files, IORING_MAX_FILES_TABLE); > -- Pavel Begunkov