On 04/02/2021 03:25, Hao Xu wrote: > 在 2021/2/4 上午12:45, Pavel Begunkov 写道: >> On 03/02/2021 16:35, Pavel Begunkov wrote: >>> On 03/02/2021 14:57, Hao Xu wrote: >>>> This is caused by calling io_run_task_work_sig() to do work under >>>> uring_lock while the caller io_sqe_files_unregister() already held >>>> uring_lock. >>>> we need to check if uring_lock is held by us when doing unlock around >>>> io_run_task_work_sig() since there are code paths down to that place >>>> without uring_lock held. >>> >>> 1. we don't want to allow parallel io_sqe_files_unregister()s >>> happening, it's synchronised by uring_lock atm. Otherwise it's >>> buggy. > Here "since there are code paths down to that place without uring_lock held" I mean code path of io_ring_ctx_free(). I guess it's to the 1/2, but let me outline the problem again: if you have two tasks userspace threads sharing a ring, then they can both and in parallel call syscall:files_unregeister. That's a potential double percpu_ref_kill(&data->refs), or even worse. Same for 2, but racing for the table and refs. >> >> This one should be simple, alike to >> >> if (percpu_refs_is_dying()) >> return error; // fail *files_unregister(); >> >>> >>> 2. probably same with unregister and submit. >>> >>>> >>>> Reported-by: Abaci <abaci@xxxxxxxxxxxxxxxxx> >>>> Fixes: 1ffc54220c44 ("io_uring: fix io_sqe_files_unregister() hangs") >>>> Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx> >>>> --- >>>> fs/io_uring.c | 19 +++++++++++++------ >>>> 1 file changed, 13 insertions(+), 6 deletions(-) >>>> >>>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>>> index efb6d02fea6f..b093977713ee 100644 >>>> --- a/fs/io_uring.c >>>> +++ b/fs/io_uring.c >>>> @@ -7362,18 +7362,25 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx, bool locked) >>>> /* wait for all refs nodes to complete */ >>>> flush_delayed_work(&ctx->file_put_work); >>>> + if (locked) >>>> + mutex_unlock(&ctx->uring_lock); >>>> do { >>>> ret = wait_for_completion_interruptible(&data->done); >>>> if (!ret) >>>> break; >>>> ret = io_run_task_work_sig(); >>>> - if (ret < 0) { >>>> - percpu_ref_resurrect(&data->refs); >>>> - reinit_completion(&data->done); >>>> - io_sqe_files_set_node(data, backup_node); >>>> - return ret; >>>> - } >>>> + if (ret < 0) >>>> + break; >>>> } while (1); >>>> + if (locked) >>>> + mutex_lock(&ctx->uring_lock); >>>> + >>>> + if (ret < 0) { >>>> + percpu_ref_resurrect(&data->refs); >>>> + reinit_completion(&data->done); >>>> + io_sqe_files_set_node(data, backup_node); >>>> + return ret; >>>> + } >>>> __io_sqe_files_unregister(ctx); >>>> nr_tables = DIV_ROUND_UP(ctx->nr_user_files, IORING_MAX_FILES_TABLE); >>>> >>> >> > -- Pavel Begunkov