On 10/25/23 6:09 AM, Pavel Begunkov wrote:
> On 10/23/23 16:27, Jens Axboe wrote:
>> On 10/23/23 9:17 AM, Gabriel Krisman Bertazi wrote:
>>> Jens Axboe <axboe@xxxxxxxxx> writes:
>>>
>>>> We could race with SQ thread exit, and if we do, we'll hit a NULL pointer
>>>> dereference. Park the SQPOLL thread while getting the task cpu and pid for
>>>> fdinfo, this ensures we have a stable view of it.
>>>>
>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218032
>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>>
>>>> ---
>>>>
>>>> diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
>>>> index c53678875416..cd2a0c6b97c4 100644
>>>> --- a/io_uring/fdinfo.c
>>>> +++ b/io_uring/fdinfo.c
>>>> @@ -53,7 +53,6 @@ static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id,
>>>>  __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
>>>>  {
>>>>  	struct io_ring_ctx *ctx = f->private_data;
>>>> -	struct io_sq_data *sq = NULL;
>>>>  	struct io_overflow_cqe *ocqe;
>>>>  	struct io_rings *r = ctx->rings;
>>>>  	unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
>>>> @@ -64,6 +63,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
>>>>  	unsigned int cq_shift = 0;
>>>>  	unsigned int sq_shift = 0;
>>>>  	unsigned int sq_entries, cq_entries;
>>>> +	int sq_pid = -1, sq_cpu = -1;
>>>>  	bool has_lock;
>>>>  	unsigned int i;
>>>> @@ -143,13 +143,18 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
>>>>  	has_lock = mutex_trylock(&ctx->uring_lock);
>>>>  	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
>>>> -		sq = ctx->sq_data;
>>>> -		if (!sq->thread)
>>>> -			sq = NULL;
>>>> +		struct io_sq_data *sq = ctx->sq_data;
>>>> +
>>>> +		io_sq_thread_park(sq);
>>>> +		if (sq->thread) {
>>>> +			sq_pid = task_pid_nr(sq->thread);
>>>> +			sq_cpu = task_cpu(sq->thread);
>>>> +		}
>>>> +		io_sq_thread_unpark(sq);
>>>
>>> Jens,
>>>
>>> io_sq_thread_park will try to wake the sqpoll, which is, at
>>> least, unnecessary. But I'm thinking we don't want to expose the
>>> ability to schedule the sqpoll from procfs, which can be done by any
>>> unrelated process.
>>>
>>> To solve the bug, it should be enough to synchronize directly on
>>> sqd->lock, preventing sq->thread from going away inside the if leg.
>>> Granted, it might take longer if the sqpoll is busy, but reading
>>> fdinfo is not supposed to be fast. Alternatively, don't call
>>> wake_process in this case?
>>
>> I did think about that but just went with the exported API. But you are
>> right, it's a bit annoying that it'd also wake the thread, in case it
>
> Waking it up is not a problem but without parking the sq thread won't drop
> the lock until it's time to sleep, which might be pretty long, leaving
> the /proc read stuck on the lock uninterruptibly.
>
> Aside from parking vs lock, there is a lock inversion now:
>
> proc read                  |  SQPOLL
>                            |
> try_lock(ring) // success  |
>                            |  woken up
>                            |  lock(sqd); // success
> lock(sqd); // stuck        |
>                            |  try to submit requests
>                            |      -- lock(ring); // stuck

Yeah good point, forgot we nest these opposite of what you'd expect.
Honestly I think the fix here is just to turn it into a trylock. Yes
that'll miss some cases where we could've gotten the pid/cpu, but it
doesn't seem worth caring about. IOW, fold in this incremental.
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index af1bdcc0703e..f04a43044d91 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -145,12 +145,13 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
 		struct io_sq_data *sq = ctx->sq_data;
 
-		mutex_lock(&sq->lock);
-		if (sq->thread) {
-			sq_pid = task_pid_nr(sq->thread);
-			sq_cpu = task_cpu(sq->thread);
+		if (mutex_trylock(&sq->lock)) {
+			if (sq->thread) {
+				sq_pid = task_pid_nr(sq->thread);
+				sq_cpu = task_cpu(sq->thread);
+			}
+			mutex_unlock(&sq->lock);
 		}
-		mutex_unlock(&sq->lock);
 	}
 
 	seq_printf(m, "SqThread:\t%d\n", sq_pid);

-- 
Jens Axboe