On 10/25/23 8:09 AM, Pavel Begunkov wrote: > On 10/25/23 14:44, Jens Axboe wrote: >> On 10/25/23 6:09 AM, Pavel Begunkov wrote: >>> On 10/23/23 16:27, Jens Axboe wrote: >>>> On 10/23/23 9:17 AM, Gabriel Krisman Bertazi wrote: >>>>> Jens Axboe <axboe@xxxxxxxxx> writes: >>>>> >>>>>> We could race with SQ thread exit, and if we do, we'll hit a NULL pointer >>>>>> dereference. Park the SQPOLL thread while getting the task cpu and pid for >>>>>> fdinfo, this ensures we have a stable view of it. >>>>>> >>>>>> Cc: stable@xxxxxxxxxxxxxxx >>>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218032 >>>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> >>>>>> >>>>>> --- >>>>>> >>>>>> diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c >>>>>> index c53678875416..cd2a0c6b97c4 100644 >>>>>> --- a/io_uring/fdinfo.c >>>>>> +++ b/io_uring/fdinfo.c >>>>>> @@ -53,7 +53,6 @@ static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id, >>>>>> __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f) >>>>>> { >>>>>> struct io_ring_ctx *ctx = f->private_data; >>>>>> - struct io_sq_data *sq = NULL; >>>>>> struct io_overflow_cqe *ocqe; >>>>>> struct io_rings *r = ctx->rings; >>>>>> unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1; >>>>>> @@ -64,6 +63,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f) >>>>>> unsigned int cq_shift = 0; >>>>>> unsigned int sq_shift = 0; >>>>>> unsigned int sq_entries, cq_entries; >>>>>> + int sq_pid = -1, sq_cpu = -1; >>>>>> bool has_lock; >>>>>> unsigned int i; >>>>>> @@ -143,13 +143,18 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f) >>>>>> has_lock = mutex_trylock(&ctx->uring_lock); >>>>>> if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) { >>>>>> - sq = ctx->sq_data; >>>>>> - if (!sq->thread) >>>>>> - sq = NULL; >>>>>> + struct io_sq_data *sq = ctx->sq_data; >>>>>> + >>>>>> + io_sq_thread_park(sq); >>>>>> + if (sq->thread) { >>>>>> + sq_pid = task_pid_nr(sq->thread); >>>>>> + sq_cpu = task_cpu(sq->thread); >>>>>> + } >>>>>> + io_sq_thread_unpark(sq); >>>>> >>>>> Jens, >>>>> >>>>> io_sq_thread_park will try to wake the sqpoll, which is, at least, >>>>> unnecessary. But I'm thinking we don't want to expose the ability to >>>>> schedule the sqpoll from procfs, which can be done by any unrelated >>>>> process. >>>>> >>>>> To solve the bug, it should be enough to synchronize directly on >>>>> sqd->lock, preventing sq->thread from going away inside the if leg. >>>>> Granted, it is might take longer if the sqpoll is busy, but reading >>>>> fdinfo is not supposed to be fast. Alternatively, don't call >>>>> wake_process in this case? >>>> >>>> I did think about that but just went with the exported API. But you are >>>> right, it's a bit annoying that it'd also wake the thread, in case it >>> >>> Waking it up is not a problem but without parking sq thread won't drop >>> the lock until it's time to sleep, which might be pretty long leaving >>> the /proc read stuck on the lock uninterruptibly. >>> >>> Aside from parking vs lock, there is a lock inversion now: >>> >>> proc read | SQPOLL >>> | >>> try_lock(ring) // success | >>> | woken up >>> | lock(sqd); // success >>> lock(sqd); // stuck | >>> | try to submit requests >>> | -- lock(ring); // stuck >> >> Yeah good point, forgot we nest these opposite of what you'd expect. >> Honestly I think the fix here is just to turn it into a trylock. Yes >> that'll miss some cases where we could've gotten the pid/cpu, but >> doesn't seem worth caring about. >> >> IOW, fold in this incremental. > > Should work, otherwise you probably can just park first. In general I think it's better to have the observational side be less of an impact, which is why I liked not doing the parking. Sometimes apps do stupid things and monitor fdinfo/ etc continually. As it's possible to get the task/cpu anyway via other means, should be better to just skip if we don't get the lock. > Long term it'd be nice to make sqpoll to not hold sqd->lock during > submission + polling as it currently does. Park callers sleep on the > lock, but if we replace it with some struct completion the rest > should be easy enough. Yeah agree. -- Jens Axboe