On 10/25/23 14:44, Jens Axboe wrote:
On 10/25/23 6:09 AM, Pavel Begunkov wrote:
On 10/23/23 16:27, Jens Axboe wrote:
On 10/23/23 9:17 AM, Gabriel Krisman Bertazi wrote:
Jens Axboe <axboe@xxxxxxxxx> writes:
We could race with SQ thread exit, and if we do, we'll hit a NULL pointer
dereference. Park the SQPOLL thread while getting the task cpu and pid for
fdinfo, this ensures we have a stable view of it.
Cc: stable@xxxxxxxxxxxxxxx
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218032
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index c53678875416..cd2a0c6b97c4 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -53,7 +53,6 @@ static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id,
__cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
{
struct io_ring_ctx *ctx = f->private_data;
- struct io_sq_data *sq = NULL;
struct io_overflow_cqe *ocqe;
struct io_rings *r = ctx->rings;
unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
@@ -64,6 +63,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
unsigned int cq_shift = 0;
unsigned int sq_shift = 0;
unsigned int sq_entries, cq_entries;
+ int sq_pid = -1, sq_cpu = -1;
bool has_lock;
unsigned int i;
@@ -143,13 +143,18 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
has_lock = mutex_trylock(&ctx->uring_lock);
if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
- sq = ctx->sq_data;
- if (!sq->thread)
- sq = NULL;
+ struct io_sq_data *sq = ctx->sq_data;
+
+ io_sq_thread_park(sq);
+ if (sq->thread) {
+ sq_pid = task_pid_nr(sq->thread);
+ sq_cpu = task_cpu(sq->thread);
+ }
+ io_sq_thread_unpark(sq);
Jens,
io_sq_thread_park will try to wake the sqpoll, which is, at least,
unnecessary. But I'm thinking we don't want to expose the ability to
schedule the sqpoll from procfs, which can be done by any unrelated
process.
To solve the bug, it should be enough to synchronize directly on
sqd->lock, preventing sq->thread from going away inside the if leg.
Granted, it is might take longer if the sqpoll is busy, but reading
fdinfo is not supposed to be fast. Alternatively, don't call
wake_process in this case?
I did think about that but just went with the exported API. But you are
right, it's a bit annoying that it'd also wake the thread, in case it
Waking it up is not a problem but without parking sq thread won't drop
the lock until it's time to sleep, which might be pretty long leaving
the /proc read stuck on the lock uninterruptibly.
Aside from parking vs lock, there is a lock inversion now:
proc read | SQPOLL
|
try_lock(ring) // success |
| woken up
| lock(sqd); // success
lock(sqd); // stuck |
| try to submit requests
| -- lock(ring); // stuck
Yeah good point, forgot we nest these opposite of what you'd expect.
Honestly I think the fix here is just to turn it into a trylock. Yes
that'll miss some cases where we could've gotten the pid/cpu, but
doesn't seem worth caring about.
IOW, fold in this incremental.
Should work, otherwise you probably can just park first.
Long term it'd be nice to make sqpoll to not hold sqd->lock during
submission + polling as it currently does. Park callers sleep on the
lock, but if we replace it with some struct completion the rest
should be easy enough.
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index af1bdcc0703e..f04a43044d91 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -145,12 +145,13 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
struct io_sq_data *sq = ctx->sq_data;
- mutex_lock(&sq->lock);
- if (sq->thread) {
- sq_pid = task_pid_nr(sq->thread);
- sq_cpu = task_cpu(sq->thread);
+ if (mutex_trylock(&sq->lock)) {
+ if (sq->thread) {
+ sq_pid = task_pid_nr(sq->thread);
+ sq_cpu = task_cpu(sq->thread);
+ }
+ mutex_unlock(&sq->lock);
}
- mutex_unlock(&sq->lock);
}
seq_printf(m, "SqThread:\t%d\n", sq_pid);
--
Pavel Begunkov