On 11/14/23 3:39 PM, Gabriel Krisman Bertazi wrote:
> Jens Axboe <axboe@xxxxxxxxx> writes:
>
>> A previous commit added a trylock for getting the SQPOLL thread info via
>> fdinfo, but this introduced a regression where we often fail to get it if
>> the thread is busy. For that case, we end up not printing the current CPU
>> and PID info.
>>
>> Rather than rely on this lock, just print the pid we already stored in
>> the io_sq_data struct, and ensure we update the current CPU every time we
>> are going to sleep. The latter won't potentially be 100% accurate, but
>> that wasn't the case before either as the task can get migrated at any
>> time unless it has been pinned at creation time.
>>
>> We retain keeping the io_sq_data dereference inside the ctx->uring_lock,
>> as it has always been, as destruction of the thread and data happen below
>> that. We could make this RCU safe, but there's little point in doing that.
>>
>> With this, we always print the last valid information we had, rather than
>> have spurious outputs with missing information.
>>
>> Fixes: 7644b1a1c9a7 ("io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid")
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>
>> ---
>>
>> v2: actually remember to use the cached values... also update ->sq_cpu
>> when we initially set it up, if it's not pinned to a given CPU.
>>
>> diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
>> index f04a43044d91..976e9500f651 100644
>> --- a/io_uring/fdinfo.c
>> +++ b/io_uring/fdinfo.c
>> @@ -145,13 +145,8 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
>>  	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
>>  		struct io_sq_data *sq = ctx->sq_data;
>>  
>> -		if (mutex_trylock(&sq->lock)) {
>> -			if (sq->thread) {
>> -				sq_pid = task_pid_nr(sq->thread);
>> -				sq_cpu = task_cpu(sq->thread);
>> -			}
>> -			mutex_unlock(&sq->lock);
>> -		}
>> +		sq_pid = sq->task_pid;
>> +		sq_cpu = sq->sq_cpu;
>>  	}
>>  
>>  	seq_printf(m, "SqThread:\t%d\n", sq_pid);
>> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
>> index bd6c2c7959a5..ecb00322a4e5 100644
>> --- a/io_uring/sqpoll.c
>> +++ b/io_uring/sqpoll.c
>> @@ -229,10 +229,12 @@ static int io_sq_thread(void *data)
>>  	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
>>  	set_task_comm(current, buf);
>>  
>> -	if (sqd->sq_cpu != -1)
>> +	if (sqd->sq_cpu != -1) {
>>  		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
>> -	else
>> +	} else {
>>  		set_cpus_allowed_ptr(current, cpu_online_mask);
>> +		sqd->sq_cpu = task_cpu(current);
>> +	}
>>  
>>  	mutex_lock(&sqd->lock);
>>  	while (1) {
>> @@ -291,6 +293,7 @@ static int io_sq_thread(void *data)
>>  		}
>>  
>>  		if (needs_sched) {
>> +			sqd->sq_cpu = task_cpu(current);
>
> Don't you also need to update sqd->sq_cpu in io_sqd_handle_event before
> releasing the lock? sqpoll might get migrated after the following
> schedule and then parked, in which case sqd->sq_cpu will not be up to
> date for a while.
>
> Other than that, I think the patch is fine.

We probably should, and I also forgot that we can just use
raw_smp_processor_id() at this point. I'll send out a v3.

-- 
Jens Axboe
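
[Editor's note: the v3 is not shown in this thread, but based on the
exchange above it would presumably refresh the cached CPU in
io_sqd_handle_event() after re-taking sqd->lock, using
raw_smp_processor_id() since the sqpoll task is running on the CPU being
recorded. A hypothetical sketch of that hunk, not the actual patch:]

```diff
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ static bool io_sqd_handle_event(struct io_sq_data *sqd)
 		if (signal_pending(current))
 			did_sig = get_signal(&ksig);
 		cond_resched();
 		mutex_lock(&sqd->lock);
+		sqd->sq_cpu = raw_smp_processor_id();
 	}
```

[The task_cpu(current) assignments in io_sq_thread() above could likewise
become raw_smp_processor_id(), per the reply.]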