Re: [PATCH v2] io-wq: backoff when retrying worker creation

Uday Shankar <ushankar@xxxxxxxxxxxxxxx> · Tue, 18 Feb 2025 12:39:11 -0700

On Fri, Feb 14, 2025 at 03:31:54PM -0700, Jens Axboe wrote:
> I'll get it queued up. I do think for a better fix, we could rely on
> task_work on the actual task in question. Because that will be run once
> it exits to userspace, which will deliver any pending signals as well.
> That should be a better gating mechanism for the retry. But that will
> most likely become more involved, so I think doing something like this
> first is fine.

How would that work? task_work_run is called from various places,
including get_signal where we're fairly likely to have a signal pending.
I don't think there is a way to get a task_work item to run only when
we're guaranteed that no signal is pending. There is the "resume user
mode work" path, but that looks like it only governs how the task is
notified - the work item itself is not marked in any way and may be
executed sooner, e.g. if the task gets a signal.
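
To make the ordering concrete, here is a small userspace model of it.
This is only a sketch, not kernel code: every model_* name is made up
for illustration, and it assumes the exit-to-user path handles a
pending signal (and so calls get_signal) before the resume-user-mode
work runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are kernel symbols. */
struct model_task {
	int signal_pending;                 /* stands in for TIF_SIGPENDING */
	void (*work)(struct model_task *t); /* one pending work item, or NULL */
	int ran_with_signal_pending;        /* recorded by the callback */
};

/* Run the pending work item, if any, exactly once. */
static void model_task_work_run(struct model_task *t)
{
	void (*fn)(struct model_task *) = t->work;

	t->work = NULL;
	if (fn)
		fn(t);
}

/* Models get_signal(): pending task_work runs before the signal is taken. */
static void model_get_signal(struct model_task *t)
{
	model_task_work_run(t);
	t->signal_pending = 0;
}

/* Models the return to userspace: signal handling first, then resume work. */
static void model_exit_to_user(struct model_task *t)
{
	if (t->signal_pending)
		model_get_signal(t);
	model_task_work_run(t);
}

/* The "retry" callback just records whether a signal was still pending. */
static void retry_cb(struct model_task *t)
{
	t->ran_with_signal_pending = t->signal_pending;
}

int work_sees_pending_signal(int signal_pending)
{
	struct model_task t = { signal_pending, retry_cb, -1 };

	model_exit_to_user(&t);
	return t.ran_with_signal_pending;
}
```

In this model, work_sees_pending_signal(1) returns 1: the callback runs
from the get_signal path while the signal is still pending, and nothing
about the work item lets it opt out of that call site.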

This also doesn't work for retries past the first - in that case, when
create_io_thread fails, we're already in task_work context, and
immediately queueing a task_work for the retry there won't work, as the
very same invocation of task_work_run that we're currently in will pick
up the new work as well. I assume that is the whole reason the retry is
bounced through a kworker in the first place, only to come back to the
original task via task_work.
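
A rough userspace model of that loop shows the problem. This is a
sketch under assumptions, not the kernel implementation: the model_*
names and the max_failures knob are hypothetical, and the run loop is a
simplified, single-threaded take on "detach the list, run each
callback, repeat until the list is empty".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are kernel symbols. */
struct model_work {
	struct model_work *next;
	void (*func)(struct model_work *work);
};

struct model_task {
	struct model_work *works; /* pending list, like task->task_works */
	int create_attempts;      /* times the create callback has run */
	int max_failures;         /* attempts that "fail" before one succeeds */
};

static struct model_task *current_task;

static void model_work_add(struct model_task *t, struct model_work *w)
{
	w->next = t->works;
	t->works = w;
}

/* Detach the pending list, run every callback, repeat until empty. */
static void model_work_run(struct model_task *t)
{
	while (t->works) {
		struct model_work *w = t->works;

		t->works = NULL;
		while (w) {
			struct model_work *next = w->next;

			w->func(w);
			w = next;
		}
	}
}

/* Pretend worker creation fails, then naively re-queue from the callback. */
static void create_worker_cb(struct model_work *w)
{
	struct model_task *t = current_task;

	t->create_attempts++;
	if (t->create_attempts <= t->max_failures)
		model_work_add(t, w); /* retry queued from task_work context */
}

int attempts_in_one_run(int max_failures)
{
	struct model_task t = { NULL, 0, max_failures };
	struct model_work w = { NULL, create_worker_cb };

	current_task = &t;
	model_work_add(&t, &w);
	model_work_run(&t); /* a single invocation */
	return t.create_attempts;
}
```

attempts_in_one_run(3) returns 4: all three failed attempts and the
final successful one happen inside the same model_work_run call, so
nothing else (such as signal delivery) gets a chance to run in between.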

I also thought it might be worth studying what fork() and friends do,
since they have to deal with a similar problem. These syscalls seem to
do their retry by editing the syscalling task's registers before
returning to userspace in such a way that the syscall instruction is
executed again. If there's a signal that needs to be delivered, the
signal handler in userspace is called before the retry executes. This
solution seems very specific to a syscall and I don't think we can take
inspiration from it given that we are calling copy_process from
task_work...
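
For reference, the register edit can be modelled roughly like this.
It is a userspace sketch of what I understand the x86-64 signal code to
do for -ERESTARTNOINTR, not the kernel source: the model_* names are
made up, and the syscall number in the usage note is only an
illustrative value.

```c
#include <assert.h>

/* Value of ERESTARTNOINTR in the kernel's include/linux/errno.h. */
#define MODEL_ERESTARTNOINTR 513

/* Hypothetical, cut-down stand-in for the saved user registers. */
struct model_regs {
	unsigned long ip; /* user instruction pointer, just past `syscall` */
	long ax;          /* syscall return value */
	long orig_ax;     /* syscall number saved at entry */
};

/*
 * Model of the restart step on the way back to userspace: if the
 * syscall returned -ERESTARTNOINTR, restore the syscall number and
 * rewind ip so the 2-byte `syscall` instruction executes again.
 * Any signal handler frame would be set up after this edit, so the
 * handler runs first and the retried syscall runs when it returns.
 */
struct model_regs model_exit_to_user(unsigned long ip, long ax, long orig_ax)
{
	struct model_regs regs = { ip, ax, orig_ax };

	if (regs.ax == -MODEL_ERESTARTNOINTR) {
		regs.ax = regs.orig_ax;
		regs.ip -= 2;
	}
	return regs;
}
```

With model_exit_to_user(0x1002, -MODEL_ERESTARTNOINTR, 56) the result
has ip == 0x1000 and ax == 56, i.e. the task is set up to issue the
same syscall again. The whole trick depends on there being a userspace
instruction to rewind to, which is exactly what a task_work caller of
copy_process doesn't have.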