On 11/6/24 19:34, Bernd Schubert wrote:
On 11/6/24 05:44, Ming Lei wrote:
On Wed, Nov 6, 2024 at 7:02 AM Bernd Schubert <bschubert@xxxxxxx> wrote:
On 11/5/24 02:08, Pavel Begunkov wrote:
FWIW, the original version is how it's handled in several places
across io_uring, and the difference is a gap for !DEFER_TASKRUN
when a task_work is queued somewhere in between when a task has
started going through exit() but hasn't got PF_EXITING set yet.
IOW, should be harder to hit.
Does that mean that the test for PF_EXITING is racy and we cannot
entirely rely on it?
Another solution is to mark the uring_cmd as cancelable via
io_uring_cmd_mark_cancelable(), which provides a chance to cancel the
cmd in the current context.
In short, F_CANCEL is not going to help, unfortunately.
The F_CANCEL path can be, and is likely to be, triggered from a kthread
instead of the original task. See the call sites of
io_uring_try_cancel_requests(): the task termination/exit path, i.e.
io_uring_cancel_generic(), will in most cases skip the call because of
the tctx_inflight() check.
Also, io_uring doesn't try to cancel queued task_work (the callback
is supposed to check if it needs to fail the request), so if someone
queued up a task_work including via __io_uring_cmd_do_in_task() and
friends, even F_CANCEL won't be able to cancel it.
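
To make the point above concrete, here is a minimal user-space sketch,
not kernel code. Every model_* name is hypothetical and exists only for
illustration; "exiting" merely stands in for a PF_EXITING-style check.
It assumes a cancel pass that skips anything already queued as
task_work, so the callback has to decide on its own whether to fail
the request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_task {
	bool exiting;			/* stands in for a PF_EXITING-style flag */
};

struct model_req {
	struct model_task *task;
	bool queued_as_task_work;	/* already handed over as task_work */
	bool cancelled;			/* set by the cancel pass */
	int result;			/* 0 = success, negative errno otherwise */
};

/* Cancel pass: only reaches requests that are not queued as task_work. */
static int model_try_cancel(struct model_req *reqs, size_t n)
{
	int cancelled = 0;

	for (size_t i = 0; i < n; i++) {
		if (reqs[i].queued_as_task_work)
			continue;	/* queued task_work is left alone */
		reqs[i].cancelled = true;
		reqs[i].result = -ECANCELED;
		cancelled++;
	}
	return cancelled;
}

/* task_work callback: checks for itself whether to fail the request. */
static void model_task_work_cb(struct model_req *req)
{
	assert(req != NULL && req->task != NULL);

	if (req->task->exiting) {
		req->result = -ECANCELED;	/* fail instead of doing the work */
		return;
	}
	req->result = 0;			/* normal completion */
}
```

In this model, a request that was queued before the cancel pass ran is
only failed once its callback runs and sees the exiting state.
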
Yeah, I have that, see
[PATCH RFC v4 14/15] fuse: {io-uring} Prevent mount point hang on fuse-server termination
As I just wrote to Pavel, getting IO_URING_F_TASK_DEAD is rather hard
in my current branch. IO_URING_F_CANCEL didn't make a difference:
I had specifically tried disabling it, and still neither
IO_URING_F_TASK_DEAD nor the crash was easily triggered. So I
re-enabled IO_URING_F_CANCEL and then eventually
got IO_URING_F_TASK_DEAD, i.e. without checking the underlying code,
it looks like we need both as safety measures.
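
A rough sketch of what "handling both" could look like in a command
handler, written as plain user-space C. The MODEL_F_* bits and the
model_* names are made up for this example and do not match the
kernel's definitions; the only assumption is that the two conditions
arrive as separate flag bits and each needs its own branch.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch; not the kernel's values. */
enum {
	MODEL_F_CANCEL    = 1u << 0,	/* cancel path invoked the handler */
	MODEL_F_TASK_DEAD = 1u << 1,	/* submitting task is gone */
};

enum model_action {
	MODEL_DO_IO,		/* normal path */
	MODEL_RELEASE_ENTRY,	/* cancel: drop the entry, no I/O */
	MODEL_FAIL_REQUEST,	/* task dead: fail, touch no task state */
};

/* Decide what the handler should do for a given set of issue flags. */
static enum model_action model_cmd_action(unsigned int issue_flags)
{
	/* Checked first: with the task gone, nothing else is safe. */
	if (issue_flags & MODEL_F_TASK_DEAD)
		return MODEL_FAIL_REQUEST;
	if (issue_flags & MODEL_F_CANCEL)
		return MODEL_RELEASE_ENTRY;

	assert((issue_flags & (MODEL_F_CANCEL | MODEL_F_TASK_DEAD)) == 0);
	return MODEL_DO_IO;
}
```

Giving the task-dead bit priority is a choice made for this sketch
only, not something taken from the patches.
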
--
Pavel Begunkov