On 9/19/24 12:31 PM, Jan Hendrik Farr wrote: > On 19 12:06:20, Jens Axboe wrote: >> On 9/19/24 10:47 AM, Jan Hendrik Farr wrote: >>>> [...] >>>> >>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >>>> index 75f0087183e5..56097627eafc 100644 >>>> --- a/io_uring/io_uring.c >>>> +++ b/io_uring/io_uring.c >>>> @@ -2472,7 +2472,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx, >>>> return 1; >>>> if (unlikely(!llist_empty(&ctx->work_llist))) >>>> return 1; >>>> - if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL))) >>>> + if (unlikely(task_work_pending(current))) >>>> return 1; >>>> if (unlikely(task_sigpending(current))) >>>> return -EINTR; >>>> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h >>>> index 9d70b2cf7b1e..2fbf0ea9c171 100644 >>>> --- a/io_uring/io_uring.h >>>> +++ b/io_uring/io_uring.h >>>> @@ -308,15 +308,17 @@ static inline int io_run_task_work(void) >>>> */ >>>> if (test_thread_flag(TIF_NOTIFY_SIGNAL)) >>>> clear_notify_signal(); >>>> + >>>> + if (test_thread_flag(TIF_NOTIFY_RESUME)) { >>>> + __set_current_state(TASK_RUNNING); >>>> + resume_user_mode_work(NULL); >>>> + } >>>> + >>>> /* >>>> * PF_IO_WORKER never returns to userspace, so check here if we have >>>> * notify work that needs processing. >>>> */ >>>> if (current->flags & PF_IO_WORKER) { >>>> - if (test_thread_flag(TIF_NOTIFY_RESUME)) { >>>> - __set_current_state(TASK_RUNNING); >>>> - resume_user_mode_work(NULL); >>>> - } >>>> if (current->io_uring) { >>>> unsigned int count = 0; >>>> >>>> >>> >>> Can confirm that also this patch fixes the issue on my end (both with the >>> reordering of the task_work and without it). >> >> Great, thanks for testing! Sent out a v2. No need to test it unless you >> absolutely want to ;-) >> >>> Also found a different way to trigger the issue that does not misuse >>> IOSQE_IO_LINK. Do three sends with IOSQE_CQE_SKIP_SUCCESS | IOSQE_IO_LINK >>> followed by a close with IOSQE_CQE_SKIP_SUCCESS on a ring with >>> IORING_SETUP_DEFER_TASKRUN. >>> >>> I confirmed that that test case also first brakes on >>> 846072f16eed3b3fb4e59b677f3ed8afb8509b89 and is fixed by either of the >>> two patches you sent. >>> >>> Not sure if that's a preferable test case compared to the weirder ealier one. >>> You can find it below as a patch to the existing test case in the liburing >>> repo: >> >> I think that's an improvement, just because it doesn't rely on a weird >> usage of IOSQE_IO_LINK. And it looks good to me - do you want me to >> commit this directly, or do you want to send a "proper" patch (or github >> PR) to retain the proper attribution to you? >> > > Sent the PR with one minor change (adjusted the user data for the third > send). And pulled, thanks. -- Jens Axboe