Hi, On July 3, 2020 5:48:21 PM PDT, Jens Axboe <axboe@xxxxxxxxx> wrote: >On 7/3/20 6:15 PM, Andres Freund wrote: >> Hi, >> >> On 2020-07-03 17:00:49 -0700, Andres Freund wrote: >>> I haven't yet fully analyzed the problem, but after updating to >>> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres >does >>> not work reliably anymore. >>> >>> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't >>> interrupted by signals anymore. The signal handler executes, but >>> afterwards the syscall is restarted. Previously io_uring_enter >reliably >>> returned EINTR in that case. >>> >>> Currently postgres relies on signals interrupting io_uring_enter(). >We >>> probably can find a way to not do so, but it'd not be entirely >trivial. >>> >>> I suspect the issue is >>> >>> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag: >io_uring-5.8-2020-07-01, linux-block/io_uring-5.8) >>> Author: Jens Axboe <axboe@xxxxxxxxx> >>> Date: 2020-06-30 12:39:05 -0600 >>> >>> io_uring: use signal based task_work running >>> >>> as that appears to have changed the error returned by >>> io_uring_enter(GETEVENTS) after having been interrupted by a signal >from >>> EINTR to ERESTARTSYS. >>> >>> >>> I'll check to make sure that the issue doesn't exist before the >above >>> commit. >> >> Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue >doesn't >> exist, which pretty much confirms that the above commit is the issue. >> >> What was the reason for changing EINTR to ERESTARTSYS in the above >> commit? I assume trying to avoid returning spurious EINTRs to >userland? > >Yeah, for when it's running task_work. I wonder if something like the >below will do the trick? > > >diff --git a/fs/io_uring.c b/fs/io_uring.c >index 700644a016a7..0efa73d78451 100644 >--- a/fs/io_uring.c >+++ b/fs/io_uring.c >@@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx >*ctx, int min_events, > do { > prepare_to_wait_exclusive(&ctx->wait, &iowq.wq, > TASK_INTERRUPTIBLE); >- /* make sure we run task_work before checking for signals */ >- if (current->task_works) >- task_work_run(); > if (signal_pending(current)) { >- ret = -ERESTARTSYS; >+ if (current->task_works) >+ ret = -ERESTARTSYS; >+ else >+ ret = -EINTR; > break; > } > if (io_should_wake(&iowq, false)) >@@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx >*ctx, int min_events, > } while (1); > finish_wait(&ctx->wait, &iowq.wq); > >- restore_saved_sigmask_unless(ret == -ERESTARTSYS); >+ restore_saved_sigmask_unless(ret == -EINTR); > > return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : >0; > } I'll try in a bit. Suspect however that there'd be trouble if there were both an actual signal and task work pending? Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.