On 7/3/20 6:15 PM, Andres Freund wrote: > Hi, > > On 2020-07-03 17:00:49 -0700, Andres Freund wrote: >> I haven't yet fully analyzed the problem, but after updating to >> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres does >> not work reliably anymore. >> >> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't >> interrupted by signals anymore. The signal handler executes, but >> afterwards the syscall is restarted. Previously io_uring_enter reliably >> returned EINTR in that case. >> >> Currently postgres relies on signals interrupting io_uring_enter(). We >> probably can find a way to not do so, but it'd not be entirely trivial. >> >> I suspect the issue is >> >> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag: io_uring-5.8-2020-07-01, linux-block/io_uring-5.8) >> Author: Jens Axboe <axboe@xxxxxxxxx> >> Date: 2020-06-30 12:39:05 -0600 >> >> io_uring: use signal based task_work running >> >> as that appears to have changed the error returned by >> io_uring_enter(GETEVENTS) after having been interrupted by a signal from >> EINTR to ERESTARTSYS. >> >> >> I'll check to make sure that the issue doesn't exist before the above >> commit. > > Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue doesn't > exist, which pretty much confirms that the above commit is the issue. > > What was the reason for changing EINTR to ERESTARTSYS in the above > commit? I assume trying to avoid returning spurious EINTRs to userland? Yeah, for when it's running task_work. I wonder if something like the below will do the trick? diff --git a/fs/io_uring.c b/fs/io_uring.c index 700644a016a7..0efa73d78451 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, do { prepare_to_wait_exclusive(&ctx->wait, &iowq.wq, TASK_INTERRUPTIBLE); - /* make sure we run task_work before checking for signals */ - if (current->task_works) - task_work_run(); if (signal_pending(current)) { - ret = -ERESTARTSYS; + if (current->task_works) + ret = -ERESTARTSYS; + else + ret = -EINTR; break; } if (io_should_wake(&iowq, false)) @@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, } while (1); finish_wait(&ctx->wait, &iowq.wq); - restore_saved_sigmask_unless(ret == -ERESTARTSYS); + restore_saved_sigmask_unless(ret == -EINTR); return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0; } -- Jens Axboe