Re: [PATCH 3/5] io_uring: implement our own schedule timeout handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/20/24 22:39, Jens Axboe wrote:
On 8/20/24 3:37 PM, David Wei wrote:
On 2024-08-20 14:34, Jens Axboe wrote:
On 8/20/24 2:08 PM, David Wei wrote:
On 2024-08-19 16:28, Jens Axboe wrote:
In preparation for having two distinct timeouts and avoid waking the
task if we don't need to.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
  io_uring/io_uring.c | 41 ++++++++++++++++++++++++++++++++++++-----
  io_uring/io_uring.h |  2 ++
  2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9e2b8d4c05db..ddfbe04c61ed 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2322,7 +2322,7 @@ static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
  	 * Cannot safely flush overflowed CQEs from here, ensure we wake up
  	 * the task, and the next invocation will do it.
  	 */
-	if (io_should_wake(iowq) || io_has_work(iowq->ctx))
+	if (io_should_wake(iowq) || io_has_work(iowq->ctx) || iowq->hit_timeout)

iowq->hit_timeout may be modified in a timer softirq context, while this
wait_queue_func_t (AIUI) may get called from any context e.g.
net_rx_softirq for sockets. Does this need a READ_ONLY()?

Yes probably not a bad idea to make it READ_ONCE().

  		return autoremove_wake_function(curr, mode, wake_flags, key);
  	return -1;
  }
@@ -2350,6 +2350,38 @@ static bool current_pending_io(void)
  	return percpu_counter_read_positive(&tctx->inflight);
  }
+static enum hrtimer_restart io_cqring_timer_wakeup(struct hrtimer *timer)
+{
+	struct io_wait_queue *iowq = container_of(timer, struct io_wait_queue, t);
+	struct io_ring_ctx *ctx = iowq->ctx;
+
+	WRITE_ONCE(iowq->hit_timeout, 1);
+	if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
+		wake_up_process(ctx->submitter_task);
+	else
+		io_cqring_wake(ctx);

This is a bit different to schedule_hrtimeout_range_clock(). Why is
io_cqring_wake() needed here for non-DEFER_TASKRUN?

That's how the wakeups work - for defer taskrun, the task isn't on a
waitqueue at all. Hence we need to wake the task itself. For any other
setup, they will be on the waitqueue, and we just call io_cqring_wake()
to wake up anyone waiting on the waitqueue. That will iterate the wake
queue and call handlers for each item. Having a separate handler for
that will allow to NOT wake up the task if we don't need to.
taskrun, the waker

To rephase the question, why is the original code calling
schedule_hrtimeout_range_clock() not needing to differentiate behaviour
between defer taskrun and not?

Because that part is the same, the task schedules out and goes to sleep.
That has always been the same regardless of how the ring is setup. Only
difference is that DEFER_TASKRUN doesn't add itself to ctx->wait, and
hence cannot be woken by a wake_up(ctx->wait). We have to wake the task
manually.

Answering the question, for !IORING_SETUP_DEFER_TASKRUN, before it
was waking only the task that started the timeout. Now,
io_cqring_wake(ctx) wakes all waiters, so if there are multiple tasks
waiting, a timeout of one waiter will try to wake all of them.

I believe it's unintentional, but I don't think we care either. You
can save the waiter task struct and wake it specifically.


--
Pavel Begunkov




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux