Re: [PATCH] io_uring: IORING_OP_TIMEOUT support

Hi,

On 2019-09-20 14:18:07 -0600, Jens Axboe wrote:
> On 9/20/19 10:53 AM, Andres Freund wrote:
> > Hi,
> > 
> > On 2019-09-17 10:03:58 -0600, Jens Axboe wrote:
> >> There's been a few requests for functionality similar to io_getevents()
> >> and epoll_wait(), where the user can specify a timeout for waiting on
> >> events. I deliberately did not add support for this through the system
> >> call initially to avoid overloading the args, but I can see that the use
> >> cases for this are valid.
> > 
> >> This adds support for IORING_OP_TIMEOUT. If a user wants to get woken
> >> when waiting for events, simply submit one of these timeout commands
> >> with your wait call. This ensures that the application sleeping on the
> >> CQ ring waiting for events will get woken. The timeout command is passed
> >> in a pointer to a struct timespec. Timeouts are relative.
> > 
> > Hm. This interface wouldn't allow one to reliably use a timeout waiting for
> > io_uring_enter(..., min_complete > 1, IORING_ENTER_GETEVENTS, ...)
> > right?
> 
> I've got a (unpublished as of yet) version that allows you to wait for N
> events, and canceling the timer it met. So that does allow you to reliably
> wait for N events.

Cool.

I'm not quite sure how to parse "canceling the timer it met".

I assume you mean that one could ask for min_complete, and
IORING_OP_TIMEOUT would interrupt that wait, even if fewer than
min_complete events have been collected?  It'd probably be good to
return 0 instead of EINTR if at least one event is ready; apart from
that, it does seem to make sense.


> > I can easily imagine usecases where I'd want to submit a bunch of ios
> > and wait for all of their completion to minimize unnecessary context
> > switches, as all IOs are required to continue. But with a relatively
> > small timeout, to allow switching to do other work etc.
> 
> The question is if it's worth it to add support for "wait for these N
> exact events", or whether "wait for N events" is enough. The application
> needs to read those completions anyway, and could then decide to loop
> if it's still missing some events. Downside is that it may mean more
> calls to wait, but since the io_uring is rarely shared, it might be
> just fine.

I think "wait for N events" is sufficient. I'm not even sure how one
could safely use "wait for these N exact events", or what precisely it
would mean.  All the use cases for min_complete that I can think of
basically just want to avoid unnecessary userspace transitions when not
enough work has completed for the task to make progress - but if there
are plenty of results in the queue other than the just-submitted ones,
that's also ok.


> , but since the io_uring is rarely shared, it might be just fine.

FWIW, I think we might want to share it between (forked) processes in
postgres, but I'm not sure yet (as in, my current rough prototype
doesn't yet do so). Without that it's a lot harder to really benefit
from ordering queued operations, and sharing also allows ordering
flushes to later parts of the journal, making it more likely that
connections COMMITing earlier also finish earlier.

Another, fairly crucial, reason is that being able to finish IO requests
started by other backends would make it far easier to avoid deadlock
risk between postgres connections / background processes. Otherwise it's
fairly easy to encounter situations where backend A issues a few
prefetch requests and then blocks on some lock held by process B, while
B needs one of the prefetched buffers from A to finish IO. There are
more complex workarounds for this, but ...

Greetings,

Andres Freund
