On 9/20/19 2:56 PM, Andres Freund wrote:
> Hi,
>
> On 2019-09-20 14:18:07 -0600, Jens Axboe wrote:
>> On 9/20/19 10:53 AM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2019-09-17 10:03:58 -0600, Jens Axboe wrote:
>>>> There's been a few requests for functionality similar to io_getevents()
>>>> and epoll_wait(), where the user can specify a timeout for waiting on
>>>> events. I deliberately did not add support for this through the system
>>>> call initially to avoid overloading the args, but I can see that the use
>>>> cases for this are valid.
>>>
>>>> This adds support for IORING_OP_TIMEOUT. If a user wants to get woken
>>>> when waiting for events, simply submit one of these timeout commands
>>>> with your wait call. This ensures that the application sleeping on the
>>>> CQ ring waiting for events will get woken. The timeout command is passed
>>>> in a pointer to a struct timespec. Timeouts are relative.
>>>
>>> Hm. This interface wouldn't allow one to reliably use a timeout waiting for
>>> io_uring_enter(..., min_complete > 1, IORING_ENTER_GETEVENTS, ...)
>>> right?
>>
>> I've got a (unpublished as of yet) version that allows you to wait for N
>> events, and canceling the timer it met. So that does allow you to reliably
>> wait for N events.
>
> Cool.
>
> I'm not quite sure how to parse "canceling the timer it met". s/it/if
> I assume you mean that one could ask for min_complete, and
> IORING_OP_TIMEOUT would interrupt that wait, even if fewer than
> min_complete have been collected? It'd probably be good to return 0
> instead of EINTR if at least one event is ready, otherwise it does seem
> to make sense.

Right, what I mean is that if you ask for a timeout for N events, and N
events pass before the timeout expires, then the timeout essentially
does nothing. You'd still get a completion event, as with all SQEs, but
it would not time out and it'd happen right after that Nth event.

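To make the ordering above concrete, here is a toy model in Python. It is
only a sketch of the semantics as described in this mail, not kernel code
and not the liburing API. The names completion_order() and TIMEOUT_TAG are
made up for illustration, and the result values (0 when the Nth event
arrives first, -ETIME when the timer expires first) are assumptions, since
the patch has not been posted yet.

```python
import errno

# Hypothetical user_data tag identifying the timeout command's completion.
TIMEOUT_TAG = "timeout"


def completion_order(io_events, n, timeout):
    """Model the order in which completions would be posted to the CQ ring.

    io_events: list of (finish_time, tag) pairs for ordinary requests.
    n:         number of events the timeout command waits for.
    timeout:   relative expiry time of the timeout command.

    Returns a list of (tag, res) pairs in posting order.
    """
    cq = []
    seen = 0
    fired = False  # has the timeout command completed yet?
    for finish_time, tag in sorted(io_events):
        # Timer expires before this request finishes (assumed res: -ETIME).
        if not fired and finish_time >= timeout:
            cq.append((TIMEOUT_TAG, -errno.ETIME))
            fired = True
        cq.append((tag, 0))
        seen += 1
        # Nth event arrived first: the timeout completes right after it
        # without having timed out (assumed res: 0).
        if not fired and seen == n:
            cq.append((TIMEOUT_TAG, 0))
            fired = True
    if not fired:
        # Fewer than n events ever completed, so the timer expires.
        cq.append((TIMEOUT_TAG, -errno.ETIME))
    return cq
```

With events finishing at times 1, 2 and 9, n = 2 and a timeout of 5, the
model posts the timeout's completion directly after the second event. With
only one event before the expiry, the timeout's completion is posted at
expiry, ahead of the late event.
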
The wait part always returns 0 if we have events; a potential error is
only returned if the CQ ring is empty. That's the same as what we have
now. But sounds like we are in violent agreement. I'll post a new patch
for this soonish.

>>> I can easily imagine usecases where I'd want to submit a bunch of ios
>>> and wait for all of their completion to minimize unnecessary context
>>> switches, as all IOs are required to continue. But with a relatively
>>> small timeout, to allow switching to do other work etc.
>>
>> The question is if it's worth it to add support for "wait for these N
>> exact events", or whether "wait for N events" is enough. The application
>> needs to read those completions anyway, and could then decide to loop
>> if it's still missing some events. Downside is that it may mean more
>> calls to wait, but since the io_uring is rarely shared, it might be
>> just fine.
>
> I think "wait for N events" is sufficient. I'm not even sure how one
> could safely use "wait for these N exact events", or what precisely it
> would mean. All the usecases for min_complete that I can think of
> basically just want to avoid unnecessary userspace transitions if not
> enough work has been done to have a chance to finish its task - but if
> there's plenty of results other than the just submitted ones in the queue
> that's also ok.

OK, that's exactly my thinking as well. You could wait for specific
events, but you'd have to tag the events somehow to do that. I'd rather
not add functionality like that unless absolutely necessary, especially
since this kind of functionality could just be added to liburing if
needed (or coded in the application itself).

>> , but since the io_uring is rarely shared, it might be just fine.
>
> FWIW, I think we might want to share it between (forked) processes in
> postgres, but I'm not sure yet (as in, in my current rough prototype I'm
> not yet doing so).
> Without that it's a lot harder to really benefit from
> the queue ordering operations, and sharing also allows ordering queue
> flushes to later parts of the journal, making it more likely that
> connections COMMITing earlier also finish earlier.
>
> Another, fairly crucial, reason is that being able to finish io requests
> started by other backends would make it far easier to avoid deadlock
> risk between postgres connections / background processes. Otherwise it's
> fairly easy to encounter situations where backend A issues a few
> prefetch requests and then blocks on some lock held by process B, and B
> needs one of the prefetched buffers from A to finish IO. There's
> more complex workarounds for this, but ...

Sharing is fine, as long as you mutually exclude reading the SQ and CQ
rings, of course. And if it makes it easier to do for queue ordering,
then by all means you should do that.

-- 
Jens Axboe