Re: Feature request: Please implement IORING_OP_TEE

Clay Harris <bugs@xxxxxxxxxxx> · Mon, 27 Apr 2020 15:17:33 -0500

On Mon, Apr 27 2020 at 20:22:18 +0200, Jann Horn quoth thus:

> On Mon, Apr 27, 2020 at 5:56 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> > On 4/27/20 9:40 AM, Clay Harris wrote:
> > > I was excited to see IORING_OP_SPLICE go in, but disappointed that tee
> > > didn't go in at the same time.  It would be very useful to copy pipe
> > > buffers in an async program.
> >
> > Pavel, care to wire up tee? From a quick look, looks like just exposing
> > do_tee() and calling that, so should be trivial.
> 
> Just out of curiosity:
> 
> What's the purpose of doing that via io_uring? Non-blocking sys_tee()
> just shoves around some metadata, it doesn't do any I/O, right? Is
> this purely for syscall-batching reasons? (And does that mean that you
> would also add syscalls like epoll_wait() and futex() to io_uring?) Or
> is this because you're worried about blocking on the pipe mutex?

From my perspective -- syscall-batching.

But, if you're going to be working with a very large number of file
descriptors, you'll need to have epoll().  You could do this by building
epoll_wait into io_uring and/or having a separate uring only for IO and
never waiting for completions there, but instead calling epoll() when
there are no ready cqe's.  I had assumed that this was already being
looked at because of the definition of IORING_OP_EPOLL_CTL.

----

So, I'd like to take this opportunity to bounce a related thought off
of all of you.  Even with the advent of io_uring, I think the approach
of handling a bunch of IO by marking all of the fds non-blocking and
using epoll() in edge-triggered mode is still valuable.

But, there is an impedance mismatch between splice() / tee() and using
epoll() this way.  (In fact, this applies to all requests that take
both an input and output fd.)  That is, the request is working on two
fds, but returning only one status.  In the IO loop, we want to do
IO until we receive an EAGAIN and mark the fd as blocked.  We then
unblock it when epoll() says we can do IO again.  This doesn't work
well when we don't know which fd the EAGAIN was for.  So, we have
to issue a separate poll() request on the involved fds to find out.
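
To make the workaround concrete, here is a rough sketch of that extra
poll() step.  The helper name which_fd_blocked() and the BLOCKED_*
constants are made up for illustration; only poll() itself is a real
interface.  It would be called after a splice()/tee() on non-blocking
fds has failed with EAGAIN.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define BLOCKED_IN  1  /* hypothetical: input fd has nothing to read yet */
#define BLOCKED_OUT 2  /* hypothetical: output fd has no room yet */

/*
 * After a two-fd request fails with EAGAIN, ask poll() (zero timeout)
 * which side is not ready.  Returns a mask of BLOCKED_* bits, or -1 if
 * poll() itself fails.
 */
static int which_fd_blocked(int fd_in, int fd_out)
{
    struct pollfd p[2] = {
        { .fd = fd_in,  .events = POLLIN  },
        { .fd = fd_out, .events = POLLOUT },
    };
    int mask = 0;

    assert(fd_in >= 0 && fd_out >= 0);

    if (poll(p, 2, 0) < 0)
        return -1;

    /*
     * POLLHUP/POLLERR also show up in revents.  Treat those as "not
     * blocked" so the caller retries and sees the real error.
     */
    if (p[0].revents == 0)
        mask |= BLOCKED_IN;
    if (p[1].revents == 0)
        mask |= BLOCKED_OUT;
    return mask;
}
```

This costs one extra syscall per EAGAIN, and the answer can already be
stale by the time it comes back, which is the mismatch described above.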

Logically, we'd like to get the status of both fds back from the
initial request, but that's not practical because once an error is
detected on one, the other is not further examined.

So, the idea is to introduce a new flag which could be passed to
any request that takes both an input and output fd.

If the flag is clear, errors are returned exactly as they are now.
If the flag is set, and the error occurred with the output fd,
add 1 << 30 to the error number.
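
On the application side, decoding such a result could look roughly like
the sketch below.  Nothing here exists today: ERR_ON_OUTPUT_FD, struct
two_fd_error and decode_two_fd_error() are hypothetical names, and the
sketch assumes the caller has the positive error number in hand (errno
after a failed syscall, or the negated result of a completion).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: value added to the error number when the output fd failed. */
#define ERR_ON_OUTPUT_FD (1 << 30)

struct two_fd_error {
    int  err;        /* plain error number, e.g. EAGAIN */
    bool on_output;  /* true if the error was attributed to the output fd */
};

/*
 * Split an error number reported under the proposed flag into the
 * ordinary errno value and the "which fd" indication.  Real errno
 * values are far below 1 << 30, so adding that value is the same as
 * setting the bit, and masking it off recovers the original number.
 */
static struct two_fd_error decode_two_fd_error(int errnum)
{
    struct two_fd_error e;

    assert(errnum > 0);
    e.on_output = (errnum & ERR_ON_OUTPUT_FD) != 0;
    e.err = errnum & ~ERR_ON_OUTPUT_FD;
    return e;
}
```

An IO loop would then mark only the indicated fd as blocked on EAGAIN,
without the follow-up poll().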

As it would be very rare for errors to concurrently be on both fds,
this would be practically as good as simultaneously getting the
status of both fds back.