On 3/21/21 4:20 AM, Stefan Metzmacher wrote:
>
> On 20.03.21 at 23:57, Jens Axboe wrote:
>> On 3/20/21 1:33 PM, Stefan Metzmacher wrote:
>>> Without that it's not safe to use them in a linked combination with
>>> others.
>>>
>>> Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
>>> should be possible.
>>>
>>> We already handle short reads and writes for the following opcodes:
>>>
>>> - IORING_OP_READV
>>> - IORING_OP_READ_FIXED
>>> - IORING_OP_READ
>>> - IORING_OP_WRITEV
>>> - IORING_OP_WRITE_FIXED
>>> - IORING_OP_WRITE
>>> - IORING_OP_SPLICE
>>> - IORING_OP_TEE
>>>
>>> Now we have it for these as well:
>>>
>>> - IORING_OP_SENDMSG
>>> - IORING_OP_SEND
>>> - IORING_OP_RECVMSG
>>> - IORING_OP_RECV
>>>
>>> For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
>>> flags in order to call req_set_fail_links().
>>>
>>> There might be applications around depending on the behavior that even
>>> short send[msg]()/recv[msg]() returns continue an IOSQE_IO_LINK chain.
>>>
>>> It's very unlikely that such applications pass in MSG_WAITALL, which is
>>> only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.
>>>
>>> It's expected that the low-level sock_sendmsg() call just ignores
>>> MSG_WAITALL, as MSG_ZEROCOPY is also ignored without SO_ZEROCOPY being
>>> explicitly set.
>>>
>>> We also expect the caller to know about the implicit truncation to
>>> MAX_RW_COUNT, which we don't detect.
>>
>> Thanks, I do think this is much better and I feel comfortable getting
>> this applied for 5.12 (and stable).
>
> Great, thanks!
>
> Related to that, I have a question regarding the IOSQE_IO_LINK behavior.
> (Assuming I have a dedicated ring for the send-path of each socket.)
>
> Is it possible to just set IOSQE_IO_LINK on every sqe in order to create
> an endless chain of requests, so that userspace can pass as many sqes as
> possible, which all need to be submitted in the exact correct order?
> And if any request is short, then all remaining ones get ECANCELED,
> without the risk of running any later request out of order?
>
> Are such link chains also possible over multiple io_uring_submit() calls?
> Is there still a race where an iothread removes the request from the
> list and fills in a cqe with ECANCELED that userspace is not yet aware
> of, and userspace then starts a new, independent link chain with a
> request that ought to be submitted after all the canceled ones?
>
> Or do I have to submit a link chain with just a single __io_uring_flush_sq()
> and then strictly wait until I get a cqe for the last request in the chain?

A chain can only exist within a single submit attempt, so it will not work
if you need to break it up over multiple io_uring_enter() calls.

-- 
Jens Axboe