On 3/21/21 4:20 AM, Stefan Metzmacher wrote:
>
> On 20.03.21 at 23:57, Jens Axboe wrote:
>> On 3/20/21 1:33 PM, Stefan Metzmacher wrote:
>>> Without that it's not safe to use them in a linked combination with
>>> others.
>>>
>>> Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
>>> should be possible.
>>>
>>> We already handle short reads and writes for the following opcodes:
>>>
>>> - IORING_OP_READV
>>> - IORING_OP_READ_FIXED
>>> - IORING_OP_READ
>>> - IORING_OP_WRITEV
>>> - IORING_OP_WRITE_FIXED
>>> - IORING_OP_WRITE
>>> - IORING_OP_SPLICE
>>> - IORING_OP_TEE
>>>
>>> Now we have it for these as well:
>>>
>>> - IORING_OP_SENDMSG
>>> - IORING_OP_SEND
>>> - IORING_OP_RECVMSG
>>> - IORING_OP_RECV
>>>
>>> For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
>>> flags in order to call req_set_fail_links().
>>>
>>> There might be applications around depending on the behavior that even
>>> short send[msg]()/recv[msg]() returns continue an IOSQE_IO_LINK chain.
>>>
>>> It's very unlikely that such applications pass in MSG_WAITALL, which is
>>> only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.
>>>
>>> It's expected that the low-level sock_sendmsg() call just ignores
>>> MSG_WAITALL, as MSG_ZEROCOPY is also ignored without SO_ZEROCOPY being
>>> explicitly set.
>>>
>>> We also expect the caller to know about the implicit truncation to
>>> MAX_RW_COUNT, which we don't detect.
>>
>> Thanks, I do think this is much better and I feel comfortable getting
>> this applied for 5.12 (and stable).
>
> Great, thanks!
>
> Related to that, I have a question regarding the IOSQE_IO_LINK behavior.
> (Assuming I have a dedicated ring for the send-path of each socket.)
>
> Is it possible to just set IOSQE_IO_LINK on every sqe in order to create
> an endless chain of requests, so that userspace can pass as many sqes as
> possible, which all need to be submitted in the exact correct order?
> And if any request is short, then all remaining ones get ECANCELED,
> without the risk of running any later request out of order?
>
> Are such link chains also possible over multiple io_uring_submit() calls?
> Is there still a race where an iothread removes the request from the
> list and fills in a cqe with ECANCELED that userspace is not yet aware
> of, and userspace then starts a new, independent link chain with a
> request that ought to be submitted after all the canceled ones?
>
> Or do I have to submit a link chain with just a single __io_uring_flush_sq()
> and then strictly wait until I get a cqe for the last request in the chain?

A chain can only exist within a single submit attempt, so it will not work
if you need to break it up over multiple io_uring_enter() calls.

-- 
Jens Axboe