On 2/26/24 3:59 AM, Dylan Yudaken wrote: > On Sun, Feb 25, 2024 at 12:46?AM Jens Axboe <axboe@xxxxxxxxx> wrote: >> >> If we have more data pending, we know we're going to do one more loop. >> If that's the case, then set MSG_MORE to inform the networking stack >> that there's more data coming shortly for this socket. >> >> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> >> --- >> io_uring/net.c | 10 +++++++--- >> 1 file changed, 7 insertions(+), 3 deletions(-) >> >> diff --git a/io_uring/net.c b/io_uring/net.c >> index 240b8eff1a78..07307dd5a077 100644 >> --- a/io_uring/net.c >> +++ b/io_uring/net.c >> @@ -519,6 +519,10 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags) >> if (!io_check_multishot(req, issue_flags)) >> return io_setup_async_msg(req, kmsg, issue_flags); >> >> + flags = sr->msg_flags; >> + if (issue_flags & IO_URING_F_NONBLOCK) >> + flags |= MSG_DONTWAIT; >> + >> retry_multishot: >> if (io_do_buffer_select(req)) { >> void __user *buf; >> @@ -528,12 +532,12 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags) >> if (!buf) >> return -ENOBUFS; >> >> + if ((req->flags & (REQ_F_BL_EMPTY|REQ_F_APOLL_MULTISHOT)) == >> + REQ_F_APOLL_MULTISHOT) >> + flags |= MSG_MORE; >> iov_iter_ubuf(&kmsg->msg.msg_iter, ITER_SOURCE, buf, len); >> } > > This feels racy. I don't have an exact sequence in mind, but I believe > there are cases where between > the two calls to __sys_sendmsg_sock, another submission could be > issued and drain the buffer list. > I guess the result would be that the packet is never sent out, but I > have not followed the codepaths of MSG_MORE. This is true, but that race always exists depending on how gets to go first (the adding of the buffer, or the send itself). The way I see it, when the send is issued we're making the guarantee that we're going to at least deplete the queue as it looks when entered. If more is added while it's being processed, we _may_ see it. Outside of that, we don't want it to potentially run in perpetuity. It may actually be a good idea to make the rule of "just issue what was there when first seen/issued" a hard one, though I don't think it's really worth doing. But making any guarantees on buffers added in parallel will be impossible. If you do that, then you have to deal with figuring out what's left in the queue once you get a completion withou CQE_F_MORE. > The obvious other way to trigger this codepath is if the user messes > with the ring by decrementing > the buffer counter. I do not believe there are any nefarious outcomes > - but just to point out that > REQ_F_BL_EMPTY is essentially user controlled. The user may certainly shoot himself in the foot. As long as that doesn't lead to a nefarious outcome, then that's not a concern. For this case, the head is kernel local, user can only write to the tail. So we could have a case of user fiddling with the tail and when we grab the next buffer (and the previous one did not have REQ_F_BL_EMPTY set), the ring will indeed appear to be empty. At that point you get an -ENOBUFS without CQE_F_MORE set. -- Jens Axboe