On 14.08.22 at 16:13, Jens Axboe wrote:
On 8/14/22 8:11 AM, Stefan Metzmacher wrote:
Hi Jens,
io_uring handles short sends/recvs for stream sockets when MSG_WAITALL
is set; however, the new zerocopy send is inconsistent in this regard,
which might be confusing. Handle short sends.
Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
---
io_uring/net.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c
index 32fc3da04e41..f9f080b3cc1e 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -70,6 +70,7 @@ struct io_sendzc {
unsigned flags;
unsigned addr_len;
void __user *addr;
+ size_t done_io;
};
#define IO_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
@@ -878,6 +879,7 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
zc->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
zc->addr_len = READ_ONCE(sqe->addr_len);
+ zc->done_io = 0;
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
@@ -1012,11 +1014,23 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret < min_ret)) {
if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
return -EAGAIN;
- return ret == -ERESTARTSYS ? -EINTR : ret;
+ if (ret > 0 && io_net_retry(sock, msg.msg_flags)) {
+ zc->len -= ret;
+ zc->buf += ret;
+ zc->done_io += ret;
+ req->flags |= REQ_F_PARTIAL_IO;
Don't we need a prep_async function and/or something like
io_setup_async_msg() here to handle address?
I don't think so, it's a non-vectored interface, so all the state is
already in io_sendzc.
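To illustrate why the non-vectored state is enough, here is a minimal userspace analogue of the short-send bookkeeping in the patch: after a partial send, advance the buffer, shrink the remaining length and accumulate the bytes already sent. The helper name send_all() is hypothetical, and this loop is only an illustration, not how io_uring re-issues a request internally.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: send the whole buffer on a stream socket, retrying
 * after short sends. The three updates mirror zc->buf, zc->len and
 * zc->done_io in the patch. Returns the total bytes sent, or -1 with errno
 * set if nothing could be sent at all.
 */
static ssize_t send_all(int fd, const void *buf, size_t len, int flags)
{
	const char *p = buf;
	size_t done_io = 0;			/* like zc->done_io */

	while (len > 0) {
		ssize_t ret = send(fd, p, len, flags);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: just retry */
			/* report partial progress instead of the error */
			return done_io ? (ssize_t)done_io : -1;
		}
		if (ret == 0)
			break;			/* defensive: avoid spinning */
		p += ret;			/* like zc->buf += ret */
		len -= (size_t)ret;		/* like zc->len -= ret */
		done_io += (size_t)ret;		/* like zc->done_io += ret */
	}
	return (ssize_t)done_io;
}
```

Returning the accumulated count rather than the error once some data went out matches the usual short-I/O convention for stream sockets.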
Compared to io_send(), this supports a sockaddr address. If the caller
needs to keep io_sendzc->addr valid until the cqe arrives, then we need
to document that clearly, as it doesn't match the common practice of
other opcodes. Currently everything except the data buffers can be
released after the sqe is submitted.
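One way to keep the usual "release after submit" rule is to copy the destination address into per-request storage at prep time, so the buffer io_sendzc->addr points to no longer matters afterwards. A minimal userspace sketch, assuming a hypothetical struct zc_req and helper zc_prep_addr() (neither is an io_uring name); in the kernel the copy would presumably go through something like move_addr_to_kernel() into a struct sockaddr_storage.

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical per-request state, loosely modelled on io_sendzc. */
struct zc_req {
	struct sockaddr_storage addr;	/* private copy of the destination */
	socklen_t addr_len;		/* 0 means "no address", as for send() */
};

/*
 * Hypothetical prep step: copy the caller's sockaddr into the request so
 * the caller may reuse or free its own buffer right after submission.
 * Returns 0 on success, -1 if addr_len does not fit.
 */
static int zc_prep_addr(struct zc_req *req,
			const struct sockaddr *addr, socklen_t addr_len)
{
	if (!addr || addr_len == 0) {
		req->addr_len = 0;
		return 0;
	}
	if (addr_len > sizeof(req->addr))
		return -1;
	memcpy(&req->addr, addr, addr_len);
	req->addr_len = addr_len;
	return 0;
}
```

The cost is a sockaddr_storage per request even when no address is given, which is what the async_size question further down is about.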
Good point, it's not just the 'from' address. Pavel?
It's basically dest_addr from:
ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
const struct sockaddr *dest_addr, socklen_t addrlen);
It's not used in most cases, but for unconnected UDP sockets you need it.
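For reference, the plain-socket case being described: a UDP socket that was never connect()ed has no peer, so send() fails on Linux with EDESTADDRREQ and the destination has to be supplied per call through sendto(). A minimal loopback sketch; the helper name udp_loopback_roundtrip() is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one datagram from an unconnected UDP socket
 * to a receiver bound on 127.0.0.1, then read it back.
 * Returns the number of bytes received, or -1 on error.
 */
static ssize_t udp_loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
	struct sockaddr_in dst;
	socklen_t dstlen = sizeof(dst);
	ssize_t ret = -1;
	int rx = socket(AF_INET, SOCK_DGRAM, 0);
	int tx = socket(AF_INET, SOCK_DGRAM, 0);

	if (rx < 0 || tx < 0)
		goto out;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	dst.sin_port = 0;	/* let the kernel pick a free port */
	if (bind(rx, (struct sockaddr *)&dst, sizeof(dst)) < 0 ||
	    getsockname(rx, (struct sockaddr *)&dst, &dstlen) < 0)
		goto out;

	/* tx is never connect()ed, so dest_addr must be passed here */
	if (sendto(tx, msg, strlen(msg), 0,
		   (struct sockaddr *)&dst, dstlen) < 0)
		goto out;

	ret = recvfrom(rx, out, outlen, 0, NULL, NULL);
out:
	if (rx >= 0)
		close(rx);
	if (tx >= 0)
		close(tx);
	return ret;
}
```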
Maybe the fixed io_op_def.async_size could be changed to something that only
allocates the async data if needed. Maybe the prep_async() hook could do the
allocation itself if needed.
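A rough userspace sketch of the second idea, where the prep_async-style hook allocates its own async data and does so only when a destination address is present, so connected-socket sends pay nothing extra. All names here (struct zc_async, struct zc_lazy_req, zc_prep_async(), zc_cleanup()) are hypothetical and do not correspond to the real io_op_def callbacks or their signatures.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical async state: only needed when a destination is given. */
struct zc_async {
	struct sockaddr_storage addr;
	socklen_t addr_len;
};

/* Hypothetical request: async_data stays NULL unless it is needed. */
struct zc_lazy_req {
	const void *buf;
	size_t len;
	struct zc_async *async_data;
};

/*
 * Hypothetical prep_async hook: allocate async data only when there is an
 * address to preserve. Returns 0 on success, -1 on bad length or ENOMEM.
 */
static int zc_prep_async(struct zc_lazy_req *req,
			 const struct sockaddr *addr, socklen_t addr_len)
{
	struct zc_async *as;

	if (!addr || addr_len == 0)
		return 0;		/* connected socket: nothing to copy */
	if (addr_len > sizeof(as->addr))	/* sizeof does not dereference */
		return -1;
	as = malloc(sizeof(*as));
	if (!as)
		return -1;
	memcpy(&as->addr, addr, addr_len);
	as->addr_len = addr_len;
	req->async_data = as;
	return 0;
}

/* Hypothetical cleanup: free(NULL) is a no-op, so this is always safe. */
static void zc_cleanup(struct zc_lazy_req *req)
{
	free(req->async_data);
	req->async_data = NULL;
}
```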
metze