On 25.02.2021 19:27, Jorgen Hansen wrote:
> On 18 Feb 2021, at 06:37, Arseny Krasnov <arseny.krasnov@xxxxxxxxxxxxx> wrote:
>> This adds receive loop for SEQPACKET. It looks like receive loop for
>> STREAM, but there is a little bit difference:
>> 1) It doesn't call notify callbacks.
>> 2) It doesn't care about 'SO_SNDLOWAT' and 'SO_RCVLOWAT' values, because
>>    there is no sense for these values in SEQPACKET case.
>> 3) It waits until whole record is received or error is found during
>>    receiving.
>> 4) It processes and sets 'MSG_TRUNC' flag.
>>
>> So to avoid extra conditions for two types of socket inside one loop, two
>> independent functions were created.
>>
>> Signed-off-by: Arseny Krasnov <arseny.krasnov@xxxxxxxxxxxxx>
>> ---
>>  include/net/af_vsock.h   |  5 +++
>>  net/vmw_vsock/af_vsock.c | 97 +++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 101 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> index b1c717286993..01563338cc03 100644
>> --- a/include/net/af_vsock.h
>> +++ b/include/net/af_vsock.h
>> @@ -135,6 +135,11 @@ struct vsock_transport {
>>  	bool (*stream_is_active)(struct vsock_sock *);
>>  	bool (*stream_allow)(u32 cid, u32 port);
>>
>> +	/* SEQ_PACKET. */
>> +	size_t (*seqpacket_seq_get_len)(struct vsock_sock *vsk);
>> +	int (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
>> +				 int flags, bool *msg_ready);
>> +
>>  	/* Notification. */
>>  	int (*notify_poll_in)(struct vsock_sock *, size_t, bool *);
>>  	int (*notify_poll_out)(struct vsock_sock *, size_t, bool *);
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index d277dc1cdbdf..b754927a556a 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -1972,6 +1972,98 @@ static int __vsock_stream_recvmsg(struct sock *sk, struct msghdr *msg,
>>  	return err;
>>  }
>>
>> +static int __vsock_seqpacket_recvmsg(struct sock *sk, struct msghdr *msg,
>> +				     size_t len, int flags)
>> +{
>> +	const struct vsock_transport *transport;
>> +	const struct iovec *orig_iov;
>> +	unsigned long orig_nr_segs;
>> +	bool msg_ready;
>> +	struct vsock_sock *vsk;
>> +	size_t record_len;
>> +	long timeout;
>> +	int err = 0;
>> +	DEFINE_WAIT(wait);
>> +
>> +	vsk = vsock_sk(sk);
>> +	transport = vsk->transport;
>> +
>> +	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
>> +	orig_nr_segs = msg->msg_iter.nr_segs;
>> +	orig_iov = msg->msg_iter.iov;
>> +	msg_ready = false;
>> +	record_len = 0;
>> +
>> +	while (1) {
>> +		err = vsock_wait_data(sk, &wait, timeout, NULL, 0);
>> +
>> +		if (err <= 0) {
>> +			/* In case of any loop break(timeout, signal
>> +			 * interrupt or shutdown), we report user that
>> +			 * nothing was copied.
>> +			 */
>> +			err = 0;
>> +			break;
>> +		}
>> +
>> +		if (record_len == 0) {
>> +			record_len =
>> +				transport->seqpacket_seq_get_len(vsk);
>> +
>> +			if (record_len == 0)
>> +				continue;
>> +		}
>> +
>> +		err = transport->seqpacket_dequeue(vsk, msg,
>> +					flags, &msg_ready);
>> +
>> +		if (err < 0) {
>> +			if (err == -EAGAIN) {
>> +				iov_iter_init(&msg->msg_iter, READ,
>> +					      orig_iov, orig_nr_segs,
>> +					      len);
>> +				/* Clear 'MSG_EOR' here, because dequeue
>> +				 * callback above set it again if it was
>> +				 * set by sender. This 'MSG_EOR' is from
>> +				 * dropped record.
>> +				 */
>> +				msg->msg_flags &= ~MSG_EOR;
>> +				record_len = 0;
>> +				continue;
>> +			}
> So a question for my understanding of the flow here. SOCK_SEQPACKET is reliable, so
> what does it mean to drop the record? Is the transport supposed to roll back to the
> beginning of the current record? If the incoming data in the transport doesn’t follow
> the protocol, and packets need to be dropped, shouldn’t the socket be reset or similar?
> Maybe there is potential for simplifying the flow if that is the case.

The vhost transport can drop packets (for example, when kmalloc() fails).
In that case the user would see only part of a record (when an RW packet
was dropped), or it would be impossible to distinguish two records (when
the END of the first and the BEGIN of the second were lost). So in this
case the user continues to sleep and such orphaned packets are dropped.

Yes, it will simplify the logic a lot if I just send a connection reset
when an invalid sequence of packets is detected.

>
>> +
>> +			err = -ENOMEM;
>> +			break;
>> +		}
>> +
>> +		if (msg_ready)
>> +			break;
>> +	}
>> +
>> +	if (sk->sk_err)
>> +		err = -sk->sk_err;
>> +	else if (sk->sk_shutdown & RCV_SHUTDOWN)
>> +		err = 0;
>> +
>> +	if (msg_ready) {
>> +		/* User sets MSG_TRUNC, so return real length of
>> +		 * packet.
>> +		 */
>> +		if (flags & MSG_TRUNC)
>> +			err = record_len;
>> +		else
>> +			err = len - msg->msg_iter.count;
>> +
>> +		/* Always set MSG_TRUNC if real length of packet is
>> +		 * bigger than user's buffer.
>> +		 */
>> +		if (record_len > len)
>> +			msg->msg_flags |= MSG_TRUNC;
>> +	}
>> +
>> +	return err;
>> +}
>> +
>>  static int
>>  vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>>  			  int flags)
>> @@ -2027,7 +2119,10 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>>  		goto out;
>>  	}
>>
>> -	err = __vsock_stream_recvmsg(sk, msg, len, flags);
>> +	if (sk->sk_type == SOCK_STREAM)
>> +		err = __vsock_stream_recvmsg(sk, msg, len, flags);
>> +	else
>> +		err = __vsock_seqpacket_recvmsg(sk, msg, len, flags);
>>
>>  out:
>>  	release_sock(sk);
>> --
>> 2.25.1
>>
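
To make the "reset on invalid sequence" idea from the discussion above more concrete, here is a rough standalone sketch of how such a check might look. It is not taken from the patch: the packet kinds are only named after the BEGIN/RW/END packets mentioned in the thread, and the seq_checker type and its functions are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical packet kinds, named after the BEGIN/RW/END packets
 * mentioned in the thread. These are not the real transport opcodes.
 */
enum seq_pkt_kind { SEQ_PKT_BEGIN, SEQ_PKT_RW, SEQ_PKT_END };

/* Outcome of feeding one packet to the checker. */
enum seq_verdict {
	SEQ_CONTINUE,		/* record still in progress */
	SEQ_RECORD_COMPLETE,	/* END seen, record can be delivered */
	SEQ_RESET		/* invalid order, reset the connection */
};

/* Hypothetical per-socket state: are we between BEGIN and END? */
struct seq_checker {
	bool in_record;
};

static void seq_checker_init(struct seq_checker *c)
{
	assert(c);	/* caller must pass a valid checker */
	c->in_record = false;
}

static enum seq_verdict seq_checker_feed(struct seq_checker *c,
					 enum seq_pkt_kind kind)
{
	assert(c);	/* caller must pass a valid checker */

	switch (kind) {
	case SEQ_PKT_BEGIN:
		if (c->in_record)
			return SEQ_RESET;	/* previous END was lost */
		c->in_record = true;
		return SEQ_CONTINUE;
	case SEQ_PKT_RW:
		if (!c->in_record)
			return SEQ_RESET;	/* data without a BEGIN */
		return SEQ_CONTINUE;
	case SEQ_PKT_END:
		if (!c->in_record)
			return SEQ_RESET;	/* END without a BEGIN */
		c->in_record = false;
		return SEQ_RECORD_COMPLETE;
	}

	return SEQ_RESET;	/* unknown packet kind */
}
```

With a check like this, the receive loop would not need the -EAGAIN path that rewinds the iov and clears 'MSG_EOR': any SEQ_RESET verdict would end the connection instead of silently dropping the record. Note that this sketch only catches ordering errors. A dropped RW packet in the middle of a record leaves the order valid, so a real check would also have to compare the number of bytes received with the record length.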