> -----Original Message-----
> From: Stefano Garzarella <sgarzare@xxxxxxxxxx>
> Sent: Thursday, July 6, 2023 3:02 AM
> To: Gary Guo <gary@xxxxxxxxxxx>; Dexuan Cui <decui@xxxxxxxxxxxxx>
> Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
> <haiyangz@xxxxxxxxxxxxx>; Wei Liu <wei.liu@xxxxxxxxxx>;
> linux-hyperv@xxxxxxxxxxxxxxx; virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx;
> netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: Hyper-V vsock streams do not fill the supplied buffer in full
>
> Hi Gary,
>
> On Wed, Jul 5, 2023 at 12:45 AM Gary Guo <gary@xxxxxxxxxxx> wrote:
> >
> > When recvmsg is called on a vsock stream with a buffer, it only fills
> > the buffer with data from the first single VM packet. Even if there are
> > more VM packets at the time and the buffer is still not completely
> > filled, it will just leave the buffer partially filled.
> >
> > This causes some issues in WSLD, which uses the vsock in
> > non-blocking mode and uses epoll.
> >
> > For stream-oriented sockets, the epoll man page [1] says that
> >
> > > For stream-oriented files (e.g., pipe, FIFO, stream socket),
> > > the condition that the read/write I/O space is exhausted can
> > > also be detected by checking the amount of data read from /
> > > written to the target file descriptor. For example, if you
> > > call read(2) by asking to read a certain amount of data and
> > > read(2) returns a lower number of bytes, you can be sure of
> > > having exhausted the read I/O space for the file descriptor.
> >
> > This has been used as an optimisation in the wild for reducing the number
> > of syscalls required for stream sockets (by asserting that the socket
> > will not have to be polled to EAGAIN in edge-trigger mode, if the buffer
> > given to recvmsg is not filled completely). An example is Tokio, which
> > has done this since v1.21.0 [2].
> >
> > When this optimisation combines with the behaviour of Hyper-V vsock, it
> > causes an issue in this scenario:
> > * the VM host sends data to the guest, and it's split into multiple
> >   VM packets
> > * sk_data_ready is called and epoll returns, notifying the userspace
> >   that the socket is ready
> > * userspace calls recvmsg with a buffer, and it's partially filled
> > * userspace assumes that the stream socket is depleted, and that if new
> >   data arrives epoll will notify it again.
> > * kernel always considers the socket to be ready, and since it's in
> >   edge-trigger mode, the epoll instance will never be notified again.
> >
> > This different realisation of the readiness causes the userspace to
> > block forever.
>
> Thanks for the detailed description of the problem.
>
> I think we should fix the hvs_stream_dequeue() in
> net/vmw_vsock/hyperv_transport.c.
> We can do something similar to what we do in
> virtio_transport_stream_do_dequeue() in
> net/vmw_vsock/virtio_transport_common.c
>
> @Dexuan WDYT?
>
> Thanks,
> Stefano

(Sorry for the late response...)

Thanks Gary Guo for the good analysis!

I didn't realize that hvs_stream_dequeue() is supposed to copy as much
data as possible to the userspace in the case of EPOLLET mode.

Yes, I think we should fix hvs_stream_dequeue(). We'll try to get this
fixed asap.

Thanks,
-- Dexuan