On 20.07.2022 12:30, Stefano Garzarella wrote: > On Wed, Jul 20, 2022 at 06:07:47AM +0000, Arseniy Krasnov wrote: >> On 19.07.2022 15:58, Stefano Garzarella wrote: >>> On Mon, Jul 18, 2022 at 08:12:52AM +0000, Arseniy Krasnov wrote: >>>> Hello, >>>> >>>> during my experiments with zerocopy receive, i found, that in some >>>> cases, poll() implementation violates POSIX: when socket has non- >>>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and >>>> POLLRDNORM bits in 'revents' even number of bytes available to read >>>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees >>>> POLLIN flag and then tries to read data(for example using 'read()' >>>> call), but read call will be blocked, because SO_RCVLOWAT logic is >>>> supported in dequeue loop in af_vsock.c. But the same time, POSIX >>>> requires that: >>>> >>>> "POLLIN Data other than high-priority data may be read without >>>> blocking. >>>> POLLRDNORM Normal data may be read without blocking." >>>> >>>> See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293. >>>> >>>> So, we have, that poll() syscall returns POLLIN, but read call will >>>> be blocked. >>>> >>>> Also in man page socket(7) i found that: >>>> >>>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a >>>> socket as readable only if at least SO_RCVLOWAT bytes are available." >>>> >>>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it >>>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with >>>> this case for TCP socket, it works as POSIX required. >>> >>> I tried to look at the code and it seems that only TCP complies with it or am I wrong? >> Yes, i checked AF_UNIX, it also don't care about that. It calls skb_queue_empty() that of >> course ignores SO_RCVLOWAT. >>> >>>> >>>> I've added some fixes to af_vsock.c and virtio_transport_common.c, >>>> test is also implemented. >>>> >>>> What do You think guys? >>> >>> Nice, thanks for fixing this and for the test! >>> >>> I left some comments, but I think the series is fine if we will support it in all transports. >> Ack >>> >>> I'd just like to understand if it's just TCP complying with it or I'm missing some check included in the socket layer that we could reuse. >> Seems sock_poll() which is socket layer entry point for poll() doesn't contain any such checks >>> >>> @David, @Jakub, @Paolo, any advice? >>> >>> Thanks, >>> Stefano >>> >> >> PS: moreover, i found one more interesting thing with TCP and poll: TCP receive logic wakes up poll waiter >> only when number of available bytes > SO_RCVLOWAT. E.g. it prevents "spurious" wake ups, when poll will be >> woken up because new data arrived, but POLLIN to allow user dequeue this data won't be set(as amount of data >> is too small). >> See tcp_data_ready() in net/ipv4/tcp_input.c > > Do you mean that we should call sk->sk_data_ready(sk) checking SO_RCVLOWAT? Yes, like tcp_data_read(). > > It seems fine, maybe we can add vsock_data_ready() in af_vsock.c that transports should call instead of calling sk->sk_data_ready(sk) directly. Yes, this will also update logic in vmci and hyperv transports > > Then we can something similar to tcp_data_ready(). > > Thanks, > Stefano >