On Mon, Jul 18, 2022 at 08:12:52AM +0000, Arseniy Krasnov wrote:
Hello,
during my experiments with zerocopy receive, i found, that in some
cases, poll() implementation violates POSIX: when socket has non-
default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
POLLRDNORM bits in 'revents' even number of bytes available to read
on socket is smaller than SO_RCVLOWAT value. In this case,user sees
POLLIN flag and then tries to read data(for example using 'read()'
call), but read call will be blocked, because SO_RCVLOWAT logic is
supported in dequeue loop in af_vsock.c. But the same time, POSIX
requires that:
"POLLIN Data other than high-priority data may be read without
blocking.
POLLRDNORM Normal data may be read without blocking."
See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
So, we have, that poll() syscall returns POLLIN, but read call will
be blocked.
Also in man page socket(7) i found that:
"Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
socket as readable only if at least SO_RCVLOWAT bytes are available."
I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
this case for TCP socket, it works as POSIX required.
I tried to look at the code and it seems that only TCP complies with it
or am I wrong?
I've added some fixes to af_vsock.c and virtio_transport_common.c,
test is also implemented.
What do You think guys?
Nice, thanks for fixing this and for the test!
I left some comments, but I think the series is fine if we will support
it in all transports.
I'd just like to understand if it's just TCP complying with it or I'm
missing some check included in the socket layer that we could reuse.
@David, @Jakub, @Paolo, any advice?
Thanks,
Stefano