On 02.08.2022 08:31, Vishnu Dasa wrote: > > >> On Jul 27, 2022, at 11:08 PM, Arseniy Krasnov <AVKrasnov@xxxxxxxxxxxxxx> wrote: >> >> On 27.07.2022 15:37, Stefano Garzarella wrote: >>> Hi Arseniy, >>> >>> On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote: >>>> Hello, >>>> >>>> This patchset includes some updates for SO_RCVLOWAT: >>>> >>>> 1) af_vsock: >>>> During my experiments with zerocopy receive, i found, that in some >>>> cases, poll() implementation violates POSIX: when socket has non- >>>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and >>>> POLLRDNORM bits in 'revents' even number of bytes available to read >>>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees >>>> POLLIN flag and then tries to read data(for example using 'read()' >>>> call), but read call will be blocked, because SO_RCVLOWAT logic is >>>> supported in dequeue loop in af_vsock.c. But the same time, POSIX >>>> requires that: >>>> >>>> "POLLIN Data other than high-priority data may be read without >>>> blocking. >>>> POLLRDNORM Normal data may be read without blocking." >>>> >>>> See https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.open-std.org%2Fjtc1%2Fsc22%2Fopen%2Fn4217.pdf&data=05%7C01%7Cvdasa%40vmware.com%7Cae83621d8709421de14b08da705faa9c%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637945853473740235%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=NrbycCcVXV9Tz8NRDYBpnDx7KpFF6BZpSRbuhz1IfJ4%3D&reserved=0, page 293. >>>> >>>> So, we have, that poll() syscall returns POLLIN, but read call will >>>> be blocked. >>>> >>>> Also in man page socket(7) i found that: >>>> >>>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a >>>> socket as readable only if at least SO_RCVLOWAT bytes are available." >>>> >>>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it >>>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with >>>> this case for TCP socket, it works as POSIX required. >>>> >>>> I've added some fixes to af_vsock.c and virtio_transport_common.c, >>>> test is also implemented. >>>> >>>> 2) virtio/vsock: >>>> It adds some optimization to wake ups, when new data arrived. Now, >>>> SO_RCVLOWAT is considered before wake up sleepers who wait new data. >>>> There is no sense, to kick waiter, when number of available bytes >>>> in socket's queue < SO_RCVLOWAT, because if we wake up reader in >>>> this case, it will wait for SO_RCVLOWAT data anyway during dequeue, >>>> or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such >>>> exit from poll() will be "spurious". This logic is also used in TCP >>>> sockets. >>> >>> Nice, it looks good! >> Thank You! >>> >>>> >>>> 3) vmci/vsock: >>>> Same as 2), but i'm not sure about this changes. Will be very good, >>>> to get comments from someone who knows this code. >>> >>> I CCed VMCI maintainers to the patch and also to this cover, maybe better to keep them in the loop for next versions. >>> >>> (Jorgen's and Rajesh's emails bounced back, so I'm CCing here only Bryan, Vishnu, and pv-drivers@xxxxxxxxxx) >> Ok, i'll CC them in the next version >>> >>>> >>>> 4) Hyper-V: >>>> As Dexuan Cui mentioned, for Hyper-V transport it is difficult to >>>> support SO_RCVLOWAT, so he suggested to disable this feature for >>>> Hyper-V. >>> >>> I left a couple of comments in some patches, but it seems to me to be in a good state :-) >>> >>> I would just suggest a bit of a re-organization of the series (the patches are fine, just the order): >>> - introduce vsock_set_rcvlowat() >>> - disabling it for hv_sock >>> - use 'target' in virtio transports >>> - use 'target' in vmci transports >>> - use sock_rcvlowat in vsock_poll() >>> I think is better to pass sock_rcvlowat() as 'target' when the >>> transports are already able to use it >>> - add vsock_data_ready() >>> - use vsock_data_ready() in virtio transports >>> - use vsock_data_ready() in vmci transports >>> - tests >>> >>> What do you think? >> No problem! I think i can wait for reply from VMWare guys before preparing v3 > > Looks fine to me, especially the VMCI parts. Please send v3, and we can test it > from VMCI point of view as well. Great, thank you for reply. I'll prepare v3 ASAP and You will be CCed Thanks, Arseniy > > Thanks, > Vishnu