On 27/10/2021 17:47, Jakub Kicinski wrote:
> On Wed, 27 Oct 2021 17:38:27 +0200 Karsten Graul wrote:
>> What we found out was that applications called sendmsg() with large data
>> buffers using blocking sockets. This led to the described situation, were the
>> solution was to early return to user space even if not all data were sent yet.
>> Userspace applications should not have a problem with the fact that sendmsg()
>> returns a smaller byte count than requested.
>>
>> Reverting this patch would bring back the stalled connection problem.
>
> I'm not sure. The man page for send says:
>
>        When the message does not fit into the send buffer of the socket,
>        send() normally blocks, unless the socket has been placed in nonblock‐
>        ing I/O mode. In nonblocking mode it would fail with the error EAGAIN
>        or EWOULDBLOCK in this case.
>
> dunno if that's required by POSIX or just a best practice.

I see your point, and I am also not sure how it should work in practice.

The test case where the connection stalled has both communication peers
trying to send more data than fits into the local send buffer plus the
remote receive buffer. They use blocking sockets, so if send() is meant to
send all data as requested, both sides would hang in send() forever or
until a timeout.

In our case both sides run a send/recv loop, so allowing send() to return
fewer bytes than requested resulted in a follow-on recv() call that freed up
space in the buffers, and processing continued.

There is also some discussion of this topic in this SO thread
https://stackoverflow.com/questions/19697218/can-send-on-a-tcp-socket-return-0-and-length
which points out that this (send() returning a smaller length) may already
happen, e.g. when the call is interrupted.

So how to deal with all of this? Is it an accepted programming error when a
user space program gets itself into this kind of situation? Since the problem
depends on the internal send/recv buffer sizes, such a program might work on
one system but not on others.

In the end, the question might be whether this kind of 'deadlock' is
acceptable, or whether it is okay for send() to return fewer bytes than
requested.
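
To make the send/recv loop pattern more concrete, here is a minimal sketch in
C. It is not the code of the test case: send_and_drain() is a hypothetical
helper, and the sketch assumes that send() on a blocking socket may return a
short count. After each send() it drains pending input with a non-blocking
recv(), which is what frees buffer space for the peer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (illustration only): send len bytes from out and,
 * after every send() call, drain whatever the peer has already sent into
 * in (capacity in_cap). Returns the number of bytes received, -1 on error.
 * If in fills up while the peer keeps sending, this can still stall.
 */
static ssize_t send_and_drain(int fd, const char *out, size_t len,
                              char *in, size_t in_cap)
{
    size_t sent = 0, rcvd = 0;

    assert(out != NULL || len == 0);

    while (sent < len) {
        /* May return fewer bytes than requested (short count). */
        ssize_t n = send(fd, out + sent, len - sent, MSG_NOSIGNAL);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data was sent */
            return -1;
        }
        sent += (size_t)n;

        /* Drain pending input without blocking to free buffer space. */
        while (rcvd < in_cap) {
            ssize_t r = recv(fd, in + rcvd, in_cap - rcvd, MSG_DONTWAIT);

            if (r > 0) {
                rcvd += (size_t)r;
                continue;
            }
            if (r < 0 && errno == EINTR)
                continue;
            if (r < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
                return -1;
            break;              /* nothing pending, or peer closed */
        }
    }
    return (ssize_t)rcvd;
}
```

A program that instead retries send() until everything is written, without
ever calling recv() in between, would still hang if both peers do the same,
regardless of whether send() returns short counts.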