Re: [PATCH net 1/4] Revert "net/smc: don't wait for send buffer space when data was already sent"

Karsten Graul <kgraul@xxxxxxxxxxxxx> · Thu, 28 Oct 2021 13:57:55 +0200

On 27/10/2021 17:47, Jakub Kicinski wrote:
> On Wed, 27 Oct 2021 17:38:27 +0200 Karsten Graul wrote:
>> What we found out was that applications called sendmsg() with large data
>> buffers using blocking sockets. This led to the described situation, were the
>> solution was to early return to user space even if not all data were sent yet.
>> Userspace applications should not have a problem with the fact that sendmsg()
>> returns a smaller byte count than requested.
>>
>> Reverting this patch would bring back the stalled connection problem.
> 
> I'm not sure. The man page for send says:
> 
>        When the message does not fit into  the  send  buffer  of  the  socket,
>        send()  normally blocks, unless the socket has been placed in nonblock‐
>        ing I/O mode.  In nonblocking mode it would fail with the error  EAGAIN
>        or  EWOULDBLOCK in this case.
> 
> dunno if that's required by POSIX or just a best practice.

I see your point, and I am also not sure about how it should work in reality.

The test case where the connection stalled is that both communication peers try
to send data larger than there is space in the local send buffer plus the remote 
receive buffer. They use blocking sockets, so if the send() call is meant to send
all data as requested then both sides would hang in send() forever/until a timeout.

In our case both sides run a send/recv loop, so allowing send() to return lesser 
bytes then requested resulted in a follow-on recv() call which freed up space in the
buffers, and the processing continues.

There is also some discussion about this topic in this SO thread
    https://stackoverflow.com/questions/19697218/can-send-on-a-tcp-socket-return-0-and-length
which points out that this (send returns smaller length) may happen already, e.g. when there
is an interruption.

So how to deal with all of this? Is it an accepted programming error when a user space 
program gets itself into this kind of situation? Since this problem depends on 
internal send/recv buffer sizes such a program might work on one system but not on 
other systems.

At the end the question might be if either such kind of a 'deadlock' is acceptable, 
or if it is okay to have send() return lesser bytes than requested.