Re: [PATCH net 1/4] Revert "net/smc: don't wait for send buffer space when data was already sent"

Karsten Graul <kgraul@xxxxxxxxxxxxx> · Tue, 2 Nov 2021 10:17:15 +0100

On 01/11/2021 08:04, Tony Lu wrote:
> On Thu, Oct 28, 2021 at 07:38:27AM -0700, Jakub Kicinski wrote:
>> On Thu, 28 Oct 2021 13:57:55 +0200 Karsten Graul wrote:
>>> So how to deal with all of this? Is it an accepted programming error
>>> when a user space program gets itself into this kind of situation?
>>> Since this problem depends on internal send/recv buffer sizes such a
>>> program might work on one system but not on other systems.
>>
>> It's a gray area so unless someone else has a strong opinion we can
>> leave it as is.
> 
> Things might be different. IMHO, the key point of this problem is to
> implement the "standard" POSIX socket API, or TCP-socket compatible API.
> 
>>> At the end the question might be if either such kind of a 'deadlock'
>>> is acceptable, or if it is okay to have send() return lesser bytes
>>> than requested.
>>
>> Yeah.. the thing is we have better APIs for applications to ask not to
>> block than we do for applications to block. If someone really wants to
>> wait for all data to come out for performance reasons they will
>> struggle to get that behavior. 
> 
> IMO, it is better to do something to unify this behavior. Some
> applications like netperf would be broken, and the people who want to use
> SMC to run basic benchmark, would be confused about this, and its
> compatibility with TCP. Maybe we could:
> 1) correct the behavior of netperf to check the rc as we discussed.
> 2) "copy" the behavior of TCP, and try to be compatible with TCP, though
> it is a gray area.

I have a strong opinion here: if the choice is between the user running into
a deadlock and send() returning fewer bytes than requested, I prefer the
latter.
A short send is much easier for users to debug: they can handle it themselves
(by looping around send()), and it can even be spotted with strace. The
deadlock, on the other hand, is nearly impossible for users to debug, and
nothing prevents it once the workload pattern runs into this situation
(other than not using blocking sends).
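
To illustrate what "loop around send()" could look like in user space, here
is a minimal sketch. The helper name send_all() is hypothetical and not taken
from netperf or any other program mentioned in this thread. It only handles
short returns and EINTR; it does not by itself avoid the case where the peer
never drains its receive buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: call send() repeatedly until all len bytes have
 * been accepted by the socket, or until an error occurs.
 * Returns 0 on success, -1 on error (errno is left as set by send()).
 */
int send_all(int fd, const void *buf, size_t len, int flags)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = send(fd, p, len, flags);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal, retry */
			return -1;		/* real error, let the caller decide */
		}
		/* send() may accept fewer bytes than requested: advance and retry */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A caller would replace a bare send(fd, buf, len, 0) with
send_all(fd, buf, len, 0) and check for a -1 return instead of assuming that
one call transferred everything.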