On Tue, Nov 02, 2021 at 10:17:15AM +0100, Karsten Graul wrote:
> On 01/11/2021 08:04, Tony Lu wrote:
> > On Thu, Oct 28, 2021 at 07:38:27AM -0700, Jakub Kicinski wrote:
> >> On Thu, 28 Oct 2021 13:57:55 +0200 Karsten Graul wrote:
> >>> So how to deal with all of this? Is it an accepted programming error
> >>> when a user space program gets itself into this kind of situation?
> >>> Since this problem depends on internal send/recv buffer sizes such a
> >>> program might work on one system but not on other systems.
> >>
> >> It's a gray area so unless someone else has a strong opinion we can
> >> leave it as is.
> >
> > Things might be different. IMHO, the key point of this problem is to
> > implement the "standard" POSIX socket API, or TCP-socket compatible API.
> >
> >>> At the end the question might be if either such kind of a 'deadlock'
> >>> is acceptable, or if it is okay to have send() return lesser bytes
> >>> than requested.
> >>
> >> Yeah.. the thing is we have better APIs for applications to ask not to
> >> block than we do for applications to block. If someone really wants to
> >> wait for all data to come out for performance reasons they will
> >> struggle to get that behavior.
> >
> > IMO, it is better to do something to unify this behavior. Some
> > applications like netperf would be broken, and the people who want to use
> > SMC to run basic benchmark, would be confused about this, and its
> > compatibility with TCP. Maybe we could:
> > 1) correct the behavior of netperf to check the rc as we discussed.
> > 2) "copy" the behavior of TCP, and try to compatiable with TCP, though
> > it is a gray area.
>
> I have a strong opinion here, so when the question is if the user either
> encounters a deadlock or if send() returns lesser bytes than requested,
> I prefer the latter behavior.
> The second case is much easier to debug for users, they can do something
> to handle the problem (loop around send()), and this case can even be detected
> using strace. But the deadlock case is nearly not debuggable by users and there
> is nothing to prevent it when the workload pattern runs into this situation
> (except to not use blocking sends).

I agree with you. I am curious about this deadlock scenario. If it is
convenient, could you provide a reproducible test case?

We are also setting up an SMC CI/CD system to find compatibility and
performance-fallback problems. Maybe we could do something to make it
better.

Cheers,
Tony Lu
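
P.S. In case it helps to illustrate the "loop around send()" handling you
mention, here is a minimal user space sketch (untested, all names are only
for illustration) that retries a blocking send() until the whole buffer is
written or a real error occurs:

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <errno.h>

	/* Hypothetical helper: keep calling send() on a blocking socket
	 * until all of len bytes are written. Returns len on success,
	 * -1 on error with errno set by send().
	 */
	static ssize_t send_all(int fd, const char *buf, size_t len)
	{
		size_t off = 0;

		while (off < len) {
			ssize_t rc = send(fd, buf + off, len - off, 0);

			if (rc < 0) {
				if (errno == EINTR)
					continue;	/* retry after signal */
				return -1;
			}
			off += rc;	/* partial write, send the rest */
		}
		return off;
	}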