From: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Date: Wed, 18 Nov 2020 17:25:44 +0100
> On 11/17/20 10:40 AM, Kuniyuki Iwashima wrote:
> > The SO_REUSEPORT option allows sockets to listen on the same port and to
> > accept connections evenly. However, there is a defect in the current
> > implementation. When a SYN packet is received, the connection is tied to a
> > listening socket. Accordingly, when the listener is closed, in-flight
> > requests during the three-way handshake and child sockets in the accept
> > queue are dropped even if other listeners could accept such connections.
> >
> > This situation can happen when various server management tools restart
> > server (such as nginx) processes. For instance, when we change nginx
> > configurations and restart it, it spins up new workers that respect the new
> > configuration and closes all listeners on the old workers, resulting in
> > in-flight ACK of 3WHS is responded by RST.
> >
>
> I know some programs are simply removing a listener from the group,
> so that they no longer handle new SYN packets,
> and wait until all timers or 3WHS have completed before closing them.
>
> They pass fd of newly accepted children to more recent programs using af_unix fd passing,
> while in this draining mode.

Just out of curiosity, could you tell me which software does this? I would
like to study it further.

> Quite frankly, mixing eBPF in the picture is distracting.

I agree. Also, I think eBPF itself is unnecessary in many cases, and I want
to make user programs simpler with this patchset.

The SO_REUSEPORT implementation is excellent for improving scalability. On
the other hand, as a trade-off, users have to know in depth how the kernel
handles SYN packets and implement connection draining with eBPF.

> It seems you want some way to transfer request sockets (and/or not yet accepted established ones)
> from fd1 to fd2, isn't it something that should be discussed independently ?
I understand that you are asking me to discuss the issue and how to transfer
sockets separately. Please correct me if I have misunderstood your question.

The kernel handles 3WHS, and users cannot know of its existence (without
eBPF). Many users believe that SO_REUSEPORT ideally distributes all
connections across the available listeners, but in practice some connections
may be aborted silently. Some users may think that if the kernel had selected
another listener, those connections would not have been dropped. The root
cause is within the kernel, so the issue should be addressed in kernel space
and should not be visible to userspace. So that users do not have to bother
implementing something new, I want to fix the root cause by transferring
sockets automatically; then users need not care about the kernel
implementation or connection draining.

Moreover, if possible, I did not want to mix eBPF into this issue. But there
may be cases where different applications listen on the same port and eBPF
routes packets to each of them by some rules. In such cases, redistributing
sockets without the user's intention would break the application. This
patchset will work in many cases, but to handle such cases, I added the eBPF
part.
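For reference, the af_unix fd passing Eric describes for the draining mode
relies on SCM_RIGHTS ancillary data over a connected AF_UNIX socket. Below is
a minimal sketch, assuming a SOCK_STREAM socket already connected between the
old and the new process; send_fd() and recv_fd() are hypothetical helper
names, not an existing API, while sendmsg()/recvmsg() with SCM_RIGHTS are the
standard mechanism.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send 'fd' over the connected AF_UNIX socket
 * 'unix_sock' as SCM_RIGHTS ancillary data. Returns 0 on success, -1 on
 * error (errno is set by sendmsg()). */
int send_fd(int unix_sock, int fd)
{
    char byte = 0;  /* one byte of real data carries the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;  /* forces correct alignment of buf */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);  /* buffer is sized for exactly one header */
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(unix_sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd(). Returns a
 * new fd referring to the same open file (e.g. an accepted TCP socket),
 * or -1 on error. */
int recv_fd(int unix_sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(unix_sock, &msg, 0) != 1)
        return -1;
    if (msg.msg_flags & MSG_CTRUNC)  /* control data did not fit */
        return -1;

    /* Walk the control messages and pick out the SCM_RIGHTS payload. */
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return fd;
}
```

In such a draining setup, the old worker would call send_fd() for each child
socket it accepts while draining, and the new worker would call recv_fd() and
keep serving the connection. The received descriptor refers to the same open
socket, so the connection survives the old worker closing its copy. This is
exactly the kind of userspace work the patchset aims to make unnecessary.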