On Fri, Nov 20, 2020 at 07:17:49AM +0900, Kuniyuki Iwashima wrote:
> From: Martin KaFai Lau <kafai@xxxxxx>
> Date: Wed, 18 Nov 2020 17:49:13 -0800
> > On Tue, Nov 17, 2020 at 06:40:15PM +0900, Kuniyuki Iwashima wrote:
> > > The SO_REUSEPORT option allows sockets to listen on the same port and to
> > > accept connections evenly. However, there is a defect in the current
> > > implementation. When a SYN packet is received, the connection is tied to a
> > > listening socket. Accordingly, when the listener is closed, in-flight
> > > requests during the three-way handshake and child sockets in the accept
> > > queue are dropped even if other listeners could accept such connections.
> > >
> > > This situation can happen when various server management tools restart
> > > server (such as nginx) processes. For instance, when we change nginx
> > > configurations and restart it, it spins up new workers that respect the new
> > > configuration and closes all listeners on the old workers, resulting in
> > > in-flight ACK of 3WHS is responded by RST.
> > >
> > > As a workaround for this issue, we can do connection draining by eBPF:
> > >
> > >   1. Before closing a listener, stop routing SYN packets to it.
> > >   2. Wait enough time for requests to complete 3WHS.
> > >   3. Accept connections until EAGAIN, then close the listener.
> > >
> > > Although this approach seems to work well, EAGAIN has nothing to do with
> > > how many requests are still during 3WHS. Thus, we have to know the number
> > It sounds like the application can already drain the established socket
> > by accept()? To solve the problem that you have,
> > does it mean migrating req_sk (the in-progress 3WHS) is enough?
>
> Ideally, the application needs to drain only the accepted sockets because
> 3WHS and tying a connection to a listener are just kernel behaviour. Also,
> there are some cases where we want to apply new configurations as soon as
> possible such as replacing TLS certificates.
>
> It is possible to drain the established sockets by accept(), but the
> sockets in the accept queue have not started application sessions yet. So,
> if we do not drain such sockets (or if the kernel happened to select
> another listener), we can (could) apply the new settings much earlier.
>
> Moreover, the established sockets may start long-standing connections so
> that we cannot complete draining for a long time and may have to
> force-close them (and they would have longer lifetime if they are migrated
> to a new listener).
>
>
> > Applications can already use the bpf prog to do (1) and divert
> > the SYN to the newly started process.
> >
> > If the application cares about service disruption,
> > it usually needs to drain the fd(s) that it already has and
> > finishes serving the pending request (e.g. https) on them anyway.
> > The time taking to finish those could already be longer than it takes
> > to drain the accept queue or finish off the 3WHS in reasonable time.
> > or the application that you have does not need to drain the fd(s)
> > it already has and it can close them immediately?
>
> In the point of view of service disruption, I agree with you.
>
> However, I think that there are some situations where we want to apply new
> configurations rather than to drain sockets with old configurations and
> that if the kernel migrates sockets automatically, we can simplify user
> programs.
This configuration-update (new TLS cert, etc.) consideration would be
useful to include in the cover letter as well.

It sounds like the service that you have is draining the existing
already-accepted fd(s), which are using the old configuration.
Those existing fd(s) could also be long-lived. Would those existing
fd(s) potentially far outnumber the to-be-accepted fd(s)?

Or did you mean that in some cases it wants to migrate to the new
configuration ASAP (e.g. for security reasons) even if it has to close
all the already-accepted fd(s) that are using the old configuration?

In either case, considering that the already-accepted fd(s) usually
far outnumber the to-be-accepted ones, do the to-be-accepted
connections make any difference percentage-wise?
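
For reference, step 3 of the quoted workaround (accept until EAGAIN,
then close the listener) could look roughly like the userspace sketch
below. It is only an illustration and is not taken from the patch set;
drain_listener() is a hypothetical helper name, and Python is used just
for brevity.

```python
import socket


def drain_listener(listener):
    """Accept everything already in the accept queue, then close.

    Hypothetical helper for illustration only. Returns the accepted
    sockets so the caller can keep serving them.
    """
    # Non-blocking mode makes accept() fail with EAGAIN/EWOULDBLOCK
    # instead of sleeping once the accept queue is empty.
    listener.setblocking(False)
    accepted = []
    while True:
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            # EAGAIN: the accept queue is empty *right now*. This says
            # nothing about requests that are still in the middle of 3WHS.
            break
        accepted.append(conn)
    listener.close()
    return accepted
```

The loop stops as soon as the accept queue is empty. As the cover
letter points out, EAGAIN does not tell how many requests are still in
3WHS, so it cannot replace the wait in step 2.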