On Sat, Oct 24, 2015 at 02:23:20PM +0200, René Scharfe wrote: > Call child_process_clear() when a child ends to release the memory > allocated for its environment. This is necessary because unlike all > other users of start_command() we don't call finish_command(), which > would have taken care of that for us. > > This leak was introduced by f063d38b (daemon: use cld->env_array > when re-spawning). I agree this is a leak we need to address, but I find the solution a little unsatisfactory (but tl;dr, it's probably the best we can do). It would be nice if we could call finish_command() here to do the wait(). But it does not know about WNOHANG, so we would have to introduce a new helper anyway. However, the whole check_dead_children function is accidentally quadratic, and should probably be refactored in general. It loops over the list of children, calling `waitpid(pid, WNOHANG)` on each. So if `n` children connect at one time, we make `O(n^2)` waitpid calls. I have patches to instead loop over waitpid(-1, WNOHANG) until there are no dead children left (and use a hash to go from pid to each "struct child"). But the reason I haven't posted them is that I'm somewhat of the opinion that this child-list can go away altogether. The main use is in enforcing the max_connections variable. Just enforcing that can be done with a counter, but we do something much weirder: when we get a new connection, we try to kill an _existing_ child by finding the first client with multiple connections, killing that, sleeping for a second (!), and then if that killed something, giving the slot to the new connection. This has a few problems: 1. Under non-malicious heavy load, you probably want to reject the new request rather than killing an existing one. If you kill a fetch that is 99% completed, that client is likely to come back and fetch again, and you've just thrown away all of the effort (both CPU and bandwidth) put into the failed clone. You haven't spent anything on the incoming connection, so you'd much rather they go away and come back in a moment. 2. I think the kill_some_child algorithm was meant to enforce fairness so that a single IP cannot monopolize all of your slots. I'm not sure that's realistic for a few reasons: a. Real bad guys will just hit you with a DDoS from many IPs. b. Real sites have load-balancing in front of git-daemon, and the client IPs may not be meaningful. c. The duplicate selection is pretty naive. A bad guy with tons of duplicate connections is no more likely to be killed than a normal client with exactly two connections (it actually depends solely on when the latest connection from either came in). So I suspect a determine attacker with a single IP could still present a problem. 3. Calling sleep() in the server parent process is a great way to kill performance. :) Of course my view is somewhat skewed by running a really big site with load balancers and such (and we have long set --max-connections high enough that it has never been hit). Maybe it's more realistic protection for a "normal" site. But my inclination would be to drop kill_some_child entirely in favor of simply rejecting new connections that are over the limit, and getting rid of the child-list entirely (and calling waitpid only to reap the zombie PIDs). I guess we'd still need something like child_process_clear(), though. We could just run it immediately after spawning the child, rather than when we cleaned it up. So in that sense this series is the first incremental step toward that future, anyway. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html