Re: What's cooking in git.git (Aug 2008, #07; Sat, 23)

"Stephen R. van den Berg" <srb@xxxxxxx> · Mon, 25 Aug 2008 18:32:03 +0200

Junio C Hamano wrote:
>Junio C Hamano <gitster@xxxxxxxxx> writes:
>> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
>>>> * sb/daemon (Thu Aug 14 20:02:20 2008 +0200) 4 commits
>>>>  - git-daemon: rewrite kindergarden, new option --max-connections
>>>>  - git-daemon: Simplify dead-children reaping logic
>>>>  - git-daemon: use LOG_PID, simplify logging code
>>>>  - git-daemon: call logerror() instead of error()

>>>> Can somebody who actually runs the daemon standalone comment on this 
>>>> one?

>> Well, I didn't ask anybody to _run_ it.

Actually I *am* running it in production (obviously).

>It seems that kill_some_child() will not kill anything if nobody else is
>coming from the same address, while the old code did kill some.  Is this
>intended?

This is intended.
The reasoning goes a bit like this:
- We already allow a --timeout to be specified, that is taken care of
  by the child process.
- The child process is well-behaved, so it should honour its own timeout
  setting.
- If the daemon would need to cater for child processes failing to
  honour their own timeout setting, we have some serious bugs in the
  child-code, and *that* should be fixed ASAP instead of letting that
  kind of problems linger by not getting noticed because some daemon
  tries to clean up after badly-behaving children.
- Therefore, the only two problems that are left are:
  + Many connections in total.  That would mean that we can't handle the
    load.  Well, the best way to respond to that is to diligently
    process the ones that *did* manage to connect, and *not* to randomly
    disconnect sessions which are already halfway there.
  + Many connections in total, but also multiple connections from the
    same IP-address.  In this case, we could try and evenly distribute
    the resource and kill off the *newest* connections from the same IP
    addresses, to allow everyone an equal share to the daemon.

That means:
- Random killing is not necessary.
- Since we know exactly how many clients are running, we can keep tabs
  on when to fork and when not (further reducing the load problem on
  a busy site).  Which means that at most we need to
  kill one client for every new connection.
- Since we don't want to force old connections to die (we trust our
  children), we simply refuse further connections as soon as we've
  reached our limit and don't have resource-eaters (duplicate sessions
  from the same IP address).

>By the way, add_child() compares the whole "struct sockaddr_storage" in
>order to queue the newborn in front of an existing connection from the
>same address, and kill_some_child() takes advanrage of this to kill the
>newest connection ("We kill the newest" comment should probably be moved
>to add_child() to describe why the queuing is done this way).  If you
>simplify add_child() to queue the newborn always at the front of the list,
>your kill_some_child() will continue to do so, so I do not see the point
>of the loop in add_child().  Am I mistaken?

You are mistaken.
The point is that we need to find out *duplicate* IP-adresses.
I.e. when looking for the child to be killed, the IP-address we're looking
for is not the IP-address of the new connection, but we're looking for
two consecutive duplicate addresses.  In order for this to work, we need
to cluster the connections of the same IP-address, and the newest
connection needs to be inserted in front, in order for kill_child to
kill the most recent connection.

Shall I incorporate your suggested changes (as far as I consider them ok)
into my patches and resubmit them?
-- 
Sincerely,
           Stephen R. van den Berg.
Auto repair rates: basic labor $40/hour; if you wait, $60; if you watch, $80;
if you ask questions, $100; if you help, $120; if you laugh, $140.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html