Hi * evilchelu (Cristi Balan) pointed out on IRC that 'git fetch' takes a long time when fetching a history-disjoint repository. For example, mkdir test && cd test git init git remote add -f paperclip git://github.com/thoughtbot/paperclip.git git remote add -f hoptoad git://github.com/thoughtbot/hoptoad_notifier.git git remote add -f aasm git://github.com/rubyist/aasm.git git remote add -f forgot git://github.com/greenisus/forgot_password.git git remote add -f restful git://github.com/technoweenie/restful-authentication.git (taken straight from evilchelu's example, but it should be the same with random repositories). Peeking at the transmission with Wireshark, there is a noticeable pattern of 4-5s delays before _every_ "0008NAK\n" line sent by the server. Looking at upload-pack.c, it seems that the server does far too much work when processing the "have" lines. I've only just read into this area of code, but the rough idea seems to be: [404: get_common_commits()] for (H = every "have" line) { [321: got_sha1()] flag H and its parents (shallow!) as THEY_HAVE [415: get_common_commits()] if (we do not have H in our object store) { [367: ok_to_give_up()] for (W = every "want" object specified earlier) { [338: reachable()] walk the entire history to see if anything flagged THEY_HAVE so far is reachable from W } [418: get_common_commits()] if the innermost test succeeded for all W: ACK this H so the client stops walking history from it } } The entire loop seems to have O(h*w*n) (n=history) complexity, which probably is to blame for the delays. * Isn't this ok_to_give_up() test moot? If H is not in our object store, it cannot be of any use in the transfer (of our history to the client). So if we are going to fake an ACK to stop the client digging on this side of his history, we might as well send it right away. (And what does reaching a THEY_HAVE commit from all W's have to do with it?) * Even assuming it is not, it would save some work if the server avoided walking the entire history for every H. For example, it could buffer up all H's until a "0000\n" arrives, which currently seems to be 32 haves, then check all of them in a single pass over history. (Unfortunately the 'h' factor cannot be removed completely unless we buffer all of them, which again defeats the point.) Much of the code in question was added in 937a515 (upload-pack: stop the other side when they have more roots than we do., 2006-07-05). [I hope I'm making some sense, it's far too late here, but it was either this or trying to understand it again in the morning.] - Thomas -- Thomas Rast trast@xxxxxxxxxxxxxxx
Attachment:
signature.asc
Description: This is a digitally signed message part.