O(#haves*...) behaviour in "have <sha>" processing in upload-pack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi *

evilchelu (Cristi Balan) pointed out on IRC that 'git fetch' takes a
long time when fetching a history-disjoint repository.

For example,

  mkdir test && cd test
  git init
  git remote add -f paperclip git://github.com/thoughtbot/paperclip.git
  git remote add -f hoptoad git://github.com/thoughtbot/hoptoad_notifier.git
  git remote add -f aasm git://github.com/rubyist/aasm.git
  git remote add -f forgot git://github.com/greenisus/forgot_password.git
  git remote add -f restful git://github.com/technoweenie/restful-authentication.git

(taken straight from evilchelu's example, but it should be the same
with random repositories).

Peeking at the transmission with Wireshark, there is a noticeable
pattern of 4-5s delays before _every_ "0008NAK\n" line sent by the
server.

Looking at upload-pack.c, it seems that the server does far too much
work when processing the "have" lines.  I've only just read into this
area of code, but the rough idea seems to be:

  [404: get_common_commits()]
  for (H = every "have" line) {

      [321: got_sha1()]
      flag H and its parents (shallow!) as THEY_HAVE

      [415: get_common_commits()]
      if (we do not have H in our object store) {

          [367: ok_to_give_up()]
          for (W = every "want" object specified earlier) {

              [338: reachable()]
              walk the entire history to see if anything flagged
              THEY_HAVE so far is reachable from W

          }

          [418: get_common_commits()]
          if the innermost test succeeded for all W: ACK this H so the
          client stops walking history from it
      }
  }

The entire loop seems to have O(h*w*n) (n=history) complexity, which
probably is to blame for the delays.

* Isn't this ok_to_give_up() test moot?  If H is not in our object
  store, it cannot be of any use in the transfer (of our history to
  the client).  So if we are going to fake an ACK to stop the client
  digging on this side of his history, we might as well send it right
  away.  (And what does reaching a THEY_HAVE commit from all W's have
  to do with it?)

* Even assuming it is not, it would save some work if the server
  avoided walking the entire history for every H.  For example, it
  could buffer up all H's until a "0000\n" arrives, which currently
  seems to be 32 haves, then check all of them in a single pass over
  history.  (Unfortunately the 'h' factor cannot be removed completely
  unless we buffer all of them, which again defeats the point.)

Much of the code in question was added in 937a515 (upload-pack: stop
the other side when they have more roots than we do., 2006-07-05).

[I hope I'm making some sense, it's far too late here, but it was
either this or trying to understand it again in the morning.]

- Thomas

-- 
Thomas Rast
trast@xxxxxxxxxxxxxxx




Attachment: signature.asc
Description: This is a digitally signed message part.


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux