> -----Original Message-----
> From: Jeff King
> Sent: Thursday, September 12, 2013 3:57 PM
>
> On Thu, Sep 12, 2013 at 12:45:44PM +0000, Pyeron, Jason J CTR (US)
> wrote:
>
> > If the rules of engagement are changed a bit, the server side can
> > be relieved of most of its work (CPU/IO).
> >
> > Client does the following, looping as needed:
> >
> >   heads = server->heads();
> >   knownCommits = local->allCommits();
> >   missingBlobs = [];
> >   foreach (commit : heads)
> >     if (!knownCommits->contains(commit)) missingBlobs[] = commit;
> >   foreach (commit : knownCommits)
> >     if (!commit->isValid()) missingBlobs[] = commit->blobs();
> >   if (missingBlobs->size() > 0) server->fetchBlobs(missingBlobs);
>
> That doesn't quite work. The client does not know the set of missing

"looping as needed"

> objects just from the commits. It knows the sha1 of the root trees it
> is missing. And then if it fetches those, it knows the sha1 of any
> top-level entries it is missing. And when it gets those, it knows the
> sha1 of any 2nd-level entries it is missing, and so forth.
>
> You can progressively ask for each level, but:
>
>   1. You are spending a round-trip for each request. Doing it
>      per-object is awful (the dumb http walker will do this if the
>      repo is not packed, and it's S-L-O-W). Doing it per-level would
>      be better, but not great.

Yes, but it is those awfully slow connections (slower than the looping
issue) which always happen to drop while cloning from our office. And
the round-trip cost should be mitigated by HTTP keep-alives.

>   2. You are losing opportunities for deltas (or you are making the
>      state the server needs to maintain very complicated, as it must
>      remember from request to request which objects you have gotten
>      that can be used as delta bases).

But again, if the connection drops, we have already lost the delta
advantage.

I would think the scenario would go like this:

  git clone url://blah/blah
  [fail]
  cd blah
  git clone --resume    # uses normal methods...
  [fail]
  while ! git clone --resume --HitItWithAStick; do :; done

Replace clone with fetch for that use case, too.

>   3. There is a lot of overhead in this protocol. The client has to
>      mention each object individually by sha1. It may not seem like
>      a lot, but it can easily add 10% to a clone (just look at the
>      size of the pack .idx files versus the packfiles themselves).

But if it finishes in a week, that is a lot better than never, ever
finishing.

I draw attention to the time I had to download DB2 UDB in Thailand
over a 28k modem. It was resumable with wget; if it had not been, it
would have taken a plane to sneakernet it back.
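
To make the level-by-level walk concrete, here is a minimal sketch in
Python. None of these names are real git APIs; the server is simulated
as a plain dict mapping sha1 -> child sha1s so the round-trip counting
can actually be run, and a real client would replace fetch_batch()
with one HTTP request per call.

  # Simulated server and client stores; nothing here is a real git API.

  def fetch_batch(server_objects, wanted):
      """Simulate one round trip: return every requested object."""
      return {sha1: server_objects[sha1] for sha1 in wanted}

  def clone_by_levels(server_objects, local_objects, heads):
      """Walk the object graph one level per round trip, batching sha1s."""
      to_fetch = [s for s in heads if s not in local_objects]
      round_trips = 0
      while to_fetch:
          got = fetch_batch(server_objects, to_fetch)  # one round trip
          round_trips += 1
          local_objects.update(got)  # keep everything: this is the resume state
          # The next level is every referenced object we do not have yet.
          next_level = []
          for children in got.values():
              for child in children:
                  if child not in local_objects and child not in next_level:
                      next_level.append(child)
          to_fetch = next_level
      return round_trips

  # Toy graph: one commit, its root tree, two blobs. A cold clone costs
  # three round trips; rerunning with a partially filled local_objects
  # dict costs only the levels that are still missing.
  server = {
      "commit1": ["tree1"],
      "tree1": ["blob1", "blob2"],
      "blob1": [], "blob2": [],
  }
  print(clone_by_levels(server, {}, ["commit1"]))  # -> 3

Because every object is stored as soon as its batch arrives, a dropped
connection costs only the in-flight level, which is the resumability
being argued for here; the price is one round trip per level of the
deepest tree, which is point 1 above.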
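The retry scenario sketched earlier could be driven by something as
simple as the following Python loop. The --resume flag is hypothetical
(no such option exists in git today); the only assumption is that a
failed clone leaves its partial objects on disk so that every retry
makes forward progress.

  import subprocess
  import time

  def clone_with_retries(url, directory, max_attempts=100):
      # First attempt: a normal clone.
      if subprocess.call(["git", "clone", url, directory]) == 0:
          return True
      # Keep retrying the hypothetical resumable clone until it finishes.
      for _ in range(max_attempts):
          if subprocess.call(["git", "clone", "--resume"], cwd=directory) == 0:
              return True
          time.sleep(30)  # back off a little before the next attempt
      return False

The same loop, with "clone" swapped for "fetch", would cover the
resumable fetch case mentioned above.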