Re: RFC: Resumable clone based on hybrid "smart" and "dumb" HTTP

On Wed, Feb 10, 2016 at 01:22:07PM -0800, Jonathan Nieder wrote:

> > I am not quite sure if that is an advantage, though.  The second
> > message proposes that the lost-found computation be done by the
> > client using *.pack, but any client, given the same *.pack, will
> > compute the same result, so if the result is computed on the server
> > side just once when the *.pack is prepared and downloaded to the
> > client, it would give us a better overall resource utilization.  And
> > in essence, that was what the *.info file in the first message was.
> 
> Advantages of not providing the list of roots:
>  1. only need one round-trip to serve the packfile as-is
>  2. less data sent over the wire (not important unless the list of roots
>     is long)
>  3. can be enabled on the server for existing repositories without an
>     extra step of generating .info files
> 
> Advantage of providing the list of roots:
> - speedup because the client does not have to compute the list of roots
> 
> For a client that is already iterating over all objects and inspecting
> FLAG_LINK, the advantage (3) seems compelling enough to prefer the
> protocol that doesn't send a list of roots.

I'm not sure how compelling (3) is, since we are relying on the server
to make certain packing choices. I guess a stock "git repack -ad" would
do in a pinch; the resulting pack should at least contain all needed
objects, but it may also contain extra cruft objects (ones reachable
only from reflogs, for example).

I outlined some alternatives to Shawn's proposal elsewhere in the
thread. I think it's a useful feature for this redirect to not just be
"go fetch this packfile", but "go clone from here and come back to me".
That opens up a lot of flexibility.

It does make "go fetch this packfile without roots" a little harder, but
I think it's still doable. Right now, when git hits an http URL, we pass
the smart-http "?service=" magic, and we look at the response to figure
out whether we got:

  1. A smart-http server.

  2. A dumb-http server.

  N. Something else, in which case we die.

The alternative I outlined elsewhere (and the patches I posted long
ago) basically adds:

  3. If it's a bundle, fetch the bundle and then clone from that.

But we could also do:

  4. If it's a packfile, fetch the packfile and then do the
     find-the-roots magic.
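
To make the dispatch above concrete, here is a rough Python sketch of
how a client might classify the response. It is an illustration only,
not git's actual code (which is C). The function and enum names are
hypothetical, and the dumb-http check is a simplified heuristic I am
assuming for the example. The smart-http content type, the v2 bundle
header line, and the "PACK" signature are the real on-the-wire markers.

```python
import re
from enum import Enum


class Remote(Enum):
    SMART_HTTP = "smart-http"   # case 1
    DUMB_HTTP = "dumb-http"     # case 2
    BUNDLE = "bundle"           # case 3 (proposed)
    PACKFILE = "packfile"       # case 4 (proposed)
    UNKNOWN = "unknown"         # case N: caller should die


# Content type a smart-http server uses for the upload-pack ref advertisement.
SMART_CONTENT_TYPE = "application/x-git-upload-pack-advertisement"

# A dumb server's info/refs is lines of "<40-hex object id>\t<refname>".
# Assumption: checking only the first line is enough for this sketch.
_INFO_REFS_LINE = re.compile(rb"^[0-9a-f]{40}\t\S+$")


def classify_response(content_type: str, body: bytes) -> Remote:
    """Guess what kind of thing answered the initial info/refs request."""
    ctype = content_type.split(";", 1)[0].strip().lower()
    if ctype == SMART_CONTENT_TYPE:
        return Remote.SMART_HTTP
    if body.startswith(b"# v2 git bundle\n"):
        return Remote.BUNDLE
    if body.startswith(b"PACK"):
        return Remote.PACKFILE
    first_line = body.split(b"\n", 1)[0]
    if _INFO_REFS_LINE.match(first_line):
        return Remote.DUMB_HTTP
    return Remote.UNKNOWN
```

In the bundle and packfile cases the client would then fetch the rest of
the body (resumably) and continue as described in items 3 and 4; the
sketch covers only the sniffing step.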

> Except when people pass --depth, "git clone" sets
> 'check_self_contained_and_connected = 1'.  That means clients that
> already iterate over all objects and inspect FLAG_LINK are the usual
> case.

Somewhat related, but I've wondered if we could do something similar
even for non-clone cases. That is, `index-pack` could tell us the set of
referenced but missing objects, and we could verify that each of those
is reachable (we _could_ just have it verify that we have the object at
all; traditionally we only guaranteed that reachable objects were kept,
but these days we keep anything reachable from another object we are
keeping, so if you have X, you should always have X^, etc).
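
A toy Python sketch of that idea, under stated assumptions: the pack is
modeled as a plain dict mapping each object id to the ids it links to
(a commit's tree and parents, a tree's entries). The names here are
hypothetical and this is not how index-pack is structured internally;
it only shows that one pass over the links yields both the roots
(objects nothing else in the pack points at) and the set of referenced
but missing objects.

```python
from typing import Dict, Iterable, Set, Tuple


def analyze_links(pack: Dict[str, Iterable[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (roots, missing) for a pack modeled as id -> linked ids."""
    present = set(pack)
    referenced: Set[str] = set()
    for links in pack.values():
        referenced.update(links)
    roots = present - referenced      # in the pack, linked from nothing in it
    missing = referenced - present    # linked from the pack, not in it
    return roots, missing


def unsatisfied(missing: Set[str], local_objects: Set[str]) -> Set[str]:
    """Missing objects that the receiving repository does not have either."""
    return missing - local_objects


# Example: a pack holding commit c2 (parent c1) without c1's objects.
thin = {"c2": ["t2", "c1"], "t2": ["b"], "b": []}
roots, missing = analyze_links(thin)
still_needed = unsatisfied(missing, local_objects={"c1", "t1"})
```

Here the simpler "do we have the object at all" check is used; whether
that is sufficient, or whether reachability must be verified, is exactly
the open question in the paragraph above.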

Anyway, that's quite a tangent from this topic.

-Peff