Re: Fetching SHA id's instead of named references?

Klas Lindberg <klas.lindberg@xxxxxxxxx> · Mon, 6 Apr 2009 18:22:15 +0200

On Mon, Apr 6, 2009 at 4:40 PM, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:

> The problem is, upload-pack won't perform a reachability analysis
> to determine if a wanted SHA1 is reachable from a current ref.
> Instead it requires that the wanted SHA1 is *exactly* referenced
> by at least one ref.

I probably just don't understand this properly, so please correct me
as needed. My understanding is that

 * git-fetch-pack looks at the local named reference to figure out the
SHA id "X" for the last locally available commit.
 * git-upload-pack is given "X" as a delimiter for what to include in
the pack to send back to git-fetch-pack.

So if I have "X" and I know which remote "Y" I want (because someone
told me, or it's in a manifest), why shouldn't I be able to let
git-upload-pack search for "X" from "Y" if that is exactly what it
does anyway for named references? I accept that it may fail because
"X" is not reachable from "Y" (just give me a sensible error message).
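
To make sure we are talking about the same thing, here is a minimal
Python sketch of the two rules as I understand them. It is a toy model
and not upload-pack's actual code: the commit ids, the "parents" map
and every function name are made up for illustration.

```python
# Toy model, not git's real implementation: each commit maps to its parents.
parents = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "c4": ["c3"],
}
refs = {"refs/heads/master": "c4"}  # hypothetical server refs


def ancestors(tips, parents):
    """Return every commit reachable from `tips`, including the tips."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def is_advertised(want, refs):
    """Exact-match rule: the wanted id must be the tip of some ref."""
    return want in refs.values()


def is_reachable(want, refs, parents):
    """Reachability rule: the wanted id must be in some ref's history."""
    return want in ancestors(refs.values(), parents)


def commits_to_send(want, have, parents):
    """Commits in the history of `want` ("Y") that `have` ("X") lacks."""
    return ancestors([want], parents) - ancestors([have], parents)
```

In this model "c3" is reachable from master but is not the tip of any
ref, so the exact-match rule rejects it while the reachability rule
accepts it. commits_to_send("c3", "c1", parents) is the walk from "Y"
back to "X" that I am asking about.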

> There's no reason to perform the reachability test on the server
> when you can move it onto the client, and that's exactly what
> git-submodule is doing.  It fetches everything, and then assumes
> it's reachable post fetch.  Since the client has fetched everything,
> the client has the object if it's reachable by the server.

Except it will not always be available even when it was reachable at
the source. Here's the real-world example that forced me to reject the
use of the submodule command for distributed setups:

 * Bob is located at site S, where he sets up tree A with a submodule
B. He uses "submodule init" to initialize B, which causes B to be
listed in A with a location at site S.
 * Lisa, at site T, clones A and updates the submodule B. No problem
so far. Her list of submodules is inherited from S and works for
updating B.
 * Lisa commits a new version of B and then a new version of A. Then
she asks Kent to merge her changes.
 * Kent's clone will also have a submodules list that refers to site S
(and not T). Running "submodule update" after fetching from T fails
even though all the material is available at T, because Git is then
trying to fetch the new revision of B from S.
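
The scenario above can be sketched as a toy Python model. Nothing here
is git's real behaviour or data layout: the "sites" table, the commit
ids "b1" and "b2" and the function name are assumptions made only to
show where the lookup goes wrong.

```python
# Toy model: each site is the set of commit ids of B that it can serve.
sites = {
    "S": {"b1"},        # Bob's site: only the original revision of B
    "T": {"b1", "b2"},  # Lisa's site: also holds her new revision of B
}

# What Kent's clone inherited: the submodule list points B at site S.
recorded_submodule_site = "S"


def submodule_update(pinned_commit, fetch_from):
    """Fetch B from one site; report whether the pinned commit is there."""
    return pinned_commit in sites[fetch_from]


# Kent fetched A from T, and that version of A pins B at Lisa's "b2".
kent_follows_recorded_site = submodule_update("b2", recorded_submodule_site)
kent_fetches_from_t = submodule_update("b2", "T")
```

Following the recorded site gives False even though fetching B from T,
where Kent got A in the first place, would have given True.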

If you try to work around this by not using "submodule init", then you
get a saner tree that can be worked on in a truly serverless fashion,
like with plain git trees, but you have to implement a CM tool on top.

> If the object is no longer reachable by the server's refs (think
> branch rebased) then the object is actually in danger of being GC'd
> off of the server's object store.

That is fine. I would make sure all the commits I want to keep are
reachable from named references, so that git-gc does not remove them
from my local tree.

In the remote tree, the commit behind an unnamed reference is either
available or it isn't. If someone made it unreachable and then
garbage-collected it, well, so be it. Just tell the user that the
commit can't be found and may in fact not exist at all, and you're
done. No exhaustive search necessary.
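
A minimal Python sketch of what I mean, assuming a toy object store
that is just a dictionary from commit id to parents. The names
"ObjectNotFound", "collect_garbage" and "lookup" are hypothetical and
do not correspond to anything in git itself.

```python
class ObjectNotFound(Exception):
    """Raised when a requested commit id is not in the object store."""


def reachable(tips, parents):
    """Return every commit reachable from `tips`, including the tips."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def collect_garbage(store, refs):
    """Return a new store holding only commits reachable from some ref."""
    keep = reachable(refs.values(), store)
    return {c: p for c, p in store.items() if c in keep}


def lookup(store, commit):
    """One dictionary lookup: no search, just a clear error on a miss."""
    if commit not in store:
        raise ObjectNotFound(
            f"commit {commit} not found; it may no longer exist")
    return store[commit]


# "c3" was rebased away: master now points at "c3b" instead.
store = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c3b": ["c2"]}
refs = {"refs/heads/master": "c3b"}
store_after_gc = collect_garbage(store, refs)
```

Before the collection, lookup(store, "c3") still succeeds. Afterwards
the same request raises ObjectNotFound at the cost of a single lookup.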

> One way we get away with this sort of thing in repo is, we only
> put SHA1s in our manifest that are published in branches that
> won't ever rewind or delete.  Hence, it's a moot point.

What is the syntax for that?

Anyway, it's not a moot point. I may later want to use that revision of
the manifest to perform a checkout on every component listed by the
manifest. At that point I expect all the work trees to have exactly
the contents they "should" have for that old version of the manifest.
It's all about affordable reproducibility.
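
As a rough illustration of that expectation, here is a small Python
sketch. The manifest format, the component names and the commit ids are
all invented for the example; this is not how repo or any real
manifest tool stores its data.

```python
# Hypothetical old manifest revision: component name -> pinned commit id.
manifest = {"kernel": "k7", "libfoo": "f3"}

# Hypothetical local repositories: component name -> commit ids present.
repos = {"kernel": {"k5", "k6", "k7"}, "libfoo": {"f1", "f2"}}


def missing_components(manifest, repos):
    """List components whose pinned commit is absent from the local repo."""
    return sorted(
        name
        for name, commit in manifest.items()
        if commit not in repos.get(name, set())
    )


missing = missing_components(manifest, repos)
```

Here "libfoo" is reported as missing, so the old manifest cannot be
reproduced until commit "f3" can be fetched from somewhere.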

/Klas