Re: Change set based shallow clone

On 9/8/06, Junio C Hamano <junkio@xxxxxxx> wrote:
> "Martin Langhoff" <martin.langhoff@xxxxxxxxx> writes:
>> People who want shallow clones are actually asking for a "light"
>> clone, in terms of "how much do I need to download". If everyone has
>> the whole commit chain (but may be missing olden blobs, and even
>> trees), the problem becomes a lot easier.

> No, I do not think so.  You are just pushing the same problem to
> another layer.
>
> Reachability through commit ancestry chain is no more special
> than reachability through commit-tree or tree-containment
> relationships.  The grafts mechanism happen to treat commit

Agreed that it is no more special. OTOH, if we focus on the fact that
people want to avoid high-cost data transfers, transferring commit
chains is cheap and allows the client to ask good questions when
talking to the server.

So as far as tradeoffs go, it allows you to keep the protocol simple,
and minimise complex guessing at either end of the wire.

> But let's touch a slightly different but related topic first.
> People do not ask for shallow clones.  They just want faster
> transfer of "everything they care about".  Shallow and lazy

I'd disagree a bit here. They care about the whole project, and in
time they'll find that out and end up pulling it all if they use git
much at all ;-)

They want fast, cheap initial checkouts.
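For what it's worth, shallow clone support did later land in git, well after this thread. The fast, cheap initial checkout described here looks like this in modern git (using a throwaway local repository so the example is self-contained):

```shell
# Build a tiny repository with two commits, then make a depth-1 clone.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q origin-repo
cd origin-repo
git config user.email dev@example.com
git config user.name dev
echo one > f; git add f; git commit -qm first
echo two > f; git commit -qam second
cd ..

# file:// forces the real transport, so --depth is honoured.
git clone -q --depth 1 "file://$tmp/origin-repo" shallow-repo

# A shallow clone records its cauterised boundary in .git/shallow,
# and only the most recent commit is present locally.
test -f shallow-repo/.git/shallow
test "$(git -C shallow-repo rev-list --count HEAD)" = 1
echo ok
```

The `.git/shallow` file is how modern git answers the "why are objects missing" question raised later in this thread: it records the commits at which history was deliberately cut.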

> ...
> So they are all not that different.

Earlier you were pointing out how hard it is for the client to even
know what to ask for, because it can't see the whole picture. Having
the complete ancestry means you always know what to ask for.
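As a toy illustration (plain Python, deliberately nothing like git's actual object store or wire protocol): once the full commit chain is local, computing an exact "want" list reduces to a set difference, with no guessing on either end.

```python
# Toy model: each commit id maps to the (flattened) set of tree/blob
# ids it references. A real repo would walk trees recursively.
references = {
    "c1": {"t1", "b1"},
    "c2": {"t2", "b1", "b2"},
    "c3": {"t3", "b2", "b3"},
}

def objects_to_request(commits_wanted, local_objects):
    """Exact 'want' list: every object reachable from the wanted
    commits that the sparse client does not already store."""
    needed = set()
    for c in commits_wanted:
        needed |= references[c]
    return needed - local_objects

# A sparse repo that kept all commits but pruned older blobs/trees:
local = {"t3", "b3"}
print(sorted(objects_to_request({"c2", "c3"}, local)))
# -> ['b1', 'b2', 't2']
```

Because the client knows the complete ancestry, the request it sends is precise; the server never has to infer what the client is missing.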

> Now, first and foremost, while I would love to have a system
> that gracefully operates with a "sparse" repository that lacks
> objects that should exist from tag/commit/tree/blob reachability
> point of view, it is an absolute requirement that I can tell why
> objects are missing from a repository when I find some are
> missing by running fsck-objects [*3*].

I agree -- and you can keep the objects that are expected to be
missing listed in a "packless" idx file somewhere.
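A rough sketch of the idea, in Python for clarity (the "packless idx" name and format are Martin's suggestion here, not anything git actually implements): an fsck-style check consults the list of intentionally absent objects, so only unexplained absences count as corruption.

```python
# Hypothetical sketch: classify missing objects during an fsck-style
# scan. 'expected_missing' models the proposed "packless" idx of
# objects that were deliberately pruned.

def check_repo(referenced, present, expected_missing):
    """Return the set of genuinely corrupt (unexplained missing)
    objects among everything the repository references."""
    corrupt = set()
    for obj in referenced:
        if obj in present or obj in expected_missing:
            continue          # stored, or known to be pruned: fine
        corrupt.add(obj)      # missing AND not on the expected list
    return corrupt

referenced = {"b1", "b2", "t1"}
present = {"t1"}
expected_missing = {"b1"}     # pruned on purpose, recorded in the idx
print(check_repo(referenced, present, expected_missing))
# -> {'b2'}
```

This keeps the property Junio asks for: a sparse repository can still distinguish "pruned by design" from "lost to corruption" entirely locally.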

> If repository is a
> shallow clone, not having some object may be expected, but I
> want to be able to tell repository corruption locally even in
> that case,

+1

> I talked about the need of upload-pack protocol extension

As far as I can see, this would not need any change to the upload-pack
protocol.

There are some hard problems in dealing with a sparse repo that need
thinking through. My thinking is that by keeping the whole commit
chain around, the protocol can be kept sane: the local repo always
has a clear overall picture, including knowing exactly what it's
missing.

> [*4*] In git, there is no inherent server vs client or upstream
> vs downstream relationship between repositories.

Here an important distinction must be made. A "publishing" repo cannot
be sparse. A sparse repo probably cannot be cloned from.

> You may be
> even fetching from many people and do not have a set upstream at
> all.  Or you are _the_ upstream, and your notebook has the
> latest development history, and after pushing that latest
> history to your mothership repository, you may decide you do not
> want ancient development history on a puny notebook, and locally
> cauterize the history on your notebook repository and prune
> ancient stuff.

Well, that's easy again: "prune old blobs and list them in an idx"
should work well.
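The prune-and-record step could look something like this (again a hypothetical Python sketch; the on-disk "idx" list is an invented stand-in for whatever format would actually be used):

```python
# Hypothetical "prune old blobs and list them in an idx": delete
# objects locally, but record their ids so a later fsck-style check
# still treats their absence as expected rather than as corruption.

def prune_objects(present, expected_missing, to_prune):
    """Remove objects from local storage and record them as
    intentionally missing. Returns the updated pair of sets."""
    present = present - to_prune
    expected_missing = expected_missing | to_prune
    return present, expected_missing

present = {"b_old", "b_new", "t_head"}
expected_missing = set()
present, expected_missing = prune_objects(present, expected_missing,
                                          {"b_old"})
print(sorted(present), sorted(expected_missing))
# -> ['b_new', 't_head'] ['b_old']
```

The notebook in Junio's scenario keeps its full commit chain, sheds the ancient blobs, and can always re-fetch them from the mothership because the recorded list tells it exactly what it gave up.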

cheers,



martin