On 9/8/06, Junio C Hamano <junkio@xxxxxxx> wrote:
> "Martin Langhoff" <martin.langhoff@xxxxxxxxx> writes:
>
> > People who want shallow clones are actually asking for a "light"
> > clone, in terms of "how much do I need to download". If everyone has
> > the whole commit chain (but may be missing olden blobs, and even
> > trees), the problem becomes a lot easier.
>
> No, I do not think so. You are just pushing the same problem to another layer. Reachability through commit ancestry chain is no more special than reachability through commit-tree or tree-containment relationships. The grafts mechanism happen to treat commit
Agreed that it is no more special. OTOH, if we focus on the fact that people want to avoid high-cost data transfers, transferring commit chains is cheap and allows the client to ask good questions when talking to the server. So as far as tradeoffs go, it allows you to keep the protocol simple, and minimise complex guessing at either end of the wire.
> But let's touch a slightly different but related topic first. People do not ask for shallow clones. They just want faster transfer of "everything they care about". Shallow and lazy
I'd disagree a bit here. They care about the whole project, and in time they'll find that out and end up pulling it all if they use git much at all ;-) They want fast, cheap initial checkouts. ...
> So they are all not that different.
Earlier you were pointing out how hard it was for the client to even know what to ask for because it can't see the whole picture. Having the ancestry complete means you always know what to ask for.
> Now, first and foremost, while I would love to have a system that gracefully operates with a "sparse" repository that lacks objects that should exist from tag/commit/tree/blob reachability point of view, it is an absolute requirement that I can tell why objects are missing from a repository when I find some are missing by running fsck-objects [*3*].
I agree -- and you can keep the objects you know are expected to be missing listed in a "packless" idx file somewhere.
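To make the "packless idx" idea concrete, here is a minimal sketch in Python. It is not git code: `classify_missing`, the plain set standing in for the idx file, and the short object names are all hypothetical. It assumes fsck already knows which objects are reachable and which are present, and only shows how a list of expected-missing objects could separate deliberate gaps from corruption.

```python
# Hypothetical sketch; nothing here is an existing git API.
def classify_missing(reachable, present, expected_missing):
    """Split reachable-but-absent objects into expected gaps and corruption.

    reachable        -- ids that should exist according to reachability
    present          -- ids actually found in the object store
    expected_missing -- ids recorded in the (assumed) "packless" idx
    """
    missing = set(reachable) - set(present)
    expected = missing & set(expected_missing)
    corrupt = missing - expected
    return sorted(expected), sorted(corrupt)


# Toy repository: one commit, one tree, two blobs; only b1 was pruned on purpose.
expected, corrupt = classify_missing(
    reachable={"c1", "t1", "b1", "b2"},
    present={"c1", "t1"},
    expected_missing={"b1"},
)
print("expected to be missing:", expected)
print("unexplained (possible corruption):", corrupt)
```

Anything that ends up in the second list is what a check like fsck-objects would still need to complain about.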
> If repository is a shallow clone, not having some object may be expected, but I want to be able to tell repository corruption locally even in that case,
+1
> I talked about the need of upload-pack protocol extension
As far as I can see, this would not need any change to the upload-pack protocol. There are some hard problems in dealing with a sparse repo that need thinking through. My thinking is that by having the whole commit chain around the protocol can be kept sane, by virtue of the local repo always having a clear "overall" picture, including knowing what it's missing.
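As an illustration of "always knowing what to ask for", here is a rough Python sketch under the assumption that the local repo holds every commit but only some trees and blobs. The dict layout and `objects_to_request` are invented for this example and are not part of git or the upload-pack protocol. It also only models flat, single-level trees.

```python
# Hypothetical sketch; the data layout is an assumption, not git's object model.
def objects_to_request(commit_id, commits, trees, have):
    """Return the tree/blob ids needed to check out commit_id that are not local.

    commits -- complete commit chain: id -> {"parent": id or None, "tree": id}
    trees   -- locally known trees: tree id -> list of blob ids
    have    -- set of object ids present in the local store
    """
    tree_id = commits[commit_id]["tree"]
    if tree_id not in have:
        # Without the tree we cannot enumerate its blobs, so ask for the tree first.
        return [tree_id]
    return sorted(blob for blob in trees[tree_id] if blob not in have)


commits = {
    "c1": {"parent": None, "tree": "t1"},
    "c2": {"parent": "c1", "tree": "t2"},
}
trees = {"t2": ["b2", "b3"]}       # t1 was never fetched
have = {"c1", "c2", "t2", "b3"}    # every commit, but only part of the content

print(objects_to_request("c2", commits, trees, have))
print(objects_to_request("c1", commits, trees, have))
```

The second call shows the layering concern raised earlier in the thread: when a tree itself is missing, the client can name the tree but not yet the blobs beneath it, so a second round trip would be needed.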
> [*4*] In git, there is no inherent server vs client or upstream vs downstream relationship between repositories.
Here an important distinction must be made. A "publishing" repo cannot be sparse. A sparse repo probably cannot be cloned from.
> You may be even fetching from many people and do not have a set upstream at all. Or you are _the_ upstream, and your notebook has the latest development history, and after pushing that latest history to your mothership repository, you may decide you do not want ancient development history on a puny notebook, and locally cauterize the history on your notebook repository and prune ancient stuff.
Well, that's easy again: "prune old blobs and list them in an idx" should work well.

cheers,

martin