On Wed, Jun 03, 2009 at 12:15:55PM -0700, Shawn O. Pearce wrote:

> What we could do is try to organize the fetch queue by object type,
> get all commits, then all trees, then blobs. The blobs are the bulk
> of the data, and by the time we hit them, we should be able to give
> some estimate on progress because we have all of the ones we need
> to fetch in our fetch queue. But it's only an "object count" sort
> of thing, not a byte count.

That's clever, and I think an "object count" would be fine (after all,
that is all that git:// fetching provides). However, I'm not sure how
it would work in practice. When we follow a walk to a commit in a pack,
do we really want to try to pull _just_ that commit?

For one thing, we would need the server to support partial fetches
(i.e., HTTP Range requests), and my assumption is that we don't bother
with that at all now. I don't know how widespread support for that is
these days (and of course we would still need to fall back to fetching
the full pack).

But even if we _could_, would we get killed by HTTP protocol overhead
for each object? Certainly it would be no worse than fetching a totally
unpacked repo, but I kind of assume such a fetch would be painful.

Or, given that packs should be organized by type, are you proposing to
fetch just the "commit part" as a single entity, then the "tree part",
then the "blob part"? I'm a little hesitant to rely too much on what is
basically a performance heuristic for the pack organization (and god
forbid packv4 ever gets finished ;) ).

-Peff
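P.S. To make the "partial fetches" point concrete: over dumb http it
would boil down to a Range request against the pack file, which libcurl
(what our http code already links against) can do directly. Here is a
minimal, untested sketch; the URL, offset, and length are made-up
parameters for illustration (in practice they would come out of the
.idx we fetched earlier), and fetch_pack_range is just a name I
invented:

/*
 * Hypothetical, untested sketch: pull one object's byte range out of
 * a remote pack over dumb HTTP via libcurl. The offset/length would
 * have to come from a previously fetched .idx; they are fake here.
 */
#include <stdio.h>
#include <curl/curl.h>

static size_t sink(void *ptr, size_t size, size_t nmemb, void *data)
{
	/* Just spill the bytes somewhere; real code would inflate them. */
	return fwrite(ptr, size, nmemb, (FILE *)data);
}

/* Returns 0 if the server honored the range (HTTP 206), -1 otherwise. */
static int fetch_pack_range(const char *url, long off, long len, FILE *out)
{
	char range[64];
	long status = 0;
	CURL *curl = curl_easy_init();

	if (!curl)
		return -1;
	snprintf(range, sizeof(range), "%ld-%ld", off, off + len - 1);
	curl_easy_setopt(curl, CURLOPT_URL, url);
	curl_easy_setopt(curl, CURLOPT_RANGE, range);
	curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, sink);
	curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);
	if (curl_easy_perform(curl) == CURLE_OK)
		curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
	curl_easy_cleanup(curl);

	/* A server that ignores Range answers 200 with the whole pack;
	 * that is the "fall back to fetching the full pack" case. */
	return status == 206 ? 0 : -1;
}

int main(void)
{
	int ret;
	curl_global_init(CURL_GLOBAL_DEFAULT);
	/* made-up URL and offsets, for illustration only */
	ret = fetch_pack_range("http://example.com/repo.git/"
			       "objects/pack/pack-1234.pack",
			       12, 4096, stdout);
	curl_global_cleanup();
	return ret ? 1 : 0;
}

Even when the server cooperates, that is a full request per object,
which is exactly the per-object overhead question above.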