Re: Pack transfer negotiation to tree and blob level?

On Thu, Nov 28, 2013 at 5:52 AM, Philip Oakley <philipoakley@xxxxxxx> wrote:
> In the pack transfer protocol (Documentation/technical/pack-protocol.txt)
> the negotiation for refs is discussed, but it's unclear to me whether the
> negotiation explicitly navigates down into the trees and blobs of each
> commit that needs to go into the pack.
>
> From one perspective I can see that, in the main, it's only commit objects
> that are being negotiated, and the DAG is used to imply which commit objects
> are to be sent between the wants and haves endpoints, without the need to
> descend into their trees and blobs. The tags and the objects they point to
> are explicitly given and so are negotiated easily.
>
> The other view is that the negotiation should be listing every object of any
> type between the wants and haves as part of the negotiation. I just couldn't
> tell from the docs which assumption is appropriate. Is there any extra
> clarification on this?

Negotiation for other object types is inferred from commits because
sending a full object listing would be too much. If you say you have
commit A, you imply you have everything reachable from commit A down to
the bottom. With this knowledge, when you want commit B, the sender only
needs to send the trees and blobs that do not exist in commit A or any
of its ancestors. To cut cost at the sender, we actually do something
less than optimal (check out the "edge" concept in the documents, or in
pack-objects.c). Pack bitmaps are supposed to provide cheap object
traversal and make the transferred pack even smaller.
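
In other words the object set to send is "everything reachable from the
wants minus everything reachable from the haves", which is what
"git rev-list --objects B --not A" enumerates. A toy Python sketch of
that set computation (the graph and object names are made up for
illustration; this is not git code):

    # Each object maps to the objects it references:
    # commit -> parent commits + root tree, tree -> subtrees + blobs.
    graph = {
        "commitA": ["treeA"],
        "treeA":   ["blob1", "blob2"],
        "commitB": ["commitA", "treeB"],   # commitB has commitA as parent
        "treeB":   ["blob1", "blob3"],     # blob1 unchanged, blob3 new
        "blob1": [], "blob2": [], "blob3": [],
    }

    def closure(start):
        # Everything reachable from `start`, i.e. what a peer that
        # says "have start" is assumed to already have.
        seen, stack = set(), [start]
        while stack:
            obj = stack.pop()
            if obj not in seen:
                seen.add(obj)
                stack.extend(graph[obj])
        return seen

    # "want commitB, have commitA" => send only what A cannot reach.
    to_send = closure("commitB") - closure("commitA")
    print(sorted(to_send))   # ['blob3', 'commitB', 'treeB']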

> I ask as I was cogitating on options for a 'narrow' clone (to complement
> shallow clones ;-) that could, say, in some way limit the size of blobs
> downloaded, or the number of tree levels downloaded, or even limit by path.

Size limiting is easy because you don't need to traverse the object DAG
at all. Inside pack-objects, rev-list is called to collect the objects
to be sent; you just filter by size at that phase. Support for raising
or lowering the size limit later is also workable, just like how shallow
deepen/shorten is done: you let the sender know you have size limit A
and now want to raise it to B, and the sender just collects the extra
objects in the A..B size range for all "have" refs.
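
A minimal sketch of that idea, with made-up object names and sizes (a
real implementation would filter while pack-objects walks the rev-list
output, not on a dict like this):

    # Hypothetical object sizes, for illustration only.
    objects = {"blob1": 100, "blob2": 5000000, "blob3": 200, "blob4": 9000000}

    def collect(limit):
        # Objects the sender would include under a given size limit.
        return {name for name, size in objects.items() if size <= limit}

    A = 1000000
    first_pack = collect(A)               # {'blob1', 'blob3'}

    # Raising the limit to B later only requires the objects whose size
    # falls in the (A, B] range, like the extra commits sent when a
    # shallow clone is deepened.
    B = 6000000
    extra_pack = collect(B) - collect(A)  # {'blob2'}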

The problem is how to let the client know which objects were not sent
because of the size limit, so it could set up refs/replace to stop the
user from running into missing objects. If there are too many excluded
objects, sending all those SHA-1s with pkt-lines is inefficient. (A path
limit does not have this problem; it can be inferred from the command
line arguments most of the time.) Maybe you could send this listing in
a binary format just before sending the pack.
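
For example, the listing could be as simple as a count followed by raw
20-byte SHA-1s. A rough Python sketch of such a format (this exact
layout is just an assumption for illustration, not an existing protocol
extension):

    import struct

    # Hypothetical preamble sent just before the pack:
    # 4-byte big-endian count, then 20 raw bytes per excluded SHA-1.
    def encode_excluded(hexshas):
        out = struct.pack(">I", len(hexshas))
        for hexsha in hexshas:
            out += bytes.fromhex(hexsha)      # 20 raw bytes each
        return out

    def decode_excluded(data):
        (count,) = struct.unpack_from(">I", data, 0)
        offset, shas = 4, []
        for _ in range(count):
            shas.append(data[offset:offset + 20].hex())
            offset += 20
        return shas

    excluded = ["e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"]  # empty blob
    assert decode_excluded(encode_excluded(excluded)) == excluded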

BTW, another way to deal with large blobs in a clone is git-annex. I was
thinking the other day about whether we could sort of integrate it into
git to provide a smooth UI (so the user does not have to type "git annex
something", or at least not often). Of course git-annex would still be
optional, and the UI integration would only be activated via a config
key, after git-annex is installed.
--
Duy