Re: optimising a push by fetching objects from nearby repos

Sitaram Chamarty <sitaramc@xxxxxxxxx> · Mon, 12 May 2014 07:20:59 +0530

On 05/11/2014 11:34 PM, Junio C Hamano wrote:
Sitaram Chamarty <sitaramc@xxxxxxxxx> writes:

But what I was looking for was validation from git.git folks of the idea
of replicating what "git clone -l" does, for an *existing* repo.

For example, I'm assuming that bringing in only the objects -- without
any of the refs pointing to them, making them all dangling objects --
will still allow the optimisation to occur (i.e., git will still say "oh
yeah I have these objects, even if they're dangling so I won't ask for
them from the pusher" and not "oh these are dangling objects; so I don't
recognise them from this perspective -- you'll have to send me those
again").

So here is an educated guess by a git.git folk.  I haven't read the
codepath for some time, so I may be missing some details:

  - The set of objects sent over the wire in "push" direction is
    determined by the receiving end listing what it has to the
    sending end, and then the sending end excluding what the
    receiving end said it already has.

  - The receiving end tells the sending end what it has by showing
    the names of its refs and their values.
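
As a rough illustration of the two points above: the list is built from refs, not from the contents of the object store. This sketch uses `git for-each-ref` as a stand-in for what the receiving end would show (it is not the actual wire-protocol advertisement), and the throwaway repository and the unreferenced blob are made up for the demonstration.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Write an object into the object store that no ref points at.
dangling=$(echo "unreferenced" | git -C "$repo" hash-object -w --stdin)

# Names and values of the refs: roughly what the receiving end shows.
advertised=$(git -C "$repo" for-each-ref --format='%(objectname) %(refname)')
echo "$advertised"

# The unreferenced object exists in the store, yet it is not in the list.
git -C "$repo" cat-file -e "$dangling" && echo "object $dangling is present"
```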

Having otherwise dangling objects in your object store alone will
not make them reachable from the refs shown to the sending end.  But
there is another trick the receiving end employs.

  - The receiving end also includes the refs, and their values, of
    the repositories it borrows objects from (its alternate
    repositories) when it tells the sending end what objects it
    already has.

So what you "assumed" is not entirely correct---bringing in only the
objects will not give you any optimization.

But because we infer from the location of the object store
(i.e. "objects" directory) where the refs that point at these
borrowed objects exist (i.e. in "../refs" relative to that "objects"
directory) in order to make sure that we do not have to say "oh
these are dangling but we know their history is not broken", we
still get the same optimisation.
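
For readers who have not used alternates, here is a minimal sketch of setting one up for an *existing* repository. The repository names (`lender.git`, `borrower.git`) are hypothetical. The only mechanism assumed is the `objects/info/alternates` file, which lists other "objects" directories to borrow from.

```shell
set -e
work=$(mktemp -d)

# A lender repo with one commit, and an empty borrower repo.
git init -q "$work/src"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git clone -q --bare "$work/src" "$work/lender.git"
git init -q --bare "$work/borrower.git"

# Point the borrower's object store at the lender's "objects" directory.
echo "$work/lender.git/objects" >> "$work/borrower.git/objects/info/alternates"

# The lender's commit is now readable from the borrower,
# even though the borrower has no refs of its own.
tip=$(git -C "$work/lender.git" rev-parse HEAD)
kind=$(git -C "$work/borrower.git" cat-file -t "$tip")
refs=$(git -C "$work/borrower.git" for-each-ref | wc -l | tr -d ' ')
echo "borrowed object type: $kind, borrower refs: $refs"
```

With this in place, the lender's refs are found via "../refs" relative to the listed "objects" directory, as described above, so the borrower can mention them to the pusher.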

Thanks!

Everything makes sense.  However, I'm not using the alternates
mechanism.

Since gitolite has the advantage of allowing me to do something before
and something after the git-receive-pack, I'm fetching all the refs into
a temporary namespace before, and deleting all of them after.  So, just
for the duration of the push, the refs do exist, and optimisation (of
network traffic) therefore happens.
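
A minimal sketch of that before/after pair, for anyone who wants to try it outside gitolite. The `refs/tmp-lender/` namespace and the repository paths are made up for illustration, and the real hooks would of course wrap an actual git-receive-pack run rather than a comment.

```shell
set -e
work=$(mktemp -d)
lender="$work/lender.git"
target="$work/target.git"

# A lender repo with one commit, and an empty repo that will receive the push.
git init -q "$work/src"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git clone -q --bare "$work/src" "$lender"
git init -q --bare "$target"

# "Before" step: fetch every lender ref into a temporary namespace.
git -C "$target" fetch -q "$lender" 'refs/*:refs/tmp-lender/*'
before=$(git -C "$target" for-each-ref refs/tmp-lender | wc -l | tr -d ' ')

# ... git-receive-pack would run here and see the temporary refs ...

# "After" step: delete everything under the temporary namespace.
git -C "$target" for-each-ref --format='delete %(refname)' refs/tmp-lender |
    git -C "$target" update-ref --stdin
after=$(git -C "$target" for-each-ref refs/tmp-lender | wc -l | tr -d ' ')

echo "temporary refs during push: $before, after cleanup: $after"
```

Note that deleting the refs does not delete the fetched objects; any that the push did not end up referencing stay behind as dangling objects until they are pruned.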

In addition, since I check that the user has read access to the lender
repo (and don't do this optimisation if he does not), there is -- by
definition -- no security issue, in the sense that he cannot get
anything from the lender repo that he could not have got directly.

Thanks for all your help again, especially the very clear explanation!

regards
sitaram