On 05/11/2014 02:32 AM, Junio C Hamano wrote:
Sitaram Chamarty <sitaramc@xxxxxxxxx> writes:
Is there a trick to optimising a push by telling the receiver to pick up
missing objects from some other repo on its own server, to cut down even
more on network traffic?
So, hypothetically,
    git push user@host:repo1 --look-for-objects-in=repo2
I'm aware of the alternates mechanism, but that makes the dependency on
the other repo sort-of permanent.
In the direction of fetching, this may give a good starting point:
http://thread.gmane.org/gmane.comp.version-control.git/243918/focus=245397
That's an interesting thread and it's recent too. However, it's about
clone (though the intro email mentions other commands also).
I'm specifically interested in push efficiency right now. When you
"fork" someone's repo to your own space, and you push your fork to the
same server, it ought to be able to get most of the common objects from
disk (specifically, from the repo you forked), and only what extra you
did from the network.
Clones do have a workaround (clone with --reference, then repack, as you
said in that thread), but no such workaround exists for push.
In the direction of pushing, theoretically you could:
- define a new capability "look-for-objects-in" to pass the name of
the repository from "git push" to the "receive-pack";
- have "receive-pack" temporarily borrow from the named repository
(if the policy on the server side allows it), and accept the push;
- repack in order to dissociate the receiving repository from the
other repository it temporarily borrowed from.
which would be the natural inverse of the approach suggested in the
"Can I borrow just temporarily while cloning?" thread.
But I haven't thought things through with respect to what else needs
to be modified to make sure this does not have adverse interaction
with simultaneous pushes into the same repository, which would make
it harder to solve for "receive-pack" than for "clone/fetch".
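
To make the borrow-then-dissociate idea concrete, here is a rough
sketch that uses the existing alternates file for the temporary
borrowing. It is not how "receive-pack" works today, the function names
are made up for illustration, and it ignores the simultaneous-push
problem mentioned above entirely.

```python
import subprocess
from pathlib import Path


def alternates_file(repo: Path) -> Path:
    """Location of the alternates file inside a bare repository."""
    return repo / "objects" / "info" / "alternates"


def borrow_from(borrower: Path, reference: Path) -> None:
    """Let `borrower` find missing objects in `reference` (hypothetical helper)."""
    alt = alternates_file(borrower)
    alt.parent.mkdir(parents=True, exist_ok=True)
    entry = str((reference / "objects").resolve())
    lines = alt.read_text().splitlines() if alt.exists() else []
    if entry not in lines:  # do not add the same entry twice
        lines.append(entry)
    alt.write_text("\n".join(lines) + "\n")


def stop_borrowing(borrower: Path, reference: Path) -> None:
    """Remove the entry added by borrow_from(), keeping any other entries."""
    alt = alternates_file(borrower)
    if not alt.exists():
        return
    entry = str((reference / "objects").resolve())
    lines = [line for line in alt.read_text().splitlines() if line != entry]
    if lines:
        alt.write_text("\n".join(lines) + "\n")
    else:
        alt.unlink()


def dissociate(borrower: Path, reference: Path) -> None:
    """Copy the borrowed objects into `borrower`, then cut the link."""
    # Without -l, "repack -a" also packs objects reached through alternates.
    subprocess.run(["git", "-C", str(borrower), "repack", "-a", "-d"], check=True)
    stop_borrowing(borrower, reference)
```

The intended order would be borrow_from() before the push is accepted
and dissociate() afterwards; removing the entry before the repack has
finished would leave the borrower with missing objects.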
I'll leave it in your capable hands :-) My C coding days are long gone!
I do have a way to do this in gitolite (haven't coded it yet; just
thinking). Gitolite lets you specify something to do before git-*-pack
runs, and I was planning something like this:
terminology: borrow, borrower repo, reference repo
"borrow = relaxed" mode
1. check if the user has read access to the reference repo; skip
the rest of this if he doesn't
2. from the reference repo's "objects" directory, find all
directories and "mkdir" them in the borrower's objects directory,
then find all files and "ln" (hardlink) them. This is presumably
what "clone -l" does.
This method is close to constant time since we're not copying
objects.
It has the potential issue that if an object existed in the
reference repo that was subsequently *deleted* (say, a commit that
contained a password, which was quickly overwritten when
discovered), and the attacker knows the SHA, he can get the commit
out by sending a commit that depends on it, then fetching it back.
(He could do that to the reference repo directly if he had write
access, but we'll assume he doesn't, so this *is* a possible
attack).
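
Step 2 of the relaxed mode might look roughly like the sketch below. The
function name is invented, skipping "objects/info" is my own assumption
(so the reference repo's own alternates file is not shared), and
hardlinks only work if both repos are on the same filesystem.

```python
import os
from pathlib import Path


def hardlink_objects(reference_objects: Path, borrower_objects: Path) -> int:
    """Mirror the directory tree and hardlink every file; return links made."""
    linked = 0
    for dirpath, dirnames, filenames in os.walk(reference_objects):
        rel = Path(dirpath).relative_to(reference_objects)
        if rel == Path("."):
            # Assumption: leave out objects/info (alternates, packs list).
            dirnames[:] = [d for d in dirnames if d != "info"]
        target_dir = borrower_objects / rel
        target_dir.mkdir(parents=True, exist_ok=True)  # the "mkdir" part
        for name in filenames:
            target = target_dir / name
            if target.exists():
                continue  # borrower already has this object or pack
            os.link(Path(dirpath) / name, target)  # the "ln" part
            linked += 1
    return linked
```

The cost depends on the number of files linked rather than on their
size, which is where the near-constant time comes from. Everything in
the reference repo's object store gets linked, reachable or not, which
is exactly what makes the attack described above possible.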
"borrow = strict" mode
1. (same as for "relaxed" mode)
2. actually *fetch* all refs from the reference repo to the
borrower (into, say, 'refs/borrowed'), then delete all those
refs so you just have the objects now.
Unlike the previous method, this takes time proportional to the
delta between borrower and reference, and may load the system a bit,
but unless the reference repo is highly volatile, this will settle
down. The point is that it cannot be used to get anything that the
user doesn't already have access to anyway.
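
For comparison, step 2 of the strict mode could be sketched as below.
The helper names are invented and this is untested against a real
gitolite setup; only the git commands themselves ("fetch",
"for-each-ref", "update-ref -d") are existing ones.

```python
import subprocess
from pathlib import Path
from typing import List

BORROWED = "refs/borrowed"  # namespace suggested in the text above


def fetch_command(borrower: Path, reference: Path) -> List[str]:
    """argv that fetches every ref of `reference` into refs/borrowed/*."""
    return [
        "git", "-C", str(borrower), "fetch", "--quiet",
        "--no-tags",  # avoid auto-following tags into refs/tags/*
        str(reference), "+refs/*:" + BORROWED + "/*",
    ]


def delete_command(borrower: Path, refname: str) -> List[str]:
    """argv that deletes one borrowed ref, leaving its objects behind."""
    return ["git", "-C", str(borrower), "update-ref", "-d", refname]


def borrow_strict(borrower: Path, reference: Path) -> None:
    """Fetch reachable objects from `reference`, then drop the temporary refs."""
    subprocess.run(fetch_command(borrower, reference), check=True)
    listing = subprocess.run(
        ["git", "-C", str(borrower), "for-each-ref",
         "--format=%(refname)", BORROWED],
        check=True, capture_output=True, text=True,
    )
    for refname in listing.stdout.splitlines():
        subprocess.run(delete_command(borrower, refname), check=True)
```

Because only objects reachable from the reference repo's refs are
fetched, a deleted commit never arrives in the borrower. One thing to
watch: once the refs are deleted the objects are unreachable in the
borrower until the push lands, so a garbage collection might eventually
prune them.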
I still have to try it, but it sounds like both these would work.
I'd appreciate any comments though...
regards
sitaram