Re: git pack/unpack over bittorrent - works!

On 09/04/10 08:13, Nicolas Pitre wrote:
> On Sat, 4 Sep 2010, Artur Skawina wrote:
>> What I'm really asking is, if a (modified) git-upload-pack skips transferring
>> commit X, and just sends me commit Z (possibly as delta vs 'X'), _and_ I 
>> obtain commit 'X' in some other way, I will be able to reconstruct 'Z', correct?
> 
> Yes.  Although it is 'git pack-objects' that decides what objects to 
> send, not 'git-upload-pack'.

Thank you very much for the detailed answers.

AFAIU both previously mentioned assumptions hold, so here's an example of
git-p2p-v3 use, simplified and with most boring stuff (p2p, ref and error
handling) omitted.
(The first version made a canonical, shared, virtual representation of the
object store, the second added more git-awareness to the transport, and then
I started wondering if all of that is actually necessary; hence...)

Let's say I'm a git repo tracking Linus' tree. Right now the newest commit that
I have is "v2.6.33" (but it could be anything, including "" for a fresh, empty
clone) and I want to become up to date.

1) I fetch a list of IPs of well-known seeds, e.g. from kernel.org.

2) I send a UDP packet to some of them, containing the repo
   ("git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git"),
   the ref that I'm interested in ("master") and the hash of the last commit
   that I have ("60b341b7").
   This is enough to start participating in the cloud by serving "..60b341b7",
   but we'll skip the server part in this example.

3) I receive answers to some of the above queries, containing the status of
   these peers wrt the given repo and ref, i.e. the same data I sent above,
   plus a list of random other live peers known to be tracking this ref, which
   I'll use to repeat steps #2 and #3 until I have a list of enough peers to
   continue.

4) Now I know of 47 peers that already have the tag or commit "v2.6.37" (either
   I already knew that I wanted this one, or determined it during #3 and/or #1;
   ref handling omitted from this example for brevity).

   So I connect to one of the peers, and basically ask for the equivalent of
   "git fetch peer01 v2.6.37".
   But that would pull all new objects from that one peer, and that isn't what
   I want. So I need to make it not only send me a thin pack, but also omit
   some of the objects. As at this point I don't actually know anything about
   the objects in between "v2.6.33" and "v2.6.37", I cannot split the request
   into smaller ones.

   So I'll cheat -- I'll take the number of available peers ("47") and the
   number of this peer ("0"), send these two integers over and ask the other
   side to skip transferring any object for which
   (HASH%available_peers)!=this_peer.

5) for (int this_peer=1; this_peer<available_peers; this_peer++)
     Repeat#4(this_peer);
   /* in parallel until I saturate the link */

6) Now I have 47 different packs, which probably do not make any sense
   individually, because they contain deltas vs nonexistent objects, but
   as a whole can be used to reconstruct the full tree.
   6a) Except, of course, if there are circular dependencies, which can
       occur e.g. if peer#1 decided to encode object A as delta(B) and
       peer#2 did B=delta(A). This will be rare, though, and I'll just need
       to refetch either A or B to break the cycle, this time with real
       I-HAVES, hence this is guaranteed to succeed.
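
Steps 4 to 6a can be sketched roughly as below. This is only a minimal
illustration in Python, not git-p2p or git code: the function names
(assigned_peer, objects_for_peer, find_delta_cycle) are made up for this
example, object IDs are assumed to be hex strings, and every deltified object
is assumed to have exactly one base. The first two functions show the
(HASH%available_peers) split; the third walks the delta chains of all received
packs taken together and reports a cycle such as A=delta(B), B=delta(A), so
that one of its members can be refetched.

```python
import hashlib


def assigned_peer(object_id, available_peers):
    """Number of the peer that should send this object (HASH % available_peers)."""
    return int(object_id, 16) % available_peers


def objects_for_peer(object_ids, available_peers, this_peer):
    """What a peer still sends after skipping every object assigned elsewhere."""
    return [o for o in object_ids
            if assigned_peer(o, available_peers) == this_peer]


def find_delta_cycle(delta_base, have):
    """Return the object IDs of one delta cycle, or None if there is none.

    delta_base: object ID -> ID of the base it was encoded against, merged
                from all received packs (objects sent whole have no entry).
    have:       IDs already present locally; these can always be resolved.
    """
    done = set()
    for start in delta_base:
        path = []          # chain followed from `start`, in order
        on_path = set()
        node = start
        while node in delta_base and node not in have and node not in done:
            if node in on_path:
                # We came back to an object on the current chain: a cycle.
                return path[path.index(node):]
            path.append(node)
            on_path.add(node)
            node = delta_base[node]
        done.update(path)  # everything on this chain resolves fine
    return None


if __name__ == "__main__":
    # Made-up object IDs, only to exercise the split across 47 peers.
    ids = [hashlib.sha1(b"object %d" % i).hexdigest() for i in range(1000)]
    per_peer = [objects_for_peer(ids, 47, p) for p in range(47)]
    print(sum(len(part) for part in per_peer), "objects spread over 47 peers")

    # peer#1 sent A=delta(B), peer#2 sent B=delta(A): refetch A or B.
    print(find_delta_cycle({"A": "B", "B": "A"}, have=set()))
```

In this sketch every object is requested from exactly one peer, and looking
for a cycle only needs the base ID of each delta, not the object contents.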

What am I missing?

artur