Re: Resumable clone/Gittorrent (again)

On Wed, 5 Jan 2011, Nguyen Thai Ngoc Duy wrote:

> Hi,
> 
> I've been analyzing the BitTorrent protocol and came up with this. The
> last idea about a similar thing [1], gittorrent, was given by Nicolas.
> This keeps close to that idea (i.e. the transfer protocol must be around git
> objects, not file chunks) with a slight difference.
> 
> The idea is to transfer a chain of objects (trees or blobs), including
> base object and delta chain. Objects are chained according to
> worktree layout, e.g. all objects of path/to/any/blob will form a
> chain, from a commit tip down to the root commits. Chains can have
> gaps, and don't need to start from commit tip. The transfer is
> resumable because if a delta chain is corrupt at some point, we can
> just request another chain from where it stops. Base object is
> obviously resumable.

How do you actually define your chain?  Given that Git is conceptually 
snapshot based, there is currently no relationship between two blobs 
forming the content for two different versions of the same file.  Even 
delta objects are not really part of the Git data model as they are only 
an encoding variation of a given primary object.  In fact, we may and 
actually do have deltas where the base object is not from the same 
worktree file as the delta object itself.

The only thing that ties this all together is the commit graph.  And 
that graph might have multiple forks and merges, so any attempt at a 
linear representation into a chain is rather futile.  Therefore it is 
not clear to me how you 
can define a chain with a beginning and an end, and how this can be 
resumed midway.
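
To make the linearization problem concrete, here is a minimal sketch in Python. The four-commit history and the helper names are hypothetical, made up for illustration. It enumerates every tip-to-root ordering that respects the parent links and shows that even a single fork and merge already allows more than one "chain".

```python
from itertools import permutations

# Hypothetical toy history: commit -> list of parents.
# A is the root, B and C both fork from A, M merges B and C.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "M": ["B", "C"]}

def is_valid_order(order, parents):
    """True if every commit is listed before all of its parents (tip first)."""
    position = {commit: i for i, commit in enumerate(order)}
    return all(position[c] < position[p] for c in parents for p in parents[c])

def linearizations(parents):
    """All tip-to-root orderings consistent with the graph (brute force)."""
    return [list(o) for o in permutations(parents) if is_valid_order(o, parents)]

orders = linearizations(PARENTS)
for order in orders:
    print(" -> ".join(order))
```

Both M, B, C, A and M, C, B, A are valid, so the two ends would have to agree on a tie-breaking rule before "resume from position N" means the same thing to each of them.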

> We start by fetching all commit contents reachable from a commit tip.

Sure.  This is doable today and is called a shallow clone with depth=1.

> This is a chain, therefore resumable.

I don't get that part though.  How is this resumable?  That's the very 
issue we have with a clone.

I proposed a solution to that already, which is to use 
git-upload-archive for one of the tip commits, since the data stream 
produced by upload-archive (once decompressed) is actually 
deterministic.  Once completed, this can be converted into a shallow 
clone on the client side, and can be deepened in smaller steps 
afterwards.
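
The reason a deterministic stream helps can be sketched as follows. This is only an illustration in Python: `produce_stream` is a stand-in for the uncompressed archive output, and the `offset` parameter of `fetch` is an assumption about how a resume request might look, not an existing git-upload-archive option.

```python
def produce_stream(files):
    """Stand-in for a deterministic archive stream: same input, same bytes."""
    out = bytearray()
    for path in sorted(files):  # fixed ordering keeps the output reproducible
        data = files[path]
        out += f"{path}\0{len(data)}\0".encode() + data
    return bytes(out)

def fetch(files, offset=0, limit=None):
    """Hypothetical server call: regenerate the stream and serve from offset."""
    stream = produce_stream(files)
    end = len(stream) if limit is None else min(len(stream), offset + limit)
    return stream[offset:end]

files = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}

# The first attempt is cut off after 20 bytes.
received = fetch(files, limit=20)
# Resume: ask only for what is missing, starting at the byte count we hold.
received += fetch(files, offset=len(received))

print(received == produce_stream(files))
```

Because the server can regenerate byte-identical output for the same commit, a byte offset is all the client needs to remember between attempts.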

> From there each commit can be
> examined. Missing trees and blobs will be fetched as chains. Every time
> a delta is received, we can recreate the new object and verify it (we
> should have its SHA-1 from its parent trees/commits).

What if the delta is based on an object from another chain?  How do you 
determine which chain to ask for to get that base?
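
A small Python sketch of that situation, under a made-up model where each chain is the list of object ids for one path and `DELTA_BASE` records what each delta was encoded against (all ids and names here are hypothetical):

```python
# Hypothetical per-path chains, newest object first.
CHAINS = {
    "Makefile": ["m3", "m2", "m1"],
    "README": ["r2", "r1"],
}
# Hypothetical delta encoding chosen by the packer.
# Note that r2 is stored as a delta against a Makefile blob.
DELTA_BASE = {"m3": "m2", "m2": "m1", "r2": "m1"}

def chain_of(obj):
    """Path whose chain contains obj, or None if it is in no chain."""
    for path, objs in CHAINS.items():
        if obj in objs:
            return path
    return None

def foreign_bases(path):
    """Deltas in this path's chain whose base lives in a different chain."""
    result = {}
    for obj in CHAINS[path]:
        base = DELTA_BASE.get(obj)
        if base is not None and base not in CHAINS[path]:
            result[obj] = chain_of(base)
    return result

print(foreign_bases("README"))
print(foreign_bases("Makefile"))
```

The sketch can answer the question only because it has a global index of every chain. A client that holds just the README chain has no such index, which is exactly the gap the question points at.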

> Because these chains are quite independent, in the sense that a blob
> chain is independent from another blob chain (but requires tree
> chains, of course), we can fetch as many as we want in parallel, once
> we're done with the commit chain.

But in practice, most of those chains will end up containing objects 
which are duplicates of objects in another chain.  How do you tell the 
remote that you want part of a chain because you've got 96% of it in 
another chain already?
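
For illustration, the bookkeeping a client would need might look like the Python sketch below. The object ids are invented and the idea of sending a "missing list" is an assumption, not part of the proposal.

```python
def missing_from(chain, have):
    """Objects of a chain the client does not hold yet, in chain order."""
    return [obj for obj in chain if obj not in have]

# Hypothetical chain of 25 objects, 24 of which already arrived
# as part of some other chain.
chain = [f"obj{i}" for i in range(25)]
have = set(chain[:24])

missing = missing_from(chain, have)
held = len(chain) - len(missing)
print(f"already have {held}/{len(chain)} objects, still need {missing}")
```

Computing the difference locally is easy. The open problem is the request itself: the client would have to name the missing objects individually, at which point the chain is no longer the unit of transfer.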

> The last thing I like about these chains is that the number of chains
> is reasonable. It won't increase too fast over time (as compared to
> the number of commits). As such it maps well to BitTorrent's "pieces".

My problem right now is that I don't see how this maps well to Git.


Nicolas