Re: Continue git clone after interruption

Nicolas Pitre <nico@xxxxxxx> writes:

> On Tue, 18 Aug 2009, Tomasz Kontusz wrote:
> 
> > Ok, so it looks like it's not implementable without some kind of cache
> > server-side, so the server would know what the pack it was sending
> > looked like.
> > But here's my idea: make server send objects in different order (the
> > newest commit + whatever it points to first, then next one,then
> > another...). Then it would be possible to look at what we got, tell
> > server we have nothing, and want [the newest commit that was not
> > complete]. I know the reason why it is sorted the way it is, but I think
> > that the way data is stored after clone is clients problem, so the
> > client should reorganize packs the way it wants.
> 
> That won't buy you much.  You should realize that a pack is made of:
> 
> 1) Commit objects.  Yes they're all put together at the front of the pack,
>    but they roughly are the equivalent of:
> 
> 	git log --pretty=raw | gzip | wc -c
> 
>    For the Linux repo as of now that is around 32 MB.

For my clone of the Git repository this gives 3.8 MB.
 
> 2) Tree and blob objects.  Those are the bulk of the content for the top 
>    commit.  The top commit is usually not delta compressed because we 
>    want fast access to the top commit, and that is used as the base for 
>    further delta compression for older commits.  So the very first 
>    commit is whole at the front of the pack right after the commit 
>    objects.  you can estimate the size of this data with:
> 
> 	git archive --format=tar HEAD | gzip | wc -c
> 
>    On the same Linux repo this is currently 75 MB.

For the same Git repository this gives 2.5 MB.

> 
> 3) Delta objects.  Those are making the rest of the pack, plus a couple 
>    tree/blob objects that were not found in the top commit and are 
>    different enough from any object in that top commit not to be 
>    represented as deltas.  Still, the majority of objects for all the 
>    remaining commits are delta objects.

You forgot that delta chains are bounded by the pack.depth limit, which
defaults to 50.  So you would then have additional full objects.

The single packfile for this (just gc'ed) Git repository is 37 MB,
much more than 3.8 MB + 2.5 MB = 6.3 MB.

[cut]

There is another way we could go to implement resumable clone.
Let git first try to clone the whole repository (as a single pack; BTW
what happens if this pack is larger than the file size limit of the
given filesystem?).  If that fails, the client asks for the first half
of the repository (half as in bisect, but it is the server that has to
calculate it).  If that downloads, the client asks the server for the
rest of the repository.  If it fails, the client halves the size again,
and asks for 1/4 of the repository in a packfile first.

The only extension required is for the server to support an additional
capability, which enables the client to ask for the appropriate 1/2^n
part of the repository (approximately), or 1/2^n between have and want.
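
To make the retry logic concrete, here is a rough sketch of the client
side in Python.  It is only a simulation under stated assumptions: the
repository is modelled as `total` abstract units of history, and
`try_fetch(start, end)` is a hypothetical stand-in for "ask the server
for a pack covering that slice", returning True if the transfer
completed.  Neither name exists in git; the real thing would need the
new server capability described above.

```python
from typing import Callable, List, Tuple


def halving_clone(
    total: int,
    try_fetch: Callable[[int, int], bool],
) -> List[Tuple[int, int]]:
    """Fetch units [0, total) in slices, halving the slice on failure.

    try_fetch(start, end) is a hypothetical callback that models one
    pack transfer and returns True if it completed.
    Returns the list of (start, end) slices that were fetched, in order.
    """
    fetched: List[Tuple[int, int]] = []
    done = 0          # units of history already stored locally
    size = total      # first attempt: the whole repository
    while done < total:
        if size == 0:
            raise RuntimeError("cannot fetch even a single unit")
        end = done + size
        if try_fetch(done, end):
            fetched.append((done, end))
            done = end
            size = total - done   # on success, ask for all the rest
        else:
            size //= 2            # on failure, ask for half as much
    return fetched


# Example: a link that drops any transfer larger than 300 units.
slices = halving_clone(1000, lambda start, end: end - start <= 300)
print(slices)
```

Note that after each success this goes back to asking for everything
that is left, as described above; a client could just as well remember
the slice size that last worked and keep using it.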

-- 
Jakub Narebski
Poland
ShadeHawk on #git