Re: Sparse clones (Was: Re: [PATCH 1/2] upload-pack: support subtree packing)

On Wed, Jul 28, 2010 at 1:06 PM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> 2010/7/28 Avery Pennarun <apenwarr@xxxxxxxxx>:
>> 2010/7/27 Elijah Newren <newren@xxxxxxxxx>:
>>> 0) Sparse clones have "all" commit objects, but not all trees/blobs.
>>>
>>> Note that "all" only means all that are reachable from the refs being
>>> downloaded, of course.  I think this is widely agreed upon and has
>>> been suggested many times on this list.
>>
>> I think downloading all commit objects would require very low
>> bandwidth and storage space, so it should be harmless.
>>
>> In fact, I have a pretty strong impression that also downloading
>> all *tree* objects would be fine too.
>
> Here you go. A pack with only commits and trees of git.git#master is
> 15M. With blobs, it is 28M. Git is not a typical repo with large blobs
> though.

Hmm, that's very interesting - more than half the repo (15M of 28M,
roughly 54%) is just tree and commit objects?  Maybe that's not so
shocking after all, given the tendency in the git project to use long
commit messages and relatively short patches.

Was your pack carefully ordered for best deltification?

Knowing how much of that is commits vs. trees would also be very interesting.

But if so, only saving half the space is kind of disappointing.  If
you have a script around for generating this, it would be very
interesting to compare the results with, say, the Linux kernel repo
(especially since it seems to be the #1 example of "submodules people
don't want to check out because they're so bloody huge").
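
For anyone wanting to try the comparison on another repo, here is a
rough sketch of how the per-type breakdown might be measured.  It
assumes a reasonably recent git that supports `cat-file
--batch-all-objects` and the `%(objectsize:disk)` format atom; the
helper names `tally_by_type` and `repo_totals` are made up for
illustration.  Note that it counts every object in the repository
(not only those reachable from one branch), and that on-disk sizes of
deltified objects depend on how the pack happened to be built, so the
numbers are only approximate.

```python
import subprocess
from collections import defaultdict


def tally_by_type(lines):
    """Sum on-disk bytes per object type from '<type> <size>' lines."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            # Skip anything that is not a '<type> <size>' pair.
            continue
        obj_type, size = parts
        totals[obj_type] += int(size)
    return dict(totals)


def repo_totals(repo_path="."):
    """Run git in repo_path and return {object type: on-disk bytes}."""
    out = subprocess.run(
        [
            "git", "-C", repo_path, "cat-file",
            "--batch-all-objects",
            "--batch-check=%(objecttype) %(objectsize:disk)",
        ],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return tally_by_type(out.splitlines())
```

Calling `repo_totals("/path/to/repo")` after a `git repack -a -d`
should give one total each for commits, trees, and blobs (plus tags,
if any), which answers the commits-vs-trees question directly.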

In bup, I know the trees+commits are much smaller than the blobs, so
my intuition was telling me it would be the same in git.  It's
entirely possible that I was wrong, though.  In retrospect, bup uses
really short computer-generated commit messages, and backs up large
numbers of files at once, most of which never change (and thus most of
the trees never change).  Commits+trees end up somewhere around 0.5%
of the total repo size.

Have fun,

Avery