Re: [PATCH] Add --no-reuse-delta option to git-gc

On Mon, 11 Jun 2007, Johannes Schindelin wrote:

> Hi,
> 
> On Sun, 10 Jun 2007, Nicolas Pitre wrote:
> 
> > On Sun, 10 Jun 2007, Sam Vilain wrote:
> > 
> > > Anyway it's a free world so be my guest to implement it, I guess if 
> > > this was selectable it would only be a minor annoyance waiting a bit 
> > > longer pulling from some repositories, and it would be 
> > > interesting to see if it did make a big difference with pack file 
> > > sizes.
> > 
> > It won't happen for a simple reason: to be backward compatible with 
> > older GIT clients.  If you have your repo compressed with bzip2 and an 
> > old client pulls it then the server would have to decompress and 
> > recompress everything with gzip.  If instead your repo remains with gzip 
> > and a new client asks for bzip2 then you have to recompress as well 
> > (slow).  So in practice it is best to remain with a single compression 
> > method.
> 
> With the extension mechanism we have in place, the client can send what 
> kind of compression it supports, and the server can actually refuse to 
> send anything if it does not want to recompress.
> 
> What I am trying to say: you do not necessarily have to allow every client 
> to access that particular repository. I agree that mixed-compression repos 
> are evil, but nothing stands in the way of a flag allowing (or 
> disallowing) recompression in a different format when fetching.
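
To make the proposal above concrete, here is a rough sketch of the decision a
server could take. Everything in it is an assumption for illustration only:
Git has no compression capability today, and the function name, the
"gzip"/"bzip2" labels and the allow_recompress flag are all hypothetical.

```python
from typing import Optional, Sequence


def negotiate_compression(
    client_supported: Sequence[str],
    repo_compression: str,
    allow_recompress: bool,
    server_supported: Sequence[str] = ("gzip", "bzip2"),
) -> Optional[str]:
    """Return the compression to use for a fetch, or None to refuse it.

    Hypothetical helper: none of these names exist in Git.
    """
    # Cheapest case: the client understands the repository's own format,
    # so objects can be sent as stored.
    if repo_compression in client_supported:
        return repo_compression
    # The repository owner has disallowed recompression: refuse the fetch.
    if not allow_recompress:
        return None
    # Otherwise recompress into the first format both sides understand.
    for method in client_supported:
        if method in server_supported:
            return method
    return None


# An old gzip-only client against a bzip2-only repository.
print(negotiate_compression(["gzip"], "bzip2", allow_recompress=False))
print(negotiate_compression(["gzip"], "bzip2", allow_recompress=True))
```

The first call refuses the fetch and the second one falls back to
recompressing with gzip, which is exactly the slow path being debated here.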

I know.

But is it worthwhile?  I think not.

However I won't stand in the way of anyone who wants to try and provide 
numbers.  I just don't believe this is worthwhile and am not inclined to 
do it.

OK... Well, I just performed a really quick test:

$ mkdir test-bzip2
$ mkdir test-gzip
$ cp git/*.[cho] test-bzip2
$ cp git/*.[cho] test-gzip
$ bzip2 test-bzip2/*
$ gzip test-gzip/*
$ du -s test-bzip2 test-gzip
5016    test-bzip2
4956    test-gzip

It is true that bzip2 does better with large files, but we typically have 
very few of them in a Git repo, and with large files bzip2 becomes 
_much_ slower than gzip.  So, given that Git objects are likely to be 
small in 98% of cases due to deltas, it appears that bzip2 won't be a 
gain at all but rather a waste, which makes a poor case for supporting 
it forever afterwards.
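
For anyone who wants to repeat this kind of comparison on individual
objects rather than whole files, a minimal sketch follows. It uses Python's
standard zlib and bz2 modules as stand-ins for gzip and bzip2; the payloads
are made up for illustration and are not taken from any real repository.

```python
import bz2
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the size of `data` raw, under zlib (deflate) and under bzip2."""
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),
        "bzip2": len(bz2.compress(data, 9)),
    }


# Hypothetical small payload standing in for a delta-sized object.
small = b"static int cmp(const void *a, const void *b)\n{\n\treturn 0;\n}\n"
# Hypothetical large payload: the same text repeated many times.
large = small * 2000

for name, payload in (("small", small), ("large", large)):
    print(name, compressed_sizes(payload))
```

Running it over real loose objects or delta data would give the numbers that
actually matter; the payloads here only show how to take the measurement.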

> So if you should decide someday to track data with Git (remember: Generic 
> Information Tracker, not just source code),

Bah... if you please.

> that is particularly unfit for 
> compression with gzip, but that you _need_ to store in a different 
> compressed manner, you can set up a repository which will _only_ _ever_ 
> use that compression.

Maybe.  But you'd better have a concrete data set and result numbers to 
convince me.  Designing software for hypothetical situations before they 
actually exist leads to bloatware.


Nicolas