Re: [RFD/PATCH] Implement pack.compression and pack-objects --compression=N

On 5/2/07, Dana How <danahow@xxxxxxxxx> wrote:
On 5/2/07, Junio C Hamano <junkio@xxxxxxx> wrote:
> Dana How <danahow@xxxxxxxxx> writes:
> > Consequently,  for such a usage pattern it is useful
> > to specify different compression levels for loose
> > objects and packs.  This patch implements a config
> > variable pack.compression in addition to the existing
> > core.compression,  meant to be used for repacking.
> > It also adds --compression=N to pack-objects,
> > meant for push/pull/fetch,  if different,  or if different
> > on a per-repository basis.
> >
> > ** THIS PATCH IS UNTESTED AND MEANT FOR DISCUSSION. **
>
> I think we tweaked this area in the past, but I do not think
> the current setting was determined to be the best tradeoff for
> all workloads.  To be able to discuss the patch, I think it
> needs to come with benchmark numbers using publicly available
> repositories as guinea pigs and set of typical git operations,
> so people can reproduce and compare notes.

OK, but this patch doesn't mandate any particular setting.

Its motivation in my work environment is for pack.compression
to be what core.compression currently is,  and to set
core.compression to 0 to speed up large commits
(the resulting space-inefficient loose objects will be scrubbed away
 by a later off-line repack).
Thus, my config settings (almost) turn the gzip behind a git-add into a cp.
Would you like me to submit timings for git-add/git-commit -a
on a typical 50-file commit from my workload,
using the (new) settings I would use?

Note that the linux-2.6 git tree from a week ago has 22K checked-out files
with an average size of 11KB; the largest is fs/nls/nls-cp949.c at 874KB.
(The largest file in git itself is gitk at 176KB.)

The tree I'm interested in maintaining with git is almost 70GB
when checked out, in 13K files with an average size of >5.2MB. That
average file size is over 2 orders of magnitude larger than what current git users see.
(Some of these numbers may decrease after a little retraining ;-).)
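As a quick check of the arithmetic, here is a short Python sketch using only the figures quoted above. Decimal units are assumed, and since the total is "almost 70GB", the cross-check against the >5.2MB average is only approximate:

```python
# Figures quoted in the message, in decimal units (assumption).
LINUX_AVG_FILE_BYTES = 11e3       # linux-2.6: ~22K files averaging 11KB
BIG_TREE_AVG_FILE_BYTES = 5.2e6   # target tree: average file size >5.2MB

# Cross-check the average against "almost 70GB ... in 13K files".
avg_from_totals = 70e9 / 13e3
# How many times larger the target tree's average file is.
ratio = BIG_TREE_AVG_FILE_BYTES / LINUX_AVG_FILE_BYTES

print(f"average from totals: {avg_from_totals / 1e6:.2f} MB")
print(f"ratio to linux-2.6 average: {ratio:.0f}x")
# "Over two orders of magnitude" means the ratio exceeds 10**2.
print("over two orders of magnitude:", ratio > 10**2)
```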
I would like git to perform as responsively as possible on files
up to ~500MB.  Within this tree,  the largest file is 1234MB
[I think checking this in was a mistake!]
and I did the following experiments on it:

$ rm -rf .git
$ git-init
Initialized empty Git repository in .git/
$ git-config core.compression -1
$ wc large.spef
 12762072   37832482 1234082774 large.spef
$ /usr/bin/time git-add large.spef
41.54user 0.70system 0:49.76elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k
$ ls -lR .git/objects/??
.git/objects/d5:
total 83836
-r--r--r--  1 how group 85670068 May  2 15:11 d6cde2af063cdfa835038385f29a897bf9533b

$ rm -rf .git
$ git-init
Initialized empty Git repository in .git/
$ git-config core.compression 1
$ wc large.spef
 12762072   37832482 1234082774 large.spef
$ /usr/bin/time git-add large.spef
23.66user 0.74system 0:34.07elapsed 71%CPU (0avgtext+0avgdata 0maxresident)k
$ ls -lR .git/objects/??
.git/objects/d5:
total 105116
-r--r--r--  1 how group 107419557 May  2 15:13 d6cde2af063cdfa835038385f29a897bf9533b

So for a 25% increase in blob size I get about 32% less elapsed time
(34.07s vs. 49.76s) in git-add, just by changing core.compression from -1 to 1.
I'll definitely take that improvement. [For the compressible files
we typically have, using 0 is a bad idea: the CPU "advantage"
is swamped by the time to write a much larger file.]
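The percentages can be recomputed from the two transcripts with a short Python sketch. The numbers are copied from the runs above; the helper name pct_change is only for illustration:

```python
# Measurements from the two git-add runs above, keyed by core.compression.
runs = {
    -1: {"blob_bytes": 85_670_068, "elapsed_s": 49.76, "user_s": 41.54},
    1: {"blob_bytes": 107_419_557, "elapsed_s": 34.07, "user_s": 23.66},
}
input_bytes = 1_234_082_774  # size of large.spef as reported by wc


def pct_change(old: float, new: float) -> float:
    """Percentage change going from old to new (negative means a decrease)."""
    return (new - old) / old * 100.0


default, fast = runs[-1], runs[1]
size_change = pct_change(default["blob_bytes"], fast["blob_bytes"])
elapsed_change = pct_change(default["elapsed_s"], fast["elapsed_s"])
user_change = pct_change(default["user_s"], fast["user_s"])

# Overall compression ratio achieved at each level.
for level, r in runs.items():
    print(f"level {level:2d}: ratio {input_bytes / r['blob_bytes']:.1f}:1")
print(f"blob size {size_change:+.1f}%, elapsed {elapsed_change:+.1f}%, "
      f"user CPU {user_change:+.1f}%")
```

Level 1 still compresses this file by more than 11:1, and it cuts user CPU time by about 43%, which is where most of the elapsed-time saving comes from.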

Since I don't care [to the same degree] about the responsiveness of
packing, I'd rather pack with -1 (the zlib default) or a higher level to keep packs small.
(And inflation time seems to be independent of the compression setting.)
Since someone might be working in the repository while the packing is happening,
I'd rather not temporarily change core.compression to achieve this.
Hence the patch.
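To check the inflation-time observation on other data, a rough Python sketch using the standard zlib module could look like this. The sample is synthetic, repetitive text standing in for a compressible file (an assumption, not the data measured above), and wall-clock timings will vary from machine to machine:

```python
import time
import zlib
from typing import Tuple


def measure(data: bytes, level: int, repeats: int = 5) -> Tuple[int, float]:
    """Deflate `data` at one zlib level, then time inflation.

    Returns (compressed size in bytes, fastest of `repeats` inflate times).
    """
    packed = zlib.compress(data, level)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        out = zlib.decompress(packed)
        best = min(best, time.perf_counter() - start)
    if out != data:  # every level must round-trip exactly
        raise RuntimeError(f"round-trip failed at level {level}")
    return len(packed), best


# Synthetic, repetitive text standing in for a compressible file (assumption).
sample = b"".join(b"net%d value 0.%03d\n" % (i, i % 1000) for i in range(100_000))

results = {level: measure(sample, level) for level in (0, 1, -1, 9)}
for level, (size, secs) in results.items():
    print(f"level {level:2d}: {size:>9d} bytes, inflate {secs * 1e3:.2f} ms")
```

Level 0 output comes out slightly larger than the input, because zlib only wraps the data in stored blocks; that is the "write a much larger file" cost mentioned above.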

Concerning various public repositories: clearly the patch has no
impact if you don't specify different core.compression and pack.compression
values. If you do specify different values, I doubt there would be much
noticeable speed-up for, e.g., the linux-2.6 repo whose stats I included above.
There would be some, but that wasn't the motivation for the patch.
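Here is a minimal Python sketch of how the settings might combine. It assumes the precedence --compression=N, then pack.compression, then core.compression, then zlib's default of -1. The function names are hypothetical and are not taken from the patch. Under that assumption, leaving pack.compression unset reproduces the pre-patch behaviour:

```python
from typing import Optional

ZLIB_DEFAULT_COMPRESSION = -1  # zlib's Z_DEFAULT_COMPRESSION


def _check(level: int) -> int:
    """Reject levels outside zlib's -1..9 range."""
    if not -1 <= level <= 9:
        raise ValueError(f"compression level {level} not in -1..9")
    return level


def effective_pack_level(cmdline: Optional[int],
                         pack_compression: Optional[int],
                         core_compression: Optional[int]) -> int:
    """Hypothetical precedence: --compression=N, then pack.compression,
    then core.compression, then the zlib default."""
    for level in (cmdline, pack_compression, core_compression):
        if level is not None:
            return _check(level)
    return ZLIB_DEFAULT_COMPRESSION


def effective_loose_level(core_compression: Optional[int]) -> int:
    """Loose objects (e.g. from git-add) keep following core.compression."""
    if core_compression is None:
        return ZLIB_DEFAULT_COMPRESSION
    return _check(core_compression)


# Fast loose objects, default-level packs (the setup described above).
print(effective_loose_level(1), effective_pack_level(None, -1, 1))  # 1 -1
# pack.compression unset: packs follow core.compression, as before the patch.
print(effective_pack_level(None, None, 1))  # 1
# A one-off --compression=9 for a single pack-objects run.
print(effective_pack_level(9, -1, 1))  # 9
```

Keeping the two levels in separate variables lets a repack run at a high level while concurrent git-add calls keep using the fast loose-object level, without anyone editing the config in between.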

Thanks,
--
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell