Compression speed for large files

I'm looking at doing version control of data files, potentially very large,
often binary. In git, committing large files is very slow; I have tested with
a 45MB file, which takes about 1 minute to check in (on an Intel Core Duo, 2GHz).

Now, most of the time is spent compressing the file. Would it be a good idea
to change the Z_BEST_COMPRESSION level passed to zlib, at least for large files?
I have measured the time spent by git-commit with different compression levels
in sha1_file.c:

  method                 time (s)  object size (kB)
  Z_BEST_COMPRESSION     62.0      17136
  Z_DEFAULT_COMPRESSION  10.4      16536
  Z_BEST_SPEED            4.8      17071

In this case Z_BEST_COMPRESSION also compresses worse, but that's not the major
issue: the time is. Here are a couple of other data points, measured with
gzip -9, -6 and -1 (comparable to the Z_ levels above):

129MB ascii data file
  method    time (s)  object size (kB)
  gzip -9   158       23066
  gzip -6    18       23619
  gzip -1     6       32304

3MB ascii data file
  method    time (s)  object size (kB)
  gzip -9   2.2        887
  gzip -6   0.7        912
  gzip -1   0.3       1134
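
For completeness, the numbers above can be reproduced outside of git with a
small standalone program like the sketch below. It is only an illustration
(file handling is simplified, and it uses zlib's one-shot compress2() helper
rather than the streaming deflate() interface that git uses), but it times the
same three levels:

/* bench-zlib.c: time zlib compression of a file at several levels.
 * Sketch only -- not git code. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zlib.h>

static void bench(const unsigned char *in, unsigned long in_len,
                  int level, const char *name)
{
    uLong bound = compressBound(in_len);
    unsigned char *out = malloc(bound);
    uLongf out_len = bound;
    clock_t start = clock();

    /* compress2() is zlib's one-shot helper taking an explicit level. */
    if (compress2(out, &out_len, in, in_len, level) != Z_OK) {
        fprintf(stderr, "compress2 failed for %s\n", name);
        free(out);
        return;
    }
    printf("%-22s %6.2f s  %10lu bytes\n", name,
           (double)(clock() - start) / CLOCKS_PER_SEC,
           (unsigned long)out_len);
    free(out);
}

int main(int argc, char **argv)
{
    FILE *f;
    long len;
    unsigned char *buf;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }
    fseek(f, 0, SEEK_END);
    len = ftell(f);
    fseek(f, 0, SEEK_SET);
    buf = malloc(len);
    if (fread(buf, 1, len, f) != (size_t)len) {
        perror("fread");
        return 1;
    }
    fclose(f);

    bench(buf, len, Z_BEST_COMPRESSION, "Z_BEST_COMPRESSION");       /* level 9 */
    bench(buf, len, Z_DEFAULT_COMPRESSION, "Z_DEFAULT_COMPRESSION"); /* level 6 */
    bench(buf, len, Z_BEST_SPEED, "Z_BEST_SPEED");                   /* level 1 */
    free(buf);
    return 0;
}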

So: is it a good idea to change to faster compression, at least for larger
files? From my (limited) testing I would suggest using Z_BEST_COMPRESSION only
for small files (perhaps <1MB?) and Z_DEFAULT_COMPRESSION/Z_BEST_SPEED for
larger ones.
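
To make that concrete, the size-dependent choice could be as simple as the
sketch below. This is not a patch; the function name and the 1MB cutoff are
made up for illustration, and the right threshold would need real measurement:

#include <zlib.h>

/* Pick a zlib level from the object size.  The 1MB cutoff is only the
 * tentative number suggested above. */
static int compression_level_for_size(unsigned long size)
{
    if (size < 1024 * 1024)
        return Z_BEST_COMPRESSION;   /* level 9: still cheap for small objects */
    return Z_DEFAULT_COMPRESSION;    /* level 6 (or Z_BEST_SPEED for huge files) */
}

The returned level would then take the place of the fixed Z_BEST_COMPRESSION
that sha1_file.c currently passes to zlib.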


-j.

