Re: git-fetching from a big repository is slow

On Dec 14, 2006, at 10:06, Andreas Ericsson wrote:

> It wouldn't work for this particular case though. In our distribution repository we have ~300 bzip2 compressed tarballs with an average size of 3 MiB. 240 of those are between 2.5 and 4 MiB, so they don't drastically differ, but neither do they delta well.
>
> One option would be to add some sort of config option to skip attempting deltas of files with a certain suffix. That way we could just tell it to ignore *.gz, *.tgz, *.bz2 and everything would work just as it does today, but a lot faster.

Such special magic based on filenames is always a bad idea. Tomorrow somebody
comes along with .zip files (oh, and of course .ZIP), then it's .jpg files and
other compressed content. In the end git will be doing lots of magic and still
perform badly on unknown compressed content.

There is a very simple way of detecting compressed files: just look at the size of the compressed blob and compare against the size of the expanded blob. If the compressed blob has a non-trivial size which is close to the expanded
size, assume the file is not interesting as source or target for deltas.

Example:
   if (compressed_size > expanded_size / 4 * 3 + 1024) {
     /* don't try to deltify if blob doesn't compress well */
     return ...;
   }
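
For illustration, the same check can be written as a small standalone
predicate. This is only a sketch: the function name blob_compresses_poorly
is made up here and is not an existing git function, and the 3/4 ratio and
1024-byte slack are simply the values from the example above, not tuned
numbers.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of git): returns true when the stored,
 * compressed size of a blob is close to its expanded size, i.e. the
 * content is most likely already compressed and a poor delta candidate.
 */
static bool blob_compresses_poorly(size_t compressed_size, size_t expanded_size)
{
	/*
	 * Threshold is 3/4 of the expanded size plus 1024 bytes.
	 * The constant keeps small blobs eligible for deltas even when
	 * they barely shrink.
	 */
	return compressed_size > expanded_size / 4 * 3 + 1024;
}
```

With these numbers, a 3 MiB tarball that still occupies about 3,100,000
bytes after compression would be skipped, while a 40,000-byte source file
that compresses to 12,000 bytes would still be considered for deltas.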

  -Geert