On Wed, 2005-02-02 at 15:45 +0100, Florian La Roche wrote:
> On Wed, Feb 02, 2005 at 06:03:24AM -0800, Steve G wrote:
> > Hi,
> >
> > With the discussion about trimming specfile changelogs to save space
> > and improve downloads... why not go one step further? Mandrake has
> > been using bzip2 for a while, it works just as well, and the files
> > are significantly smaller. The conversion could be done in several
> > steps:
> >
> > 1) man pages - less already handles bzipped man pages
> > 2) info pages - I submitted a patch in bz #128637 to try to get it working
> > 3) tar
> > 4) rpms - I'm sure the patch is in Mandrake's version
> >
> > Thoughts?
>
> bzip2 is only used for the cpio-packed file data; the rpm header is
> not compressed. For the repo data the changelog can also be trimmed.
> It only becomes a problem if you need to copy the rpm header
> unmodified (e.g. if you later want to verify that the md5sum is the
> same as in the full rpms you download, or similar things).
>
> I think staying with gzip is ok, as it really is a good middle ground
> between speed and compression ratio. bzip2 "feels" noticeably slower.

In my opinion converting to bzip2 is the right thing to do. I also try
to keep almost everything compressed with bzip2 because of its
significantly better compression ratio. I'll illustrate this on the mc
tarball:

-rw-rw-r--  1 jnovy jnovy 2831562 Jan 28 09:52 mc-4.6.1-pre3.tar.bz2
-rw-rw-r--  1 jnovy jnovy 3956127 Feb  2 15:26 mc-4.6.1-pre3.tar.gz

where we can see that the gzipped tarball is more than a third larger
than the bzipped one. Decompression times are:

gunzip decompression:

real    0m0.257s
user    0m0.198s
sys     0m0.059s

bunzip2 decompression:

real    0m1.665s
user    0m1.567s
sys     0m0.098s

so the conclusion would be that bunzip2 is about 6-7 times slower than
gunzip. That bzip2 is inherently this slow is unfortunately a common
myth among developers: bzip2 uses its best compression (-9, i.e. 900k
blocks for the BWT) by default, while gzip defaults to a speed/size
compromise (-6), and the levels mean something different anyway, since
gzip is LZ77 based. bzip2 is scalable enough to trade compression
ratio for better compression/decompression times. If you use the
fastest (and worst-compressing) -1 setting with bzip2 you get:

-rw-rw-r--  1 jnovy jnovy 3592894 Feb  2 16:08 mc-4.6.1-pre3.tar.bz2

which is still smaller than gzip's best compression (-9), and the
decompression time is:

real    0m1.076s
user    0m1.003s
sys     0m0.073s

so only about 4 times slower than gzip. The question is what the
priority is at the moment: the space consumed by the file, or the
decompression time.

There are also projects such as pbzip2 (http://compression.ca/pbzip2/)
that use the fact that bzip2 compresses large files in separate
blocks, so the BWT and Huffman encoding phases can be performed on
these blocks simultaneously in multiple threads, which speeds
compression and decompression up significantly on SMP machines.

Further, if you consider the scalability of bzip2, whose compressed
sizes range over:

bzip2: best (-9): 2831562, worst (-1): 3592894
gzip:  best (-9): 3931362, worst (-1): 4634277

I think bzip2 is the winner, at least from a future point of view.

Cheers,

Jindrich

-- 
Jindrich Novy <jnovy@xxxxxxxxxx>, http://people.redhat.com/jnovy/
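
P.S. A rough way to reproduce the comparison above, as a sketch only:
the tarball name is the one from this mail, the gzip/bzip2 flags are
the standard ones, and the exact sizes and timings will of course
differ per machine and per mc snapshot.

  # compress the same uncompressed tarball with both tools, keeping the original
  bzip2 -9 -c mc-4.6.1-pre3.tar > mc-4.6.1-pre3.tar.bz2        # bzip2 default = best (-9)
  bzip2 -1 -c mc-4.6.1-pre3.tar > mc-4.6.1-pre3.tar.fast.bz2   # fastest bzip2 setting
  gzip  -9 -c mc-4.6.1-pre3.tar > mc-4.6.1-pre3.tar.gz         # gzip's best; its default is -6

  # compare the resulting sizes
  ls -l mc-4.6.1-pre3.tar.*

  # compare decompression times; write to /dev/null so disk writes don't skew the numbers
  time gunzip  -c mc-4.6.1-pre3.tar.gz       > /dev/null
  time bunzip2 -c mc-4.6.1-pre3.tar.bz2      > /dev/null
  time bunzip2 -c mc-4.6.1-pre3.tar.fast.bz2 > /dev/null

  # if pbzip2 is installed, the per-block work can be spread over several
  # CPUs (4 here); the output is still ordinary bzip2 data
  pbzip2 -p4 -9 -c mc-4.6.1-pre3.tar > mc-4.6.1-pre3.tar.pbzip2.bz2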