Re: Change to bzip2?


 



Steve G wrote:

Of course, when you're talking about man and info pages, what
are the chances that you actually save significant space when
you take the filesystem block size into account?



To me, it's not just about disk space. It's also about bandwidth. I don't know how block size affects the following data, but here it is:

Uncompressed man pages:
du -sh /usr/share/man/man?
26M     /usr/share/man/man1
2.9M    /usr/share/man/man2
52M     /usr/share/man/man3
2.5M    /usr/share/man/man4
3.7M    /usr/share/man/man5
16K     /usr/share/man/man6
1.7M    /usr/share/man/man7
5.7M    /usr/share/man/man8
8.0K    /usr/share/man/man9
1.3M    /usr/share/man/mann

Using gzip:
du -sh /usr/share/man/man?
17M     /usr/share/man/man1
2.5M    /usr/share/man/man2
40M     /usr/share/man/man3
640K    /usr/share/man/man4
2.2M    /usr/share/man/man5
16K     /usr/share/man/man6
1016K   /usr/share/man/man7
4.2M    /usr/share/man/man8
8.0K    /usr/share/man/man9
684K    /usr/share/man/mann
Total 82M

Using bzip2:
du -sh /usr/share/man/man?
16M     /usr/share/man/man1
2.5M    /usr/share/man/man2
40M     /usr/share/man/man3
588K    /usr/share/man/man4
2.1M    /usr/share/man/man5
16K     /usr/share/man/man6
976K    /usr/share/man/man7
4.2M    /usr/share/man/man8
8.0K    /usr/share/man/man9
680K    /usr/share/man/mann
Total 81M


One thing that skews the results is that some files were not compressed with
bzip2 because they were symlinked.
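
On the block-size question raised at the top: a rough way to reason about it is to round every file size up to a whole number of filesystem blocks before comparing. The sketch below is only an illustration; the 4096-byte block size and the function names are assumptions, and filesystems that pack file tails will behave differently.

```python
def on_disk_size(file_size: int, block_size: int = 4096) -> int:
    """Round a file size up to a whole number of blocks (assumed 4096 bytes)."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def blocks_saved(original: int, compressed: int, block_size: int = 4096) -> int:
    """Number of whole blocks freed by compressing one file."""
    saved = on_disk_size(original, block_size) - on_disk_size(compressed, block_size)
    return saved // block_size


# A small man page that shrinks from 3000 to 1200 bytes still occupies one block.
print(blocks_saved(3000, 1200))
# A larger page that shrinks from 10000 to 3500 bytes frees two blocks.
print(blocks_saved(10000, 3500))
```

Under these assumptions, compression only saves disk space when it pushes a file across a block boundary, which is why many small pages show little change in du output even though the bytes transferred over the wire shrink.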



Calculating sizes may seem useful, but you're honking the wrong horn imho, if for
no other reason than that, with both payload *AND* man page compression settable,
it really makes no difference counting man pages and summing sizes.


What is really needed is to change package transport, not diddle with package guts,
to use rsync-like, rather than raw http, transport.


For starters, all the mirroring of distros is rather simple minded atm.

So a new package is added.

rsync is fired up, and the remote site does not have that path.

What does rsync do? Copies the entire file.

Each additional rsync invocation verifies that, indeed, the client and server
have identical content. Well, duh.


There is a fuzzy patch to rsync that matches on path, looking at suffix
like .rpm first, then choosing the closest similar path as the reference on the remote.

That patch (with whatever sanity hardening is necessary to restrict the
functionality to *only* rpm packages, as the fuzzy patch is perhaps too risky
as is) needs to be wired into the rsync package.
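
To make the idea concrete, here is a minimal sketch of the basis-file selection described above: restrict to a suffix, then pick the most similar existing path in the same directory. This is not the rsync patch itself; the function name pick_basis, the 0.6 similarity cutoff, and the example file names are all made up for illustration, and difflib stands in for whatever similarity measure the real patch uses.

```python
import difflib
import posixpath


def pick_basis(new_path, existing_paths, suffix=".rpm"):
    """Return the existing path most similar to new_path, or None.

    Only files with the given suffix in the same directory are considered,
    which is the kind of hardening suggested for rpm-only use.
    """
    if not new_path.endswith(suffix):
        return None
    directory = posixpath.dirname(new_path)
    candidates = [
        p for p in existing_paths
        if p.endswith(suffix) and posixpath.dirname(p) == directory
    ]
    # get_close_matches returns candidates sorted by similarity, best first.
    matches = difflib.get_close_matches(new_path, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None


# Hypothetical mirror contents and a newly published package.
mirror = [
    "RPMS/bash-3.0-17.i386.rpm",
    "RPMS/zlib-1.2.1.2-1.i386.rpm",
    "RPMS/README",
]
print(pick_basis("RPMS/bash-3.0-31.i386.rpm", mirror))
```

With a basis file chosen this way, the delta algorithm has an older build of the same package to diff against instead of copying the whole new file.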


Then -- since rsync is known to be sub-optimal with compressed payloads --
Rusty Russell's gzip.rsync.patch2 needs to be added to rpm. That patch
was in rpm-4.0.4, but alas, got blown out of rpm sources by the zlib
double free errata fire drill several years ago.

The patch is now (again) in rpm-4.4.1 and later.
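
For anyone wondering why a compressed payload defeats rsync and what an rsync-friendly compressor does about it: a one-byte change early in the input normally perturbs all of the compressed output that follows, so nothing downstream matches. The general trick is to reset the compressor at boundaries chosen from the data itself, so that output re-synchronizes after a local change. The sketch below illustrates that technique with Python's zlib; it is not the patch, and the window size, the modulus, and the function name are assumptions.

```python
import zlib


def rsync_friendly_compress(data: bytes, window: int = 4096, modulus: int = 4096) -> bytes:
    """Compress data, resetting compressor state at content-defined points.

    A rolling sum of the last `window` bytes is kept; whenever it is a
    multiple of `modulus`, a full flush is emitted. The boundaries depend
    only on nearby input bytes, so a local edit moves few of them.
    """
    comp = zlib.compressobj(9)
    out = []
    rolling = 0
    start = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        if i + 1 >= window and rolling % modulus == 0:
            out.append(comp.compress(data[start:i + 1]))
            # Z_FULL_FLUSH discards the history, so later output no longer
            # depends on anything before this point.
            out.append(comp.flush(zlib.Z_FULL_FLUSH))
            start = i + 1
    out.append(comp.compress(data[start:]))
    out.append(comp.flush())
    return b"".join(out)


payload = b"man page text\n" * 5000
packed = rsync_friendly_compress(payload)
assert zlib.decompress(packed) == payload
```

The cost is a slightly worse compression ratio, since every reset throws away the dictionary; the hoped-for gain is that two builds of the same package share long runs of identical compressed bytes.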

There are quite promising hints of bandwidth savings (for apt, dunno rpm yet)

https://svn.uhulinux.hu/packages/dev/zlib/patches/02-rsync.patch

Explicit objective metrics of bandwidth savings for mirrors if both
package payload end-points include Rusty Russell's voo-doo will
only help stimulate development of better client transport protocols.

Or keep honking about man pages compressed with either bzip2 or gzip if
that floats your boat.

73 de Jeff



