Re: metadata compression

I am creating repository data based on ALL rpms available in a specific Red Hat channel (6000 or so per channel), for each of these channels:

rhel-i386-as-3
rhel-i386-es-3
rhel-i386-ws-3
rhel-i386-as-4
rhel-i386-es-4
rhel-i386-ws-4
rhel-i386-client-5
rhel-i386-server-5
rhel-x86_64-as-3
rhel-x86_64-es-3
rhel-x86_64-ws-3
rhel-x86_64-as-4
rhel-x86_64-es-4
rhel-x86_64-ws-4
rhel-x86_64-client-5
rhel-x86_64-server-5

With rhel-i386-as-4, other.xml is nearly 300 MB uncompressed; with gzip it is 66 MB, and with lzma at maximum compression it is 2.4 MB.

I'm personally not even concerned with storing the data in sqlite; I'm trying to limit network bandwidth. If yum could read lzma-compressed metadata, that would accomplish this. Is the compression type of the metadata directly tied to the compression of the sqlite DB?

I should note that I have been using 7z for the compression rather than lzma from the SDK; 7z gives much better results.
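
A comparison like this can be roughly reproduced with Python's standard gzip, bz2, and lzma modules. This is only a sketch: it is not the 7z invocation used for the numbers above, the function name compare_sizes and the synthetic sample data are made up for illustration, and the ratios on real other.xml files will differ.

```python
import bz2
import gzip
import lzma


def compare_sizes(data: bytes) -> dict:
    """Return the size in bytes of data under each stdlib compressor at its highest level."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bz2": len(bz2.compress(data, compresslevel=9)),
        # preset 9 with the "extreme" flag is the slowest, strongest xz setting
        "xz": len(lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)),
    }


# Synthetic, highly repetitive stand-in for changelog-style XML (an assumption,
# not real repository metadata). Replace with open("other.xml", "rb").read().
sample = b"<package><name>foo</name><changelog>fix bug</changelog></package>\n" * 20000

sizes = compare_sizes(sample)
for name, size in sizes.items():
    print(f"{name:5s} {size:10d} bytes  ({size / sizes['raw']:.2%} of raw)")
```

Running it against the real metadata files gives a like-for-like view of what each format would cost in bandwidth.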

If it isn't doable or doesn't make much sense, I have alternate ways to accomplish this outside of yum land.

On Sun, Apr 19, 2009 at 1:51 PM, James Antill <james-yum@xxxxxxx> wrote:
Joshua Bahnsen <archrival@xxxxxxxxx> writes:

> I am keeping track of 16 RHEL channels, using createrepo with the standard
> gzip I am totaling 1.4 GB of metadata.

 How many arches is that for? 


> Compressing those same XML documents
> with LZMA yields a total of 140 MB. That's 10x savings overall, I think
> that's worth a look.

 Well, again, it'd depend on what it did _for the .sqlite_ files. As
shipping the .xml files to the client machines is suboptimal in many
ways.

 Doing some quick tests:

 CentOS-5
 ---------
 primary.xml          = 5.3M
 primary.xml.gz       = 888K
 primary.xml.bz2      = 584K
 primary.xml.lz       = 540K

...so I'm not sure how you get 10x. Although for the .sqlite data it
seems to do a little better:

 Fedora-rawhide
 --------------
 primary.sqlite       = 37M
 primary.sqlite.gz    = 12M
 primary.sqlite.bz2   = 8.5M
 primary.sqlite.lz    = 6.8M

 filelists.sqlite     = 66M
 filelists.sqlite.gz  = 15M
 filelists.sqlite.bz2 = 13M
 filelists.sqlite.lz  = 11M

 other.sqlite         = 19M
 other.sqlite.gz      = 6.5M
 other.sqlite.bz2     = 4.6M
 other.sqlite.lz      = 2.8M

...which implies somewhere in the 25-35% savings range, but I doubt
that's enough (on its own) given the CPU/code requirements.
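
The per-file savings can be worked out from the Fedora-rawhide sizes listed above. The sketch below is plain arithmetic on those figures; the baseline behind the quoted range is not stated, so it computes the saving of .lz against both .gz and .bz2. The helper name saving is hypothetical, and the 15M filelists entry is assumed to be the gzip figure.

```python
# Sizes in MB, copied from the Fedora-rawhide figures above.
# Assumption: the 15M filelists entry is the gzip result.
sizes = {
    "primary.sqlite": {"gz": 12.0, "bz2": 8.5, "lz": 6.8},
    "filelists.sqlite": {"gz": 15.0, "bz2": 13.0, "lz": 11.0},
    "other.sqlite": {"gz": 6.5, "bz2": 4.6, "lz": 2.8},
}


def saving(baseline: float, candidate: float) -> float:
    """Fraction of the baseline size saved by switching to the candidate."""
    return 1.0 - candidate / baseline


for name, s in sizes.items():
    vs_gz = saving(s["gz"], s["lz"])
    vs_bz2 = saving(s["bz2"], s["lz"])
    print(f"{name:17s} lz vs gz: {vs_gz:6.1%}   lz vs bz2: {vs_bz2:6.1%}")
```

The listed sizes are rounded, so the computed percentages are only approximate.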

--
James Antill -- james@xxxxxxx
_______________________________________________
Yum mailing list
Yum@xxxxxxxxxxxxxxxxx
http://lists.baseurl.org/mailman/listinfo/yum
