> I hope you're really not saying that, if I request to install package
> foo, that depends on bar, it will also download headers for baz, a
> totally unrelated package. I can see that we'd need headers for foo
> and bar, but not for baz. I thought the point of the xml files and
> the info on provides, filelists, etc, was precisely to enable the
> depsolver to avoid having to download the headers for every package.

Just so we don't go off into deeply uninformed space: yum 2.0.x
downloaded all the headers in the headers directory that it did NOT have
installed. It figured this out by reading header.info. This file stored
nevra + rpm location. So yum 2.0.x downloaded this file to see what new
headers it needed, downloaded them, then got on with the process at hand.

> I'm wondering if it would be possible for a depsolver to create a
> (smaller) .hdr file out of info in the .xml files, and feed that to
> rpmlib for transaction-verification purposes. This would enable it to
> skip the download-header step before downloading the entire package.

Talk to Paul Nasrat - he was working on that a while ago but I think he
got stuck in some rabbit hole debugging something.

> Definitely. But couldn't we perhaps do it by intelligently filtering
> information out of the rpm header and, say, generating a single
> archive containing all of the info needed for depsolving and for
> rpmlib's transaction verification?

you can't do that b/c file conflicts CAN NOT be calculated via rpm w/o
having the full header and/or all the file information present.

> I was expecting depsolving wouldn't require all the headers. And from
> what I gather from your reply, it indeed doesn't.

it requires all the headers of the packages involved, yes.

> Let's consider two scenarios: 1) using up2date with yum-2.0 (headers/)
> repos (whoever claimed up2date supported rpmmd repodata/ misled me :-)
> and 2) using yum-2.1 (repodata/) repos.
>
> 1) yum 2.0
>
> 16MiB) initial download, distro's and empty updates's hdrs
>
> 8MiB) daily (on average) downloads of header.info for updates,
>       downloaded by rhn-applet, considering an average size of almost
>       30KiB, for 40 weeks. (both FC2 and FC3 updates for i386 have a
>       header.info this big right now)
>
> 16MiB) .hdr files for updates, downloaded by the update installer.
>        Current FC2 i386 headers/ holds 9832KiB, whereas FC3 i386
>        headers/ holds 8528KiB, but that doesn't count superseded
>        updates, whose .hdr files are removed. The assumption is that
>        each header is downloaded once. 16MiB is a guestimate, that I
>        believe to be inflated. It doesn't take into account the
>        duplicate downloads of header.info for updates, under the
>        assumption that a web proxy would avoid downloading again what
>        rhn-applet has already downloaded.
>
> ----
>
> 40MiB) just in metadata over a period of 9 months, total
>
> 2) yum 2.1
>
> 2.7MiB) initial download, distro's and empty updates'
>         primary.xml.gz and filelists.xml.gz
>
> 68MiB) daily (on average) downloads of primary.xml.gz, downloaded by
>        rhn-applet, considering an average size of 250KiB (FC2 updates's
>        is 240KiB, whereas FC3's is 257KiB, plus about 1KiB for
>        repomd.xml)
>
> 16MiB) .hdr files for updates, downloaded by the update installer
>        (same as in case 1)
>
> 192MiB) filelists.xml.gz for updates, downloaded twice a week on
>         average by the update installer, to solve filename dep.
>
> ----
>
> 278.7MiB) just in metadata over a period of 9 months, total
>
>
> Looks like a waste of at least 238.7 MiB per user per 9-month install.
> Sure, it's not a lot, only 26.5MiB a month, but it's almost 6 times as
> much data being transferred for the very same purpose. How is that a
> win? Multiply that by the number of users pounding on your mirrors
> and it adds up to hundreds of GiB a month.
>
> Another factor is that you probably won't need filelists.xml.gz for
> every update.
> Maybe I don't quite understand how often it is needed, but even if I
> have to download it only once a month, that's still 64MiB over 9
> months, more than the 40MiB total metadata downloaded over 9 months by
> yum 2.0.

yum 2.1.x ONLY DOWNLOADS THE XML FILES WHEN IT NEEDS THEM. go read the
code and stop guessing.

it downloads repomd.xml every time - that's < 1K.
it downloads primary.xml.gz if the file has changed - that's typically < 1M.
it downloads filelists.xml.gz only when there is a file dep that it
cannot resolve with primary.xml.gz.

> I don't know how yum 2.0 did it, but up2date surely won't even try to
> download a .hdr file if it already has it in /var/spool/up2date, so
> this is not an issue.

yum 2.0.x certainly DID NOT download a .hdr file it already had.
Sheesh, go read the code, stop making suppositions based on anecdotes.

> repodata helps the initial download, granted, but it loses terribly in
> the long run.

only as the number of file deps outside of /etc/* and *bin/* increases.
if you keep the file deps in those paths then repodata is a huge win.

-sv
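For readers who haven't used yum 2.0.x, the header.info step sv describes (read nevra + rpm location, then fetch headers for packages not installed and not already on disk) can be sketched roughly as below. This is a hypothetical illustration, not yum's code: the function names are invented, and the assumed header.info layout of one `epoch:name-version-release.arch=path` entry per line should be treated as an assumption.

```python
# Hypothetical sketch of yum 2.0.x-style header selection; NOT yum's actual code.
# Assumed header.info layout: one "epoch:name-version-release.arch=relative/path.rpm" per line.

def parse_header_info(text):
    """Return a dict mapping nevra -> rpm location."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        nevra, location = line.split("=", 1)
        entries[nevra] = location
    return entries


def headers_to_fetch(header_info_text, installed, cached):
    """Return the nevras whose .hdr files still need downloading.

    installed: set of nevras already installed on the system.
    cached:    set of nevras whose .hdr files are already on local disk.
    """
    entries = parse_header_info(header_info_text)
    # Only packages that are neither installed nor cached need a header download.
    return sorted(n for n in entries if n not in installed and n not in cached)


if __name__ == "__main__":
    info = (
        "0:foo-1.0-1.noarch=foo-1.0-1.noarch.rpm\n"
        "0:bar-2.1-3.i386=bar-2.1-3.i386.rpm\n"
        "0:bash-2.05b-34.i386=bash-2.05b-34.i386.rpm\n"
    )
    print(headers_to_fetch(info, {"0:bash-2.05b-34.i386"}, {"0:bar-2.1-3.i386"}))
```

The cached set models the point made at the end of the thread: a header already on disk is never fetched again, so only header.info itself is re-read on each run.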
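The totals in the quoted 9-month comparison can be re-derived from its per-item figures. A minimal sketch, assuming "daily for 40 weeks" means 280 downloads and 1 MiB = 1024 KiB; both are our reading of the message rather than something it spells out, and the names are ours.

```python
# Re-derive the quoted 9-month metadata budget; all sizes in MiB (1 MiB = 1024 KiB assumed).
DAYS = 40 * 7  # assumed: one check per day for 40 weeks
MONTHS = 9

def daily_total_mib(kib_each, days=DAYS):
    """MiB transferred by downloading a kib_each-sized file once a day."""
    return kib_each * days / 1024

# Per-item figures exactly as given in the quoted message (already rounded there).
YUM_20 = {"initial hdrs": 16, "header.info daily": 8, "update .hdr files": 16}
YUM_21 = {
    "initial primary+filelists": 2.7,
    "primary.xml.gz daily": 68,
    "update .hdr files": 16,
    "filelists.xml.gz twice weekly": 192,
}

def summary():
    """Totals, difference, and ratio for the two scenarios."""
    total_20 = sum(YUM_20.values())
    total_21 = sum(YUM_21.values())
    extra = total_21 - total_20
    return {
        "total_20": round(total_20, 1),
        "total_21": round(total_21, 1),
        "extra": round(extra, 1),
        "extra_per_month": round(extra / MONTHS, 1),
        "ratio": round(total_21 / total_20, 1),
    }

if __name__ == "__main__":
    # Cross-check the rounded daily items against their stated per-download sizes.
    print(round(daily_total_mib(30), 1), round(daily_total_mib(250), 1))  # 8.2 68.4
    print(summary())
```

Note that the 192MiB filelists line dominates the yum 2.1 total, which is exactly the term sv's reply says is only incurred for file deps that primary.xml.gz cannot resolve.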
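sv's three download rules for yum 2.1.x can be expressed as a small decision function. This is a hypothetical sketch, not yum's code: we assume the primary.xml.gz checksum listed in the freshly fetched repomd.xml is compared with the cached copy's, we treat a file dep as answerable from primary.xml.gz only if it matches the /etc/* or *bin/* paths named in the thread, and we ignore deps already satisfied by installed packages. The real createrepo/yum matching rules may differ in detail.

```python
import fnmatch

# Hypothetical sketch of yum 2.1.x's metadata-fetch policy as described in the thread.
# Assumption: primary.xml.gz only lists files matching these patterns.
PRIMARY_FILE_PATTERNS = ("/etc/*", "*bin/*")


def in_primary(path):
    """True if a file dep on `path` can be answered from primary.xml.gz alone."""
    return any(fnmatch.fnmatchcase(path, pat) for pat in PRIMARY_FILE_PATTERNS)


def files_to_fetch(cached_primary_sum, repomd_primary_sum, file_deps):
    """Return the metadata files to download for one repository run.

    cached_primary_sum: checksum of the locally cached primary.xml.gz (None if absent).
    repomd_primary_sum: checksum of primary.xml.gz as listed in the new repomd.xml.
    file_deps: file paths required by the transaction being solved.
    """
    fetch = ["repomd.xml"]  # always fetched; under 1K
    if cached_primary_sum != repomd_primary_sum:
        fetch.append("primary.xml.gz")  # only when it has changed
    if any(not in_primary(p) for p in file_deps):
        fetch.append("filelists.xml.gz")  # only for file deps primary can't resolve
    return fetch


if __name__ == "__main__":
    print(files_to_fetch("abc", "abc", ["/usr/bin/perl", "/etc/passwd"]))
    print(files_to_fetch("abc", "def", ["/usr/lib/libfoo.so.1"]))
```

The first call fetches only repomd.xml; the second needs all three files because /usr/lib/libfoo.so.1 falls outside both patterns. Keeping file deps under /etc or a bin/ directory means the filelists branch never fires, which is the point of the closing remark.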