On Jan 29, 2005, seth vidal <skvidal@xxxxxxxxxxxx> wrote:

> How would it reduce bandwidth - you'd have to download and parse
> multiple entries and you'd STILL have to do just as much work on the
> repo-side b/c you'd have to check all the packages for changes.

The reduced bandwidth would be for the thousands of users who could download a 1KiB file with the changes since the last time they checked the repo, instead of downloading 4MiB containing only about 1KiB of new information.

Sure, createrepo would have to look at previous versions of the repodata, see what changed since then (it could optionally use only file timestamps and sizes to check that files haven't changed, instead of having to read them entirely to compute checksums) and generate a new, incremental repository format.

What I'm thinking is that this incremental repodata tree would contain the relative location of the original repodata tree, such that whoever downloads the incremental repodata can get to the previous states, and so on, by following the paths given. So we could build a counter-based repository history with the following properties:

- after the first run of createrepo, repodata/repomd.xml points to repodata/0, without adding or removing anything.

- after the second run of createrepo, repodata/repomd.xml points to repodata/1, with a repomd.xml that points to ../0, and primary.xml et al files adding/removing packages relative to ../0, and so on.

Every now and then, one could consolidate the multiple repodata subdirs into a single set of xml files. You could even do this every time, and have repomd.xml indicate that you can either get all the data from this single set of files, or the incremental history from this other file.

This sort of indirection in repomd.xml has one interesting additional side effect: if done properly, it would enable us to create composite and/or filtered repositories.
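To illustrate the idea, here's a minimal sketch of how a client could reconstruct the full package set by following the ../0, ../1 chain back to the base snapshot and replaying the deltas. The toy delta.txt format and the read_delta/resolve_packages names are made up for illustration; a real implementation would parse repomd.xml and the incremental primary.xml instead.

```python
# Sketch only: the on-disk "delta.txt" format (a "base" pointer plus
# "add"/"remove" package lines) is a stand-in for the real repomd.xml
# and incremental primary.xml that createrepo would generate.

import os


def read_delta(repodata_dir):
    """Return (base_dir_or_None, added, removed) for one repodata subdir."""
    base = None
    added, removed = [], []
    with open(os.path.join(repodata_dir, "delta.txt")) as f:
        for line in f:
            tag, _, value = line.strip().partition(" ")
            if tag == "base":
                # Relative path such as "../0", resolved against this subdir.
                base = os.path.normpath(os.path.join(repodata_dir, value))
            elif tag == "add":
                added.append(value)
            elif tag == "remove":
                removed.append(value)
    return base, added, removed


def resolve_packages(repodata_dir):
    """Walk the chain back to the full snapshot, then replay deltas forward."""
    chain = []
    current = repodata_dir
    while current is not None:
        base, added, removed = read_delta(current)
        chain.append((added, removed))
        current = base
    packages = set()
    # Oldest state first: apply each delta's removals, then its additions.
    for added, removed in reversed(chain):
        packages.difference_update(removed)
        packages.update(added)
    return packages
```

A client tracking the repo would normally fetch only the newest small subdir and replay it against the state it already has cached, which is where the bandwidth saving comes from.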
Your composite repository would reference a base repository (or a set thereof) in repomd.xml, as well as package removals or additions, so as to filter out packages from one repository that are, say, known to be incompatible, and add packages of your own.

This may well add a lot of complexity to the client side, but reducing the daily downloads of rawhide/i386's primary.xml.gz and filelists.xml.gz (totaling 4MiB) by however many users track rawhide down to a few KiB sounds like a pretty good idea to me.

-- 
Alexandre Oliva             http://www.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist    oliva@{lsd.ic.unicamp.br, gnu.org}