> And there package diffs, which are ed-style diffs of the
> Packages file I mentioned above.  This approach would work quite well
> for primary.xml because it doesn't contain cross-references between
> packages using non-natural keys.  It doesn't work for the SQLite
> database, either in binary or SQL dump format, because of the reliance
> on artificial primary keys (such as package IDs).

I tried this once. With about 10k packages in fedora-updates, the delta
over 2-3 days was +491 -479. Assuming deletions are cheap, the delta
should ideally be about 5% (roughly 491 additions against about 10k
packages). As expected, a binary bsdiff yields a much bigger delta
(~29%). Very roughly, that is 5% that really describes new packages,
plus an almost constant 24% overhead to fix up the inevitable changes
in surrogate keys. Not as bad as I was afraid, but still not worth it
(IMO).

So, we need *.xml deltas. Yum can rebuild xml => .sqlite locally, but
this needs quite a lot of memory and takes tens of seconds. Add the
time needed to patch the quite large uncompressed xml file, and
suddenly the fact that you're downloading just 1/10th of the data
hardly pays off (ignoring very specific use cases, like mobile data,
for a moment).

For DNF, it's different. It has to rebuild xml => .solv anyway, so the
rebuild comes for free.

> However, for many users that follow unstable or testing, package diffs
> are currently slower than downloading the full Packages file because the
> diffs are incremental (i.e., they contain the changes from file version
> N to N+1, and you have to apply all of them to get to the current
> version) and apt-get can easily write 100 MB or more because the
> Packages file is rewritten locally multiple times.

Yes, patch chaining should be avoided. I'd like to use N => 1 deltas
that could be applied to many recent snapshots.

-- 
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel
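
To illustrate the surrogate-key overhead mentioned above, here is a toy sketch, not a measurement of the real repodata. It assumes package IDs are handed out in sorted-name order, which is a simplification made up for this example (the real SQLite metadata may number rows differently), and the package names and helper functions are hypothetical. It compares a delta keyed on package names with a row-wise delta over (id, name) pairs.

```python
def assign_ids(names):
    """Number packages in sorted order, standing in for an artificial
    primary key (assumption: IDs follow sorted-name order)."""
    return {name: pkg_id for pkg_id, name in enumerate(sorted(names), start=1)}


def natural_delta(old, new):
    """Delta keyed on the package name: only real additions and removals."""
    return sorted(new - old), sorted(old - new)


def surrogate_delta(old, new):
    """Row-wise delta over (id, name) rows: a row counts as changed
    whenever its surrogate ID moved, even if the package did not."""
    old_rows = {(pkg_id, name) for name, pkg_id in assign_ids(old).items()}
    new_rows = {(pkg_id, name) for name, pkg_id in assign_ids(new).items()}
    return sorted(new_rows - old_rows), sorted(old_rows - new_rows)


# 100 made-up packages; one is removed and one new package sorts first.
old = {f"pkg{i:03d}" for i in range(100)}
new = (old - {"pkg090"}) | {"aaa-new"}

nat_added, nat_removed = natural_delta(old, new)
sur_added, sur_removed = surrogate_delta(old, new)

print("natural keys:  ", len(nat_added), "added,", len(nat_removed), "removed")
print("surrogate keys:", len(sur_added), "added,", len(sur_removed), "removed")
```

With natural keys the delta contains just the one added and the one removed package. With surrogate keys, every row between the insertion point and the removal point gets a new ID and so shows up in the delta as well. The exact overhead depends entirely on how IDs are assigned, so the numbers here say nothing about the ~24% observed above.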