On 05/27/2013 11:48 AM, Zdenek Pavlas wrote:
And there are package diffs, which are ed-style diffs of the
Packages file I mentioned above. This approach would work quite well
for primary.xml because it doesn't contain cross-references between
packages using non-natural keys. It doesn't work for the SQLite
database, either in binary or SQL dump format, because of the reliance
on artificial primary keys (such as package IDs).
I tried this once. With about 10k packages in fedora-updates, the delta
over 2-3 days was +491 -479. Assuming deletions are cheap, the delta should
ideally be about 5%. As expected, binary bsdiff yields a much bigger (~29%) delta.
A line-wise diff is much smaller because dependencies and package
descriptions mostly stay the same. (This assumes consistent sorting of
the primary.xml file.)
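
A rough way to check the line-wise numbers, as a sketch only: the helper names below are made up for illustration, and the input is assumed to be two consistently sorted snapshots split into lines. It counts added and removed lines with difflib and turns the "+491 out of about 10k" figure into a fraction.

```python
from difflib import SequenceMatcher


def line_delta(old_lines, new_lines):
    """Count (added, removed) lines between two snapshots.

    Hypothetical helper; assumes both snapshots use the same sort order.
    """
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


def download_fraction(added, total):
    """Share of the data that must be fetched if deletions cost nothing."""
    return added / total


# Toy snapshots: one record dropped, one record appended.
old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "e"]
added, removed = line_delta(old, new)

# Figures quoted above: 491 additions against roughly 10k packages.
fraction = download_fraction(491, 10_000)  # roughly 4.9%
```

If the sort order differed between snapshots, unchanged records would show up as moved lines and the count would be inflated, which is why consistent sorting matters here.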
Can you point me to the primary.xml -> SQLite translation in yum? I've
got a fairly efficient primary.xml parser. It might be interesting to
see if it's possible to reduce the latency introduced by the SQLite
conversion to close to zero. (Decompression and INSERTs can be
interleaved with downloading, and maybe the index creation improvements
in SQLite are sufficient these days.)
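
To make the interleaving idea concrete, here is a minimal sketch using only the Python standard library. It is not the yum code path. The two-column table, the function name and the un-namespaced `<package>` layout are simplified stand-ins for the real primary.xml schema, and `chunks` stands for whatever blocks the downloader hands over.

```python
import sqlite3
import zlib
import xml.etree.ElementTree as ET


def stream_into_sqlite(chunks, conn):
    """Decompress, parse and INSERT package records chunk by chunk.

    `chunks` is any iterable of gzip-compressed byte strings, for example
    the blocks produced by a download loop (assumed, not yum's API).
    """
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)  # 16 selects gzip framing
    parser = ET.XMLPullParser(events=("end",))
    conn.execute("CREATE TABLE IF NOT EXISTS packages (name TEXT, arch TEXT)")
    inserted = 0

    def drain():
        # Store every <package> element that has been parsed completely.
        nonlocal inserted
        for _event, elem in parser.read_events():
            if elem.tag != "package":
                continue
            conn.execute(
                "INSERT INTO packages (name, arch) VALUES (?, ?)",
                (elem.findtext("name"), elem.findtext("arch")),
            )
            elem.clear()  # drop the children of a record we have stored
            inserted += 1

    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            parser.feed(data)
            drain()
    tail = decomp.flush()
    if tail:
        parser.feed(tail)
    parser.close()
    drain()
    # Build the index once, after the bulk load, rather than per INSERT.
    conn.execute("CREATE INDEX IF NOT EXISTS packages_name ON packages (name)")
    conn.commit()
    return inserted
```

With this shape, the only work left after the last byte arrives is the final drain, the index build and the commit.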
However, for many users that follow unstable or testing, package diffs
are currently slower than downloading the full Packages file because the
diffs are incremental (i.e., they contain the changes from file version
N to N+1, and you have to apply all of them to get to the current
version) and apt-get can easily write 100 MB or more because the
Packages file is rewritten locally multiple times.
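
The chaining cost can be illustrated with a small sketch. This is not APT's implementation; it handles only the `a`, `c` and `d` commands that `diff -e` emits, ignores the escaping of lines consisting of a single dot, and the function names are invented for the example. Each script in the chain produces a complete new copy of the file, which is where the repeated local rewrites come from.

```python
import re

# "N" or "N,M" followed by a (append), c (change) or d (delete).
_CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_script(lines, script):
    """Apply one ed-style script (list of lines) to a list of lines.

    diff -e lists commands from the end of the file towards the start,
    so earlier line numbers stay valid while we edit.
    """
    out = list(lines)
    it = iter(script)
    for cmd in it:
        m = _CMD.match(cmd)
        if not m:
            raise ValueError(f"unsupported ed command: {cmd!r}")
        start = int(m.group(1))
        end = int(m.group(2) or start)
        op = m.group(3)
        text = []
        if op in ("a", "c"):
            for line in it:  # text block ends with a lone "."
                if line == ".":
                    break
                text.append(line)
        if op == "a":
            out[start:start] = text        # insert after line `start`
        elif op == "d":
            del out[start - 1:end]
        else:
            out[start - 1:end] = text      # replace lines start..end
    return out


def apply_chain(lines, scripts):
    """Apply version N -> N+1 scripts in order; count full rewrites."""
    rewrites = 0
    for script in scripts:
        lines = apply_ed_script(lines, script)
        rewrites += 1
    return lines, rewrites


v1 = ["a", "b", "c", "d"]
step1 = ["4d", "2c", "B", "."]   # drop line 4, change line 2
step2 = ["3a", "e", "."]         # append after line 3
latest, rewrites = apply_chain(v1, [step1, step2])
```

A client that is k versions behind rewrites the whole file k times this way, so the bytes written grow with k times the file size rather than with the size of the changes.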
Yes, patch chaining should be avoided. I'd like to use N => 1 deltas
that could be applied to many recent snapshots.
The Debian package diffs could be combined efficiently in the client
because it's possible to combine diffs for two adjacent versions without
actually knowing what the old or new versions look like. But this
hasn't been implemented in APT because of the ABI impact (which is a bit
puzzling, but anyway). Instead, the diffs should soon be combined on
the archive side.
--
Florian Weimer / Red Hat Product Security Team