On Jan 31, 2005, Jeff Pitman <symbiont@xxxxxxxxxx> wrote:

> This could be driven by an optional parameter to createrepo, which
> provides a list of packages to create a delta with.

Err...  Why?  We already have repodata/, and we're creating the new
version in .repodata.  We can use repodata/ however we like, I think.

> If it were fully automatic, it would only be a download win for the
> user.

And for the servers.

> I would rather not utilize xdelta, because you're still regenerating
> the entire thing.  Having xmlets that virtually add/subtract as a
> delta against primary.xml.gz would be optimal for both sides of the
> equation.

But then Seth rejects the idea because it makes for unmaintainable
code, and I mostly agree with him now that I see a simpler way to
accomplish the same bandwidth savings.

> Another advantage of the delta method is that the on-disk pickled
> objects (or whatever back-end store is used) could be updated
> incrementally based on xml snippets coming in, instead of
> regenerating the whole thing over again.

This is certainly a good point, but it is also trickier to get right,
and the result might even turn out to be bigger: if you have to list
what went away, you're probably emitting more information than
xdelta's `skip these many bytes'.

It's like comparing diff with xdelta: diff is reversible because it
contains both what was removed and what was added (plus optional
context), whereas xdelta only contains what was inserted and which
portions of the original remained.  Getting inserts small is trivial;
getting removals small might be trickier, and to take advantage of
pickling we need the latter.

Unless...  Does anyone feel like implementing an xml-aware
xdelta-like program in Python? :-)

-- 
Alexandre Oliva             http://www.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist    oliva@{lsd.ic.unicamp.br, gnu.org}
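
P.S. To make the "xml-aware delta" idea concrete, here is a minimal
sketch in Python.  It assumes a hypothetical, much-simplified
primary.xml schema (bare <package name= version=> entries, which is
not createrepo's real format) and computes a package-level
added/removed/changed delta keyed on package name, i.e. exactly the
removal information that a byte-level xdelta would not carry
explicitly:

```python
# Minimal sketch of an xml-aware delta between two metadata documents.
# The <package name="..." version="..."/> schema is a hypothetical
# simplification for illustration, not createrepo's actual primary.xml.
import xml.etree.ElementTree as ET

OLD = """<metadata>
  <package name="foo" version="1.0"/>
  <package name="bar" version="2.1"/>
</metadata>"""

NEW = """<metadata>
  <package name="bar" version="2.2"/>
  <package name="baz" version="0.9"/>
</metadata>"""

def index(xml_text):
    # Map package name -> version for every <package> element.
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("version") for p in root.iter("package")}

def delta(old_text, new_text):
    # Compare the two indexes and classify each package name.
    old, new = index(old_text), index(new_text)
    added = sorted(n for n in new if n not in old)
    removed = sorted(n for n in old if n not in new)
    changed = sorted(n for n in old if n in new and old[n] != new[n])
    return added, removed, changed

added, removed, changed = delta(OLD, NEW)
print(added, removed, changed)  # -> ['baz'] ['foo'] ['bar']
```

Unlike a byte-level xdelta, such a delta names the removed packages
outright, so a pickled store could apply it incrementally; the open
question from the thread is whether listing removals stays smaller
than just shipping a binary delta.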