Re: Proposal: Faster composes by eliminating deltarpms and using zchunked rpms instead

On Mon, 2018-11-19 at 16:29 -0500, Simo Sorce wrote:
> On Mon, 2018-11-19 at 21:02 +0000, Jonathan Dieter wrote:
> > On Mon, 2018-11-19 at 15:18 -0500, Simo Sorce wrote:
<snip>
> > > I do not think you can just trust random metadata somewhere, one of the
> > > points of a rpm reinstall is to fix damaged files for example. It does
> > > no good if you skip those because some file somewhere says they are
> > > "OK". (If I understood your comment about "just downloading changed
> > > chunks" correctly.)
> > 
> > Yes, this is the crux of the problem.  As I see it, dnf should verify
> > the checksums on the local files before downloading the missing chunks,
> > but that doesn't guarantee that the data won't be changed between the
> > download step and the install step.  RPM would also need to verify the
> > checksums before starting the install phase, and would need to bail out
> > if the checksums had changed.
> > 
> > My biggest concern, though, is what happens if package A needs a
> > specific chunk in /usr/bin/foo and package B changes /usr/bin/foo while
> > being installed.  The chunk was there when the install phase started,
> > but disappeared before package A was actually installed.
> 
> Is this different in a normal install ?
> What if package A installs /usr/bin/foo and then package B overwrites
> it ?
> 
> Or are you concerned about the case where there may be an identical
> chunk in different files ? Are chunks "global" to the host ?

This.  If we stored the checksums in the rpm database, then, yes they
would be global to the host.
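
To make the "global to the host" idea concrete, here is a rough sketch of
what a lookup against such an index might look like. Everything in it is
hypothetical: the `ChunkLocation` record, the `read_verified_chunk` helper,
the plain dict standing in for the rpm database, and the choice of SHA-256
are all assumptions for illustration, not anything rpm or zchunk does today.
The point is only that a chunk is re-hashed at the moment it is read, so a
file that changed after the index was built is treated as missing.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkLocation:
    """Hypothetical index record: where a chunk lives in an installed file."""
    path: str
    offset: int
    length: int


def read_verified_chunk(index, digest):
    """Return the chunk bytes if the local copy still matches `digest`.

    `index` maps a hex SHA-256 digest to a ChunkLocation. Returns None when
    the chunk is unknown, unreadable, truncated, or no longer matches, in
    which case the caller would have to download it instead.
    """
    loc = index.get(digest)
    if loc is None:
        return None
    try:
        with open(loc.path, "rb") as f:
            f.seek(loc.offset)
            data = f.read(loc.length)
    except OSError:
        return None
    if len(data) != loc.length:
        return None
    if hashlib.sha256(data).hexdigest() != digest:
        return None
    return data
```

Note that this only narrows the window; it does not close it. The file can
still change between this check and the install step.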

> This problem could be addressed by copying all uncompressed chunks in a
> staging area before installing the rpm, failing in a clean way (i.e. not
> half way through a package install). The penalty is the need for enough
> space to copy the uncompressed files though. More clever things could
> be done with proper filesystem support and snapshotting and copy-on-
> write, but not sure it is worth optimizing for what is normally a
> relatively small scratch area (if you do it one package at a time
> only).

What about just copying any uncompressed chunks required for the
current package or any packages still in the install queue?  That might
reduce the scratch area even further.
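
A minimal sketch of that idea, under stated assumptions: the install queue
is a list of (package, chunk digests) pairs, `expected_local` is the set of
digests we believed were available in local files when the transaction was
planned, and `fetch_local` is some verified reader that returns None if the
chunk is gone or changed. All of these names are made up for illustration.
Staging happens before any package is installed, so a vanished chunk fails
the transaction cleanly rather than half way through.

```python
class StagingError(Exception):
    """Raised when a chunk we expected to find locally is gone or changed."""


def stage_local_chunks(queue, expected_local, fetch_local):
    """Copy every locally sourced chunk still needed by the queue.

    queue:          list of (package_name, [digest, ...]) not yet installed
    expected_local: set of digests planned to come from local files
    fetch_local:    callable(digest) -> bytes, or None if unavailable

    Returns {digest: bytes}. Each chunk is staged once, even when several
    queued packages need it.
    """
    needed = set()
    for _name, digests in queue:
        needed.update(digests)
    needed &= expected_local  # everything else is downloaded, not staged

    staged = {}
    for digest in sorted(needed):
        data = fetch_local(digest)
        if data is None:
            raise StagingError(f"local chunk {digest} is no longer available")
        staged[digest] = data
    return staged
```

As each package finishes installing, it would drop out of the queue, and
any staged chunk that no remaining package needs could be freed.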

> > > 2) what are the chunks sizes ?
> > 
> > The chunk sizes vary because you don't want inserting or removing a few
> > bytes to completely change all the following chunks.  The current
> > default average size is 32KB, but that can be adjusted.
> 
> Is this a compromise between compression performance and granularity ?
> Did anything else go into the decision to settle around 32k ?
> Some filesystems seem to gravitate around 64k extents so I am
> wondering.

Yes, this is just a compromise.  The larger the chunk size, the larger
the delta you need to download, but the better the compression.  We
could experiment with this to see if 64k would give us significantly
better compression.

I would also chunk on file borders in the rpm payload, so we don't end
up having a chunk span multiple files.  That would get messy fast when
trying to rebuild from local files.
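
For anyone who hasn't seen content-defined chunking before, here is a small
sketch of the two ideas above: variable-size chunks cut where a rolling
hash hits a pattern, and a hard cut at every file border. This uses a
simple "gear" style rolling hash as a stand-in, which is not necessarily
what zchunk itself uses, and the function names, `min_size`, `max_size`,
and the gear table seed are all assumptions for illustration. With
`avg_bits=15` a cut point is expected roughly every 32KB once the minimum
size has been passed.

```python
import random

# Fixed pseudo-random table so chunk boundaries are reproducible.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunk_bytes(data, avg_bits=15, min_size=4096, max_size=131072):
    """Split `data` into variable-size chunks at content-defined cut points."""
    # Test the top `avg_bits` bits, which depend on the last 64 bytes seen.
    cut_mask = ((1 << avg_bits) - 1) << (64 - avg_bits)
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i + 1 - start
        if size < min_size:
            continue
        if (h & cut_mask) == 0 or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be under min_size
    return chunks


def chunk_payload(files, **kwargs):
    """Chunk each file on its own so no chunk ever spans two files.

    `files` is an iterable of (name, bytes); yields (name, chunk) pairs.
    """
    for name, data in files:
        for chunk in chunk_bytes(data, **kwargs):
            yield name, chunk
```

Because the hash restarts at each file, a chunk always maps to a single
offset range in a single installed file, which is what makes rebuilding
from local files tractable.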

> > > Finally what signature scheme were you planning to use ? And how do
> > > you deal with the data you want to "exclude" from signing, omit it or
> > > feed in blank "sectors" ?
> > 
> > I was planning to use GPG signatures, and was planning to just omit the
> > data I want excluded.  Having said that, while the format supports
> > signatures, the code hasn't been written and if either of those answers
> > is bad/dangerous, we can change that.
> 
> We use GPG signatures right now, can't be any more dangerous than that
> :-)
> 
> The omission vs blanking has no ill effect, but was not explicitly
> mentioned, and it should be. Esp around places where the missing data is
> in the middle of a "structure" in your diagrams, or it may be ambiguous
> and lead to incompatible implementations if someone is ever going to
> build another (and if zchunk is going to be adopted in rpm I bet there
> will be some other implementation to do some crazy thing :-) )

Yep.  Let me clarify that in the format definition (and add the new
checksum types, I noticed they're missing).

Jonathan