Re: Proposal: Faster composes by eliminating deltarpms and using zchunked rpms instead


On Mon, 2018-11-19 at 21:02 +0000, Jonathan Dieter wrote:
> On Mon, 2018-11-19 at 15:18 -0500, Simo Sorce wrote:
> > On Mon, 2018-11-19 at 19:58 +0000, Jonathan Dieter wrote:
> 
> <snip>
> > > That's an interesting thought.  I was picturing using the zchunk
> > > library in the dnf download stage to build a local rpm from the
> > > verified locally installed files and the downloaded changed chunks,
> > > but, if I understand your suggestions correctly, you're saying we
> > > could just download the changed chunks and have RPM automatically
> > > get the rpm-integrity verified chunks during the *install* stage.
> > 
> > How do you know which chunks to download w/o having a stored (or
> > recomputed) list of existing chunks ?
> 
> I thought we should store the chunk checksums of installed files in the
> rpm database.  Something like file, offset, length, checksum type,
> checksum?
> 
> > > The advantage of this method is that you don't need to store the local
> > > data twice, but the danger is that the local files get changed
> > > elsewhere during the install process.
> > > 
> > > It's an interesting thought, though, and I wonder if there's a way we
> > > could work around that danger?
> > 
> > I do not think you can just trust random metadata somewhere, one of the
> > points of a rpm reinstall is to fix damaged files for example. It does
> > no good if you skip those because some file somewhere says they are
> > "OK". (If I understood your comment about "just downloading changed
> > chunks".)
> 
> Yes, this is the crux of the problem.  As I see it, dnf should verify
> the checksums on the local files before downloading the missing chunks,
> but that doesn't guarantee that the data won't be changed between the
> download step and the install step.  RPM would also need to verify the
> checksums before starting the install phase, and would need to bail out
> if the checksums had changed.
> 
> My biggest concern, though, is what happens if package A needs a
> specific chunk in /usr/bin/foo and package B changes /usr/bin/foo while
> being installed.  The chunk was there when the install phase started,
> but disappeared before package A was actually installed.

Is this different in a normal install ?
What if package A installs /usr/bin/foo and then package B overwrites
it ?

Or are you concerned about the case where there may be an identical
chunk in different files ? Are chunks "global" to the host ?

This problem could be addressed by copying all uncompressed chunks into
a staging area before installing the rpm, so that any failure is clean
(i.e. not halfway through a package install). The penalty is the need
for enough space to copy the uncompressed files, though. More clever
things could be done with proper filesystem support, snapshotting and
copy-on-write, but I am not sure it is worth optimizing for what is
normally a relatively small scratch area (if you do it one package at a
time only).
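
As a rough illustration of the staging idea, here is a minimal Python
sketch. Everything in it is an assumption made for the example: the
ChunkRef record (path, offset, length, SHA-256 digest), the
ChunkMismatch exception and the stage_chunks() helper are hypothetical
names, not part of rpm, dnf or zchunk. It verifies each local chunk
while copying it into the staging directory and fails before anything
has been installed if a chunk changed on disk.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical record: where a needed chunk lives on the local system."""
    path: str
    offset: int
    length: int
    digest: str  # hex SHA-256 of the uncompressed chunk


class ChunkMismatch(Exception):
    """A local chunk no longer matches its recorded checksum."""


def stage_chunks(refs, staging_dir):
    """Copy every referenced chunk into staging_dir, verifying each one.

    Returns a dict mapping digest -> staged file path. Raises
    ChunkMismatch as soon as a chunk fails verification, which happens
    before any package content would be installed.
    """
    staged = {}
    for ref in refs:
        if ref.digest in staged:
            continue  # an identical chunk was already staged
        with open(ref.path, "rb") as src:
            src.seek(ref.offset)
            data = src.read(ref.length)
        if len(data) != ref.length or hashlib.sha256(data).hexdigest() != ref.digest:
            raise ChunkMismatch(f"{ref.path} at offset {ref.offset}")
        out_path = os.path.join(staging_dir, ref.digest)
        with open(out_path, "wb") as dst:
            dst.write(data)
        staged[ref.digest] = out_path
    return staged
```

Once staging succeeds, the install step would read only from the
staging directory, so a later change to /usr/bin/foo by another package
can no longer pull a chunk out from under package A.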

> > A couple more questions.
> > I skimmed quickly at the format and I have two questions I did not
> > immediately see an answer for.
> > 1) why are you still supporting SHA-1 in a new format ?
> 
> Zchunk cares about two types of checksums, the chunk checksums, used to
> determine if two chunks are the same, and the full data checksum (which
> currently defaults to SHA-256), used to actually validate the data.
> 
> Originally, SHA-1 was supposed to be used *only* for the chunk
> checksums, but, somewhere along the way, it was pointed out that using
> the first 128 bits of a SHA-512 hash would be faster and more secure,
> so the default for the chunk checksums is now SHA-512/128.
> 
> The only reason SHA-1 support is still in zchunk is because I don't
> want to break backwards compatibility for the (probably five) zchunk
> files created before this change.
> 
> Having said that, zchunked rpms won't be able to depend on the full
> data checksum (because the local chunks will be uncompressed), so we'd
> need to use SHA-256 at minimum for the chunk checksums.
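
For readers unfamiliar with the two chunk checksum choices mentioned
above, a small Python sketch follows. The function names are
hypothetical, and the truncation is simply "the first 128 bits of a
SHA-512 hash" as described in the quoted text; it is not claimed to be
the exact code zchunk uses.

```python
import hashlib


def chunk_checksum_sha512_128(data: bytes) -> bytes:
    """First 128 bits (16 bytes) of the SHA-512 digest of a chunk."""
    return hashlib.sha512(data).digest()[:16]


def chunk_checksum_sha256(data: bytes) -> bytes:
    """Full 256-bit SHA-256 digest of a chunk."""
    return hashlib.sha256(data).digest()
```

The truncated form is enough to tell whether two chunks are probably
the same, while the full SHA-256 digest is the kind of checksum that
would be needed if the chunk checksum also has to validate the data.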
> 
> > 2) what are the chunks sizes ?
> 
> The chunk sizes vary because you don't want inserting or removing a few
> bytes to completely change all the following chunks.  The current
> default average size is 32KB, but that can be adjusted.

Is this a compromise between compression performance and granularity ?
Did anything else go into the decision to settle around 32k ?
Some filesystems seem to gravitate around 64k extents, so I am
wondering.
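
To make the variable chunk size point concrete, here is a minimal
sketch of content-defined chunking in Python. It is an illustration
under stated assumptions, not zchunk's actual algorithm: the rolling
hash (a buzhash-style hash), the 48-byte window, the random table and
the split_chunks() name are all choices made for this example. A chunk
ends wherever the low avg_bits bits of the rolling hash are zero, so
avg_bits=15 gives an average of roughly 32KB, and the boundaries depend
only on nearby content rather than on absolute offsets.

```python
import random

WINDOW = 48  # bytes covered by the rolling hash (assumed value)
_rng = random.Random(0)
_TABLE = [_rng.getrandbits(32) for _ in range(256)]


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def split_chunks(data: bytes, avg_bits: int = 15) -> list:
    """Split data at positions where the rolling hash has avg_bits low zero bits."""
    mask = (1 << avg_bits) - 1
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Slide the window forward: rotate, add the new byte, drop the old one.
        h = _rotl(h, 1) ^ _TABLE[byte]
        if i >= WINDOW:
            h ^= _rotl(_TABLE[data[i - WINDOW]], WINDOW)
        if (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because the hash only looks at the last 48 bytes, inserting a few bytes
near the start of a file changes the chunk or two around the insertion,
and every later chunk comes out byte-for-byte identical. A fixed-size
split would shift every following chunk instead.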

> > Sorry if this is already answered somewhere.
> > 
> > Finally what signature scheme were you planning to use ? And how do
> > you deal with the data you want to "exclude" from signing, omit it or
> > feed in blank "sectors" ?
> 
> I was planning to use GPG signatures, and was planning to just omit the
> data I want excluded.  Having said that, while the format supports
> signatures, the code hasn't been written and if either of those answers
> is bad/dangerous, we can change that.

We use GPG signatures right now, can't be any more dangerous than that
:-)

The omission vs blanking has no ill effect, but it was not explicitly
mentioned, and it should be. Especially around places where the missing
data is in the middle of a "structure" in your diagrams, otherwise it
may be ambiguous and lead to incompatible implementations if someone is
ever going to build another (and if zchunk is going to be adopted in rpm
I bet there will be some other implementation to do some crazy thing :-)

> > Thanks for any answer.
> > Simo.
> 
> Thank you for looking at this!
> 
> Jonathan
> _______________________________________________
> devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
