On Mon, 2018-11-19 at 21:02 +0000, Jonathan Dieter wrote:
> On Mon, 2018-11-19 at 15:18 -0500, Simo Sorce wrote:
> > On Mon, 2018-11-19 at 19:58 +0000, Jonathan Dieter wrote:
> > <snip>
> > > That's an interesting thought.  I was picturing using the zchunk
> > > library in the dnf download stage to build a local rpm from the
> > > verified locally installed files and the downloaded changed chunks,
> > > but, if I understand your suggestions correctly, you're saying we
> > > could just download the changed chunks and have RPM automatically
> > > get the rpm-integrity verified chunks during the *install* stage.
> >
> > How do you know which chunks to download w/o having a stored (or
> > recomputed) list of existing chunks ?
>
> I thought we should store the chunk checksums of installed files in the
> rpm database.  Something like file, offset, length, checksum type,
> checksum?
>
> > > The advantage of this method is that you don't need to store the local
> > > data twice, but the danger is that the local files get changed
> > > elsewhere during the install process.
> > >
> > > It's an interesting thought, though, and I wonder if there's a way we
> > > could work around that danger?
> >
> > I do not think you can just trust random metadata somewhere, one of the
> > points of a rpm reinstall is to fix damaged files for example. It does
> > no good if you skip those because some file somewhere says they are
> > "OK". (If I understood your comment about "just downloading changed
> > chunks").
>
> Yes, this is the crux of the problem.  As I see it, dnf should verify
> the checksums on the local files before downloading the missing chunks,
> but that doesn't guarantee that the data won't be changed between the
> download step and the install step.  RPM would also need to verify the
> checksums before starting the install phase, and would need to bail out
> if the checksums had changed.
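
A minimal sketch of the verification step described above, assuming one
stored record per chunk with the fields suggested in the thread (file,
offset, length, checksum type, checksum). The names ChunkRecord,
chunk_is_intact and chunks_to_download are hypothetical; nothing here is
an existing rpm, dnf or zchunk API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRecord:
    """Hypothetical per-chunk record, as it might be stored in the rpm database."""
    path: str           # installed file that contains the chunk
    offset: int         # byte offset of the chunk within that file
    length: int         # chunk length in bytes
    checksum_type: str  # hashlib algorithm name, e.g. "sha256"
    checksum: str       # expected hex digest of the chunk


def chunk_is_intact(rec: ChunkRecord) -> bool:
    """Re-read the chunk from disk and compare it with the stored checksum."""
    try:
        with open(rec.path, "rb") as f:
            f.seek(rec.offset)
            data = f.read(rec.length)
    except OSError:
        return False  # file missing or unreadable: treat the chunk as absent
    if len(data) != rec.length:
        return False  # file was truncated since the record was written
    return hashlib.new(rec.checksum_type, data).hexdigest() == rec.checksum


def chunks_to_download(records: list[ChunkRecord]) -> list[ChunkRecord]:
    """Chunks that cannot be reused locally and would have to be fetched."""
    return [rec for rec in records if not chunk_is_intact(rec)]
```

Running the same check once before the download and again right before
the install would detect a change in between, but it would not by itself
prevent a file from changing during the install.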
>
> My biggest concern, though, is what happens if package A needs a
> specific chunk in /usr/bin/foo and package B changes /usr/bin/foo while
> being installed.  The chunk was there when the install phase started,
> but disappeared before package A was actually installed.

Is this different in a normal install ?
What if package A installs /usr/bin/foo and then package B overwrites
it ?

Or are you concerned about the case where there may be an identical
chunk in different files ? Are chunks "global" to the host ?

This problem could be addressed by copying all uncompressed chunks into
a staging area before installing the rpm, failing in a clean way (i.e.
not halfway through a package install). The penalty is the need for
enough space to copy the uncompressed files, though. More clever things
could be done with proper filesystem support, snapshotting and
copy-on-write, but I am not sure it is worth optimizing for what is
normally a relatively small scratch area (if you do it one package at a
time only).

> > A couple more questions.
> > I skimmed the format quickly and I have two questions I did not
> > immediately see an answer for.
> > 1) why are you still supporting SHA-1 in a new format ?
>
> Zchunk cares about two types of checksums, the chunk checksums, used to
> determine if two chunks are the same, and the full data checksum (which
> currently defaults to SHA-256), used to actually validate the data.
>
> Originally, SHA-1 was supposed to be used *only* for the chunk
> checksums, but, somewhere along the way, it was pointed out that using
> the first 128 bits of a SHA-512 hash would be faster and more secure,
> so the default for the chunk checksums is now SHA-512/128.
>
> The only reason SHA-1 support is still in zchunk is because I don't
> want to break backwards compatibility for the (probably five) zchunk
> files created before this change.
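
For illustration, "the first 128 bits of a SHA-512 hash" can be computed
as below. This is a sketch that follows the description in the thread;
the function name chunk_checksum is hypothetical and is not taken from
the zchunk code. Note that plain truncation is not the same construction
as the SHA-512/t variants in FIPS 180-4, which also use different initial
hash values.

```python
import hashlib


def chunk_checksum(data: bytes) -> bytes:
    """Return the first 128 bits (16 bytes) of the SHA-512 digest of data."""
    return hashlib.sha512(data).digest()[:16]


# Two chunks are treated as the same when their truncated digests match.
same = chunk_checksum(b"chunk contents") == chunk_checksum(b"chunk contents")
different = chunk_checksum(b"chunk contents") == chunk_checksum(b"other contents")
```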
>
> Having said that, zchunked rpms won't be able to depend on the full
> data checksum (because the local chunks will be uncompressed), so we'd
> need to use SHA-256 at minimum for the chunk checksums.
>
> > 2) what are the chunk sizes ?
>
> The chunk sizes vary because you don't want inserting or removing a few
> bytes to completely change all the following chunks.  The current
> default average size is 32KB, but that can be adjusted.

Is this a compromise between compression performance and granularity ?
Did anything else go into the decision to settle around 32k ?
Some filesystems seem to gravitate around 64k extents, so I am
wondering.

> > Sorry if this is already answered somewhere.
> >
> > Finally what signature scheme were you planning to use ? And how do
> > you deal with the data you want to "exclude" from signing, omit it or
> > feed in blank "sectors" ?
>
> I was planning to use GPG signatures, and was planning to just omit the
> data I want excluded.  Having said that, while the format supports
> signatures, the code hasn't been written and if either of those answers
> is bad/dangerous, we can change that.

We use GPG signatures right now, so it can't be any more dangerous than
that :-)

Omission vs blanking has no ill effect, but the choice was not
explicitly mentioned, and it should be. Especially around places where
the missing data is in the middle of a "structure" in your diagrams;
otherwise it may be ambiguous and lead to incompatible implementations
if someone is ever going to build another one (and if zchunk is going to
be adopted in rpm I bet there will be some other implementation to do
some crazy thing :-)

> > Thanks for any answer.
> > Simo.
>
> Thank you for looking at this!
>
> Jonathan
> _______________________________________________
> devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc