On Fri, Nov 16, 2018 at 6:03 PM Jonathan Dieter <jdieter@xxxxxxxxx> wrote:
>
> *Changes*
> The zchunk format would need to be extended to allow for a zchunked rpm
> to contain both the uncompressed chunks that were already on the local
> system and the newly downloaded compressed chunks while still passing
> signature verification. This would also require moving signature
> verification to zchunk.
>
> The rpm file format has to be changed because the zchunk header needs
> to be at the beginning of the file in order for the zchunk library
> to figure out which chunks it needs to download. My suggestions for
> changes to the rpm file format are as follows:
>
> 1. Signing should be moved to the zchunk format as described at the
>    beginning of this section
> 2. The rpm header should be stored in one stream inside the zchunk
>    file. This allows it to be easily extracted separately from the
>    data
> 3. The rpm cpio should be stored in a second stream inside the zchunk
>    file.
> 4. At minimum, an optional zchunk element should be set to identify
>    zchunk rpms as rpms rather than regular zchunk files. If desired,
>    optional elements could also be set containing %{name}, %{version},
>    %{release}, %{arch} and %{epoch}. This would allow this information
>    to be read easily without needing to extract the rpm header stream.
>
> *Final notes*
> I realize this is a massive proposal, zchunk is still very young, and
> we're still working on getting the dnf zchunk pull requests reviewed.
> I do think it's feasible and provides an opportunity to eliminate a
> pain point from our compose process while still reducing the download
> size for our users.
>

If we're really considering changing the RPM file format, then we need
a proper discussion on the rpm-maint@ and rpm-ecosystem@ mailing lists
on rpm.org. Can you please start a targeted discussion there?

But addressing the specific concrete suggestion here, there are a few
concerns I have:

1. This is a huge format break, which means that for the first time in
   a _very_ long time, it would not be possible to reuse RHEL for
   Fedora infrastructure _at all_. That's going to be a difficult
   problem. There's a large legacy of systems that won't be able to
   handle the new format, and unfortunately, rpm is not currently
   parallel installable in the same manner as something like GCC or
   Python. Making it parallel installable *is* possible (I've done it,
   and there have been other attempts before), but it's not a
   supported thing. This is probably the thing that would trigger a
   major version bump for RPM, since it's a new archive format.

2. This also means the _entire_ ecosystem of RPM archive parsers will
   break. This is not particularly insurmountable, actually, as the
   RPM file format was not particularly well documented, and a new
   format is an opportunity to revisit some of those old issues and
   try to do a better job this go around. But it's still a challenge
   to deal with.

3. When you refer to the rpm cpio, I assume you're referring to only
   the archive payload, right? Typically the payload is what is
   compressed, and the headers are not. It sounds like you're
   proposing that both be compressed, and compressed differently. If
   we made the RPM header an uncompressed zchunk stream and the RPM
   payload a zstd-compressed zchunk stream, would we be able to
   support fetching header deltas to retrieve extra information on
   the fly? Say, for example, attributes like arch color, filecap
   properties, and so on, which aren't in the rpm-md data, so that
   things like transaction tests could run without the whole RPM?

4. I'd actually rather make it easier for the header streams to be
   fetched instead of trying to make specific attributes easier to
   get at in the header payload. History has shown that any attempt
   at foresight here tends to fail miserably, and common attributes
   are already specified in the rpm-md primary.xml anyway, so if
   you're fetching the header to retrieve an attribute, you *need* to
   do something weird anyway.

5. I'm not exactly sure what you mean by zchunk signing...

6. I'm wondering why we can't do a perfect reconstruction of the
   original RPM, given two RPM sources that are both zchunked? We can
   pull it off with repodata, so what's different about RPM that
   makes that not doable?

-- 
真実はいつも一つ!/ Always, there's only one truth!
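P.S. To make the question in point 3 a bit more concrete, here is a toy sketch of what I mean by fetching only the header stream. Everything in it is an assumption for illustration: the `TOY2` magic, the fixed-size index, and the function names are hypothetical and are not the real zchunk or RPM layout, and zlib stands in for zstd only because it ships with Python. The point is just that if an index at the front of the file records where the uncompressed header stream lives, a client could read the header with two small range reads and never touch the payload.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical layout (NOT the real zchunk or RPM format):
#   4-byte magic | u32 header length | u32 payload length | header | payload
MAGIC = b"TOY2"
INDEX_SIZE = len(MAGIC) + 8


def pack(header: bytes, payload: bytes) -> bytes:
    """Build a toy file: index, uncompressed header stream, compressed payload."""
    compressed = zlib.compress(payload)  # zlib is a stand-in for zstd
    index = MAGIC + struct.pack("<II", len(header), len(compressed))
    return index + header + compressed


def header_range(index: bytes) -> Tuple[int, int]:
    """From the index bytes alone, return the [start, end) range of the header."""
    if index[:len(MAGIC)] != MAGIC:
        raise ValueError("not a toy container")
    header_len, _payload_len = struct.unpack("<II", index[len(MAGIC):INDEX_SIZE])
    return INDEX_SIZE, INDEX_SIZE + header_len


def read_payload(blob: bytes) -> bytes:
    """Decompress the payload stream; needs the whole file."""
    header_len, payload_len = struct.unpack("<II", blob[len(MAGIC):INDEX_SIZE])
    start = INDEX_SIZE + header_len
    return zlib.decompress(blob[start:start + payload_len])


# Simulate two range reads against a "remote" file held in memory.
blob = pack(b"name=foo arch=x86_64", b"file contents " * 1000)
start, end = header_range(blob[:INDEX_SIZE])  # first read: the index only
header = blob[start:end]                      # second read: the header only
```

In this toy, the header comes back without decompressing anything, which is the property I'd want from the real format if we go down this road.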