Neal, thanks so much for your thoughts on this. Responses inline:

On Sat, 2018-11-17 at 09:53 -0500, Neal Gompa wrote:
<snip>
> If we're really considering changing the RPM file format, then we need
> a proper discussion on rpm-maint@ and rpm-ecosystem@ mailing lists on
> rpm.org. Can you please start a targeted discussion there?

Sure.

> But addressing the specific concrete suggestion here, there's a few
> concerns I have:
>
> 1. This is a huge format break, which means that for the first time in
> a _very_ long time, it would not be possible to reuse RHEL for Fedora
> infrastructure _at all_. That's going to be a difficult problem.
> There's a large legacy of systems that won't be able to handle that
> new format, and unfortunately, rpm is not parallel installable in the
> same manner as something like GCC or Python currently. Making it
> parallel installable *is* possible (I've done it, and there have been
> other attempts before), but it's not a supported thing. This is
> probably the thing that would trigger a major version bump for RPM,
> since it's a new archive format.

Agreed that this would be a massive format change, and it should
therefore come with a major version bump for RPM. New versions of RPM
should still be able to read and install old-format rpms but, as you
point out, old versions of RPM won't be able to read or install
new-format rpms. Unfortunately, I don't see any way around this.

> 2. This also means the _entire_ ecosystem of RPM archive parsers will
> break. This is not particularly insurmountable, actually, as the RPM
> file format was not particularly well documented, and a new format is
> an opportunity to revisit some of those old issues and try to do a
> better job this go around. But it's still a challenge to deal with.

Yes, this is going to be quite a bit of work.

> 3. When you refer to the rpm cpio, I assume you're referring to only
> the archive payload, right? Typically the payload is what is
> compressed, and the headers are not.
> It sounds like you're proposing
> both aspects to be compressed, and compressed differently. If we made
> the RPM header an uncompressed zchunk stream and the RPM payload a
> zstd-compressed zchunk stream, would we be able to support fetching
> header deltas for retrieving extra information on the fly? Say, for
> example, attributes like arch color, filecap properties, and so on,
> that aren't in the rpm-md data for things like transaction tests
> without the whole RPM?

Yes, by "the cpio" I mean the archive payload. The zchunk format
supports separate data streams, and I was planning to use that to put
the headers in one stream and the archive payload in another. If the
header chunks come first in the zchunk file, they can be read without
reading any of the rest of the file. And, yes, we could make the header
stream uncompressed if that makes it easier to parse.

> 4. I'd actually rather make it easier for the header streams to be
> fetched instead of trying to make specific attributes easier in the
> header payload. History has shown that any attempt at foresight here
> tends to fail miserably, and common attributes are already specified
> in the rpm-md primary.xml anyway, so if you're fetching the header to
> retrieve an attribute, you *need* to do something weird anyway.

The main purpose of putting separate attributes in the zchunk header is
to let programs like 'file' determine some basic information about an
rpm without parsing the full rpm header. This data would also be in
the rpm header, so programs that read the rpm header wouldn't care
about the attributes in the zchunk header.

> 5. I'm not exactly sure what you mean by zchunk signing...

The zchunk format supports signing, but only of the zchunk header.
Because the header contains the checksum of every chunk, a valid
signature on the header establishes a chain of trust for verifying the
whole file. Which brings me to...

> 6.
> I'm wondering why we can't do a perfect reconstruction of the
> original RPM, given two RPM sources that are both zchunked? We can
> pull it off with repodata, so what's different about RPM that makes
> that not doable?

The problem is that, unlike the repodata, once an rpm is installed the
package file is deleted, and the data is only available on the system
in its uncompressed, installed form. If we're trying to use that data
to rebuild an rpm, we have two options:

1. Compress the data using the same method that was used to create the
   original rpm. This is what applydeltarpm does, and it's why
   applydeltarpm is so heavy on the CPU.

2. Store the data uncompressed in the rebuilt rpm. This isn't feasible
   with deltarpm but, if we store both compressed and uncompressed
   hashes in the zchunk header, we can do it with zchunk. When checking
   the signature, zchunk first verifies the header against the
   signature, and then checks whether each chunk matches *either* its
   compressed or its uncompressed checksum.

I hope this makes my thought process on this part clearer.

Jonathan
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
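The verification rule from option 2 in the point 6 reply (the signature covers only the zchunk header, the header lists checksums for every chunk, and each chunk is accepted if it matches either its compressed or its uncompressed checksum) can be sketched roughly as follows. This is an illustration under loud assumptions, not zchunk's real on-disk format or the libzck API: the header here is nothing but a hypothetical chunk index, `ChunkEntry`, `encode_header`, `decode_header` and `verify` are made-up names, SHA-256 is assumed as the checksum, zlib stands in for zstd, and comparing the header's digest against a trusted value stands in for a real signature check.

```python
from __future__ import annotations

import hashlib
import hmac
import zlib
from dataclasses import dataclass

DIGEST_LEN = 32  # SHA-256 digest size (assumed checksum type)


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class ChunkEntry:
    """Hypothetical index entry stored in the (signed) zchunk header."""
    compressed_digest: bytes    # checksum of the chunk as shipped in the rpm
    uncompressed_digest: bytes  # checksum of the same chunk after decompression


def encode_header(index: list[ChunkEntry]) -> bytes:
    # Simplified header: just the concatenated digest pairs, one per chunk.
    return b"".join(e.compressed_digest + e.uncompressed_digest for e in index)


def decode_header(header: bytes) -> list[ChunkEntry]:
    step = 2 * DIGEST_LEN
    if len(header) % step:
        raise ValueError("truncated chunk index")
    return [ChunkEntry(header[i:i + DIGEST_LEN], header[i + DIGEST_LEN:i + step])
            for i in range(0, len(header), step)]


def header_trusted(header: bytes, trusted_digest: bytes) -> bool:
    # Stand-in for real signature verification of the header.
    return hmac.compare_digest(sha256(header), trusted_digest)


def chunk_ok(entry: ChunkEntry, data: bytes) -> bool:
    # Accept the chunk if it matches EITHER checksum, so a rebuilt rpm may
    # carry chunks uncompressed (taken from the installed files).
    digest = sha256(data)
    return (hmac.compare_digest(digest, entry.compressed_digest)
            or hmac.compare_digest(digest, entry.uncompressed_digest))


def verify(header: bytes, trusted_digest: bytes, chunks: list[bytes]) -> bool:
    # Chain of trust: check the header first, then every chunk against the
    # index read from that trusted header.
    if not header_trusted(header, trusted_digest):
        return False
    index = decode_header(header)
    if len(index) != len(chunks):
        return False
    return all(chunk_ok(e, c) for e, c in zip(index, chunks))


if __name__ == "__main__":
    raw = [b"rpm header stream", b"payload: /usr/bin/hello"]
    stored = [zlib.compress(r) for r in raw]  # chunks as originally shipped
    index = [ChunkEntry(sha256(s), sha256(r)) for s, r in zip(stored, raw)]
    header = encode_header(index)
    trusted = sha256(header)  # hypothetical trusted value for the header
    print(verify(header, trusted, stored))                # → True (original rpm)
    print(verify(header, trusted, [stored[0], raw[1]]))   # → True (rebuilt, chunk 2 uncompressed)
    print(verify(header, trusted, [stored[0], b"x"]))     # → False (tampered chunk)
```

The point of the design is that the rebuilt file never needs to be recompressed: trust flows from the header to each chunk, and either form of a chunk is acceptable because both checksums were covered when the header was signed.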