On Sat, 2018-11-17 at 14:36 -0500, Neal Gompa wrote:
> On Sat, Nov 17, 2018 at 1:15 PM Jonathan Dieter <jdieter@xxxxxxxxx> wrote:
> > Neal, thanks so much for your thoughts on this. Responses inline:
> >
> > On Sat, 2018-11-17 at 09:53 -0500, Neal Gompa wrote:
> > <snip>
> > > If we're really considering changing the RPM file format, then we need
> > > a proper discussion on rpm-maint@ and rpm-ecosystem@ mailing lists on
> > > rpm.org. Can you please start a targeted discussion there?
> >
> > Sure.
> >
> > > But addressing the specific concrete suggestion here, there are a few
> > > concerns I have:
> > >
> > > 1. This is a huge format break, which means that for the first time in
> > > a _very_ long time, it would not be possible to reuse RHEL for Fedora
> > > infrastructure _at all_. That's going to be a difficult problem.
> > > There's a large legacy of systems that won't be able to handle that
> > > new format, and unfortunately, rpm is not parallel installable in the
> > > same manner as something like GCC or Python currently. Making it
> > > parallel installable *is* possible (I've done it, and there have been
> > > other attempts before), but it's not a supported thing. This is
> > > probably the thing that would trigger a major version bump for RPM,
> > > since it's a new archive format.
> >
> > Agreed that this would be a massive format change and should therefore
> > be a major version bump for RPM. New versions of RPM should still be
> > able to read and install old-format rpms, but, as you point out, old
> > versions of RPM won't be able to read or install new-format rpms.
> > Unfortunately, I don't see any way around this.
> >
>
> I don't think there's a way around it either. I just hope we do better
> than the last time someone tried to do this...

+1

> > > 2. This also means the _entire_ ecosystem of RPM archive parsers will
> > > break. This is not particularly insurmountable, actually, as the RPM
> > > file format was not particularly well documented, and a new format is
> > > an opportunity to revisit some of those old issues and try to do a
> > > better job this go around. But it's still a challenge to deal with.
> >
> > Yes, this is going to be quite a bit of work.
> >
> > > 3. When you refer to the rpm cpio, I assume you're referring to only
> > > the archive payload, right? Typically the payload is what is
> > > compressed, and the headers are not. It sounds like you're proposing
> > > both aspects to be compressed, and compressed differently. If we made
> > > the RPM header an uncompressed zchunk stream and the RPM payload a
> > > zstd-compressed zchunk stream, would we be able to support fetching
> > > header deltas for retrieving extra information on the fly? Say, for
> > > example, attributes like arch color, filecap properties, and so on,
> > > that aren't in the rpm-md data for things like transaction tests
> > > without the whole RPM?
> >
> > Yes, I'm referring to the archive payload as the cpio. The zchunk
> > format supports the idea of separate data streams, and I was planning
> > to use that to put the headers in one stream and the archive payload in
> > another. If the header chunks are first in the zchunk file, then they
> > could be read without needing to read any of the rest of the file.
> > And, yes, we could make the header stream uncompressed if that made it
> > easier to parse.
> >
>
> Whether it's compressed or not isn't terribly important, but what is
> important is being able to validate the correctness before beginning
> any processing, including decompression.

Absolutely! This includes both the rpm header and the rpm archive data,
and that's why we store both the compressed and uncompressed checksums
of the chunks.

> > > 4. I'd actually rather make it easier for the header streams to be
> > > fetched instead of trying to make specific attributes easier in the
> > > header payload. History has shown that any attempt at foresight here
> > > tends to fail miserably, and common attributes are already specified
> > > in the rpm-md primary.xml anyway, so if you're fetching the header to
> > > retrieve an attribute, you *need* to do something weird anyway.
> >
> > The main purpose of putting separate attributes in the zchunk header is
> > so programs like 'file' can determine some basic information about an
> > rpm without needing to parse the full rpm header. This data would also
> > be in the rpm header, so programs that read the rpm header wouldn't
> > care about the attributes in the zchunk header.
> >
>
> I see, so some simple hints for stuff like that? But that would still
> require awareness of the format to some degree. I guess we'd have a
> specific lead magic to let tools know to look for them...

Yeah, the code would be maybe a hundred lines, max, that could be
copylib'd into file, etc.

> > > 5. I'm not exactly sure what you mean by zchunk signing...
> >
> > The zchunk format supports signing, but just for the zchunk header.
> > Because the header contains the checksums for each chunk, this
> > establishes a chain of trust for verifying the whole file. Which
> > brings me to...
> >
> > > 6. I'm wondering why we can't do a perfect reconstruction of the
> > > original RPM, given two RPM sources that are both zchunked? We can
> > > pull it off with repodata, so what's different about RPM that makes
> > > that not doable?
> >
> > The problem is that, unlike the repodata, once an rpm is installed, the
> > package file is deleted and the data is only available on the system in
> > its uncompressed installed form. If we're trying to use that data to
> > rebuild an rpm, we have two options.
> >
> > 1. Compress the data using the same method that was used to create the
> >    original rpm. This is what applydeltarpm does, and is why it's so
> >    heavy on the CPU.
> > 2. Store the data uncompressed in the rebuilt rpm. This isn't feasible
> >    with deltarpm, but, if we store both compressed hashes and
> >    uncompressed hashes in the zchunk header, we can do this in zchunk.
> >    When checking the signature, zchunk verifies the header against the
> >    signature first, and then checks each chunk to see if it passes
> >    *either* the compressed or uncompressed checksum check.
> >
> > I hope this makes my thought process on this part clearer.
> >
>
> Yeah, that makes sense...

Great! Thanks again for looking at this.

Jonathan
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
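
For anyone following the thread, here is a rough Python sketch of the
"either checksum" check described under point 6, where a chunk is
accepted if its stored bytes match the compressed or the uncompressed
digest recorded in the header. This is not zchunk's actual code or API:
ChunkEntry, make_entry, chunk_is_valid and verify_file are hypothetical
names, SHA-256 is an assumed digest, and zlib stands in for zstd only
because it ships with Python. Verifying the signature on the header
itself is assumed to have happened already and is not shown.

```python
import hashlib
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkEntry:
    """Hypothetical header entry holding two digests for one chunk."""
    compressed_sha256: str
    uncompressed_sha256: str


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_entry(raw: bytes) -> ChunkEntry:
    # At build time, record digests of both forms of the chunk.
    # zlib is only a stand-in compressor for this sketch.
    return ChunkEntry(sha256_hex(zlib.compress(raw)), sha256_hex(raw))


def chunk_is_valid(entry: ChunkEntry, stored: bytes) -> bool:
    # Hash the bytes exactly as stored, without decompressing first,
    # and accept a match against either recorded digest.
    digest = sha256_hex(stored)
    return digest in (entry.compressed_sha256, entry.uncompressed_sha256)


def verify_file(header: list[ChunkEntry], chunks: list[bytes]) -> bool:
    # Assumes the header was already checked against its signature.
    if len(header) != len(chunks):
        return False
    return all(chunk_is_valid(e, c) for e, c in zip(header, chunks))


originals = [b"rpm header bytes", b"payload bytes " * 100]
header = [make_entry(c) for c in originals]

as_built = [zlib.compress(c) for c in originals]   # every chunk compressed
rebuilt = [as_built[0], originals[1]]              # second chunk stored raw
tampered = [as_built[0], b"not the payload"]       # second chunk replaced

print(verify_file(header, as_built))
print(verify_file(header, rebuilt))
print(verify_file(header, tampered))
```

The point of the sketch is that a file rebuilt from installed
(uncompressed) data can pass the same header check as the file that was
originally built, without recompressing anything, while a chunk that
matches neither digest is rejected.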