On Sat, Nov 17, 2018 at 1:15 PM Jonathan Dieter <jdieter@xxxxxxxxx> wrote:
>
> Neal, thanks so much for your thoughts on this. Responses inline:
>
> On Sat, 2018-11-17 at 09:53 -0500, Neal Gompa wrote:
> <snip>
> > If we're really considering changing the RPM file format, then we need
> > a proper discussion on the rpm-maint@ and rpm-ecosystem@ mailing lists
> > on rpm.org. Can you please start a targeted discussion there?
>
> Sure.
>
> > But addressing the specific concrete suggestion here, there are a few
> > concerns I have:
> >
> > 1. This is a huge format break, which means that for the first time in
> > a _very_ long time, it would not be possible to reuse RHEL for Fedora
> > infrastructure _at all_. That's going to be a difficult problem.
> > There's a large legacy of systems that won't be able to handle that
> > new format, and unfortunately, rpm is not currently parallel installable
> > in the same manner as something like GCC or Python. Making it
> > parallel installable *is* possible (I've done it, and there have been
> > other attempts before), but it's not a supported thing. This is
> > probably the thing that would trigger a major version bump for RPM,
> > since it's a new archive format.
>
> Agreed that this would be a massive format change and should therefore
> be a major version bump for RPM. New versions of RPM should still be
> able to read and install old-format rpms, but, as you point out, old
> versions of RPM won't be able to read or install new-format rpms.
> Unfortunately, I don't see any way around this.
>
I don't think there's a way around it either. I just hope we do better
than the last time someone tried to do this...

> > 2. This also means the _entire_ ecosystem of RPM archive parsers will
> > break. This is not insurmountable, actually, as the RPM
> > file format was not particularly well documented, and a new format is
> > an opportunity to revisit some of those old issues and try to do a
> > better job this go around.
> > But it's still a challenge to deal with.
>
> Yes, this is going to be quite a bit of work.
>
> > 3. When you refer to the rpm cpio, I assume you're referring to only
> > the archive payload, right? Typically the payload is what is
> > compressed, and the headers are not. It sounds like you're proposing
> > that both be compressed, and compressed differently. If we made
> > the RPM header an uncompressed zchunk stream and the RPM payload a
> > zstd-compressed zchunk stream, would we be able to support fetching
> > header deltas for retrieving extra information on the fly? Say, for
> > example, attributes like arch color, filecap properties, and so on,
> > that aren't in the rpm-md data, for things like transaction tests
> > without the whole RPM?
>
> Yes, by "the cpio" I mean the archive payload. The zchunk
> format supports the idea of separate data streams, and I was planning
> to use that to put the headers in one stream and the archive payload in
> another. If the header chunks are first in the zchunk file, then they
> could be read without needing to read any of the rest of the file.
> And, yes, we could make the header stream uncompressed if that made it
> easier to parse.
>
Whether it's compressed or not isn't terribly important; what is
important is being able to validate its correctness before beginning
any processing, including decompression.

> > 4. I'd actually rather make it easier for the header streams to be
> > fetched instead of trying to make specific attributes easier to reach
> > in the header payload. History has shown that any attempt at foresight
> > here tends to fail miserably, and common attributes are already specified
> > in the rpm-md primary.xml anyway, so if you're fetching the header to
> > retrieve an attribute, you *need* to do something weird anyway.
>
> The main purpose of putting separate attributes in the zchunk header is
> so programs like 'file' can determine some basic information about an
> rpm without needing to parse the full rpm header. This data would also
> be in the rpm header, so programs that read the rpm header wouldn't
> care about the attributes in the zchunk header.
>
I see, so some simple hints for stuff like that? But that would still
require awareness of the format to some degree. I guess we'd have a
specific lead magic to let tools know to look for them...

> > 5. I'm not exactly sure what you mean by zchunk signing...
>
> The zchunk format supports signing, but only of the zchunk header.
> Because the header contains the checksums for each chunk, this
> establishes a chain of trust for verifying the whole file. Which
> brings me to...
>
> > 6. I'm wondering why we can't do a perfect reconstruction of the
> > original RPM, given two RPM sources that are both zchunked? We can
> > pull it off with repodata, so what's different about RPM that makes
> > that not doable?
>
> The problem is that, unlike the repodata, once an rpm is installed, the
> package file is deleted and the data is only available on the system in
> its uncompressed installed form. If we're trying to use that data to
> rebuild an rpm, we have two options.
>
> 1. Compress the data using the same method that was used to create the
>    original rpm. This is what applydeltarpm does, and is why it's so
>    heavy on the CPU.
> 2. Store the data uncompressed in the rebuilt rpm. This isn't feasible
>    with deltarpm, but, if we store both compressed hashes and
>    uncompressed hashes in the zchunk header, we can do this in zchunk.
>    When checking the signature, zchunk verifies the header against the
>    signature first, and then checks each chunk to see if it matches
>    *either* the compressed or the uncompressed checksum.
>
> I hope this makes my thought process on this part clearer.
>
Yeah, that makes sense...
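For illustration, the dual-checksum verification described in option 2 can be sketched in Python. This is a minimal sketch under loud assumptions: the `ChunkRecord` layout and every function name below are hypothetical and do not reflect the real zchunk header format or the zchunk library API; `zlib` stands in for zstd, and an HMAC stands in for a real (e.g. GPG) signature over the header. It also shows the ordering asked for above: the header is validated before any chunk is looked at, and nothing is decompressed.

```python
from __future__ import annotations

import hashlib
import hmac
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRecord:
    """Hypothetical header entry: digests of one chunk in both forms."""
    compressed_sha256: bytes
    uncompressed_sha256: bytes


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build(chunks: list[bytes]) -> tuple[list[ChunkRecord], list[bytes]]:
    """Compress each chunk (zlib as a stand-in for zstd) and record both digests."""
    records: list[ChunkRecord] = []
    compressed: list[bytes] = []
    for raw in chunks:
        comp = zlib.compress(raw)
        records.append(ChunkRecord(sha256(comp), sha256(raw)))
        compressed.append(comp)
    return records, compressed


def serialize_header(records: list[ChunkRecord]) -> bytes:
    """Concatenate all digests; this byte string is what gets signed."""
    return b"".join(r.compressed_sha256 + r.uncompressed_sha256 for r in records)


def sign_header(header: bytes, key: bytes) -> bytes:
    """Stand-in for a real signature: an HMAC-SHA256 over the header bytes."""
    return hmac.new(key, header, hashlib.sha256).digest()


def verify(records: list[ChunkRecord], signature: bytes, key: bytes,
           chunks: list[bytes]) -> bool:
    # Step 1: check the header against its signature before touching any chunk.
    expected = sign_header(serialize_header(records), key)
    if not hmac.compare_digest(expected, signature):
        return False
    if len(chunks) != len(records):
        return False
    # Step 2: each chunk may be stored compressed or uncompressed; accept it
    # if its digest matches *either* entry in the now-trusted header.
    for record, data in zip(records, chunks):
        if sha256(data) not in (record.compressed_sha256, record.uncompressed_sha256):
            return False
    return True


if __name__ == "__main__":
    key = b"demo-key"
    original = [b"rpm header bytes", b"payload file contents " * 20]
    records, compressed = build(original)
    sig = sign_header(serialize_header(records), key)

    print(verify(records, sig, key, compressed))                   # True: as downloaded
    print(verify(records, sig, key, original))                     # True: rebuilt, not recompressed
    print(verify(records, sig, key, [b"tampered", original[1]]))   # False
```

The point of carrying both digests is that a chunk rebuilt from installed files never has to be recompressed: its uncompressed digest is already covered by the signed header, so it verifies as-is, which avoids the CPU cost described for applydeltarpm.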
-- 
真実はいつも一つ!/ Always, there's only one truth!
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx