For reference, this is in reply to Paul's email about lifecycle objectives, specifically focusing on problem statement #1[1]. <tl;dr> Have rpm use zchunk as its compression format, removing the need for deltarpms, and thus reducing compose time. This will require changes to both the rpm format and new features in the zchunk format. </tl;dr> *deltarpm background* As part of the compose process, deltarpms are generated between each new rpm and both the GA version of the rpm and the previous version. This process is very CPU and memory intensive, especially for large rpms. This also means that deltarpms are only useful for an end user if they are either updating from GA or have been diligent about keeping their system up-to-date. If a user is updating a package from N-2 to N, there will be no deltarpm and the full rpm will be downloaded. *zchunk background* As some are aware, I've been working on zchunk[2], a compression format that's designed for highly efficient deltas, and using it minimize metadata downloads[3]. The core idea behind zchunk is that a file is split into independently compressed chunks and the checksum of each compressed chunk is stored in the zchunk header. When downloading a new version of the file, you download the zchunk header first, check which chunks you already have, and then download the rest. *Proposal* My proposal would be to make zchunk the rpm compression format for Fedora. This would involve a few additions to the zchunk format[4] (something the format has been designed to accommodate), and would require some changes to the rpm file format. *Benefit* The benefit of zchunked rpms is that, when downloading an updated rpm, you would only need to download the chunks that have changed from what's on your system. The uncompressed local chunks would be combined with the downloaded compressed chunks to create a local rpm that will pass signature verification without needing to recompress the uncompressed local chunks, making this computationally much faster than rebuilding a deltarpm, a win for users. The savings wouldn't be as good as what deltarpm can achieve, but deltarpms would be redundant and could be removed, completely eliminating a large step from the compose process. *Drawbacks* 1. Downloading a new release of a zchunked rpm would be larger than downloading the equivalent deltarpm. This is offset by the fact that the client is able to work out which chunks it needs no matter what the original rpm is, rather than needing a specific original rpm as deltarpm does. 2. The rebuilt rpm may not be byte-for-byte identical to the original, but will be able to be validated without decompression, as explained in the next section *Changes* The zchunk format would need to be extended to allow for a zchunked rpm to contain both the uncompressed chunks that were already on the local system and the newly downloaded compressed chunks while still passing signature verification. This would also require moving signature verification to zchunk. The rpm file format has to be changed because the zchunk header needs to be at the beginning of the file in order for the zchunk library figure out which chunks it needs to download. My suggestions for changes to the rpm file format are as follows: 1. Signing should be moved to the zchunk format as described at the beginning of this section 2. The rpm header should be stored in one stream inside the zchunk file. This allows it to be easily extracted separately from the data 3. The rpm cpio should be stored in a second stream inside the zchunk file. 4. At minimum, an optional zchunk element should be set to identify zchunk rpms as rpms rather than regular zchunk files. If desired, optional elements could also be set containing %{name}, %[version}, %{release}, %{arch} and %{epoch}. This would allow this information to be read easily without needing to extract the rpm header stream. *Final notes* I realize this is a massive proposal, zchunk is still very young, and we're still working on getting the dnf zchunk pull requests reviewed. I do think it's feasible and provides an opportunity to eliminate a pain point from our compose process while still reducing the download size for our users. [1]: https://fedoraproject.org/wiki/Objectives/Lifecycle/Problem_statements#Challenge_.231:_Faster.2C_more_scalable_composes [2]: https://github.com/zchunk/zchunk [3]: https://fedoraproject.org/wiki/Changes/Zchunk_Metadata [4]: https://github.com/zchunk/zchunk/blob/master/zchunk_format.txt _______________________________________________ devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx