On 2018-11-16, Jonathan Dieter wrote:
For reference, this is in reply to Paul [Frields]'s email about lifecycle objectives, specifically focusing on problem statement #1[1]. <tl;dr> Have rpm use zchunk as its compression format, removing the need for deltarpms, and thus reducing compose time. This will require changes to both the rpm format and new features in the zchunk format. </tl;dr>
[1]: https://fedoraproject.org/wiki/Objectives/Lifecycle/Problem_statements#Challenge_.231:_Faster.2C_more_scalable_composes
Currently a compose takes a minimum of around 8.5 hours ([1] and others); the goal is 1 hour. The goal is particularly relevant during the last phase of a Fedora release cycle (after code freeze), when each successive compose contains only a few .rpms that have changed since the previous compose, and the question of the hour is whether some particular bug was actually fixed. In this case deltarpms can be ignored. The goal is also relevant to a future of CI (Continuous Integration) with automated gating of changes that depends on successful tests of the entire compose ("Does it boot and pass the test cases?"). Again, deltarpms can be ignored.

Please show some measurements that support the belief that using zchunk will reduce compose time dramatically, whether by eliminating deltarpms or by other effects.

Did you view https://www.youtube.com/watch?v=kW7oz_zbSD0 "Flock 2018 - Improving Fedora Compose process" (Aug. 8, 2018; 55 min)? They do present measurements [coarse]. The overwhelming conclusion is that the 8.5 hours is a data-flow problem, both large-grain (moving .rpms and other files across the network) and small-grain (extracting the desired information from the header of an .rpm that uses data compression).

The number one request that I heard in the recorded session was for faster access to fields in the header of an .rpm that uses data compression. This is slow today because the header and tail are compressed together as if they were a single logical stream, and the code retrieves and decompresses the whole .rpm just to access the header. However, both xz (liblzma) and gzip (zlib) accept a parameter to stop decompressing after generating N bytes of output; why not use this? N can be known, over-estimated, or iteratively (and incrementally) approximated until it covers the entire header.

To make decompression of the rpm header even easier, call xz_compress twice: once with the header, once with the tail.
The concatenation of the compressed outputs is transparent by default but visible if you look for it, just like zlib. In effect the "directory" feature of zchunk can be implemented for the special case of header-vs-tail (using either xz (liblzma) or gzip (zlib)) without disturbing other clients of .rpms.
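A minimal sketch of both ideas, using Python's standard-library `lzma` module (a binding to liblzma) in place of the xz_compress call mentioned above. The container layout (two concatenated .xz streams) and the 4-byte big-endian length prefix on the header are hypothetical, chosen only to keep the example self-contained; they are not the actual rpm format. `read_first_stream` shows the two-stream approach, and `read_header_incrementally` shows the "stop after N bytes of output" approach via the `max_length` argument.

```python
import lzma


def pack(header: bytes, tail: bytes) -> bytes:
    """Compress header and tail as two independent .xz streams, then concatenate.

    Hypothetical layout for illustration only; this is not the real rpm format.
    """
    return lzma.compress(header) + lzma.compress(tail)


def read_first_stream(blob: bytes) -> bytes:
    """Decode only the first .xz stream (the header) and stop there.

    LZMADecompressor does not continue into a following stream; whatever
    comes after the first stream is left, undecoded, in d.unused_data.
    """
    d = lzma.LZMADecompressor()
    out = d.decompress(blob)
    if not d.eof:
        raise ValueError("first stream is truncated")
    return out


def read_header_incrementally(blob: bytes, step: int = 64) -> bytes:
    """Decompress at most `step` more bytes at a time until the header is complete.

    Also works on a single stream holding header + tail: decoding stops as
    soon as enough output exists. Assumes (hypothetically) that the header
    begins with a 4-byte big-endian length of the bytes that follow it.
    """
    d = lzma.LZMADecompressor()
    out = d.decompress(blob, max_length=step)
    while True:
        if len(out) >= 4:
            needed = 4 + int.from_bytes(out[:4], "big")
            if len(out) >= needed:
                return out[:needed]
        if d.eof or d.needs_input:
            raise ValueError("ran out of data before the header was complete")
        # Passing b"" continues from input the decompressor already buffered.
        out += d.decompress(b"", max_length=step)


if __name__ == "__main__":
    body = b"name=example;version=1.0;release=1;"
    header = len(body).to_bytes(4, "big") + body
    tail = bytes(range(256)) * 4096  # stand-in for the payload
    blob = pack(header, tail)
    assert lzma.decompress(blob) == header + tail  # transparent to existing readers
    assert read_first_stream(blob) == header
    assert read_header_incrementally(lzma.compress(header + tail), step=8) == header
```

The design choice: `lzma.decompress` already decodes concatenated streams as one, so ordinary readers see header + tail unchanged, while a reader that wants only the header uses a single `LZMADecompressor` and stops at the first stream boundary.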