Re: Proposal: Faster composes by eliminating deltarpms and using zchunked rpms instead

On Wed, 2018-11-21 at 14:36 +0100, Kamil Paral wrote:
> On Fri, Nov 16, 2018 at 11:13 PM Jonathan Dieter <jdieter@xxxxxxxxx> wrote:
> > For reference, this is in reply to Paul's email about lifecycle
> > objectives, specifically focusing on problem statement #1[1].
> > 
> > <tl;dr>
> > Have rpm use zchunk as its compression format, removing the need for
> > deltarpms, and thus reducing compose time.  This will require changes
> > to both the rpm format and new features in the zchunk format.
> > </tl;dr>
> 
> Hey Jonathan,
> 
> thanks for working on this. The proposed changes sound good to me.
> I'm a bit worried that zchunk is not yet a proven format, so it might
> be a good idea to use it for metadata first, see whether it works as
> expected, and then push it for RPM files. But that's for more
> technical people to judge.
> 
> I have some concrete questions, though:
> 1. I have noticed that especially with large RPMs (firefox, chrome,
> atom, game data like 0ad-data, etc), my PCs are mostly bottlenecked
> by CPU when installing them. And that's with a modern 3.5+GHz CPU.
> That's because RPM decompression runs in a single thread only, and xz
> is just unbelievably slow. I wonder, would zchunk used as an RPM
> compression algorithm improve this substantially? Can it decompress
> in multiple threads and/or does it have much faster decompression
> speeds (and how much)? I don't care about RPM size increase, but I'd
> really like to have them installed fast. (That's of course just my
> personal preference, but this also affects the speed of mock builds
> and such, so I think it's relevant.)

The zstd compression that zchunk uses internally is designed to
decompress faster than even gzip.  zchunk itself is currently
single-threaded, but because each chunk is independent, making it
multi-threaded should be pretty trivial, and that is on the todo list.
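
To illustrate why independent chunks make this easy, here is a rough
sketch in Python.  It is not zchunk code: zlib stands in for zstd so the
example runs with only the standard library, and the names
compress_chunks and decompress_parallel are made up for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Compress each fixed-size slice on its own, with no shared state.

    Assumption: zlib is a stand-in for zstd, and fixed-size slicing is a
    simplification of how a real chunked format picks chunk boundaries.
    """
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def decompress_parallel(chunks: list[bytes], workers: int = 4) -> bytes:
    """Decompress chunks in a thread pool and join them in original order.

    No chunk depends on another, so each one can go to any worker.
    pool.map returns results in input order, which keeps the output intact.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))


if __name__ == "__main__":
    original = bytes(range(256)) * 1000
    chunks = compress_chunks(original, chunk_size=4096)
    assert decompress_parallel(chunks) == original
```

How much this helps in practice depends on chunk sizes and on whether
the decompressor releases the interpreter lock; the point is only that
no ordering or shared dictionary state has to be coordinated between
workers.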

> 2. In our past QA efforts in Fedora, we had use cases for retrieving
> rpm header data without retrieving the actual content (the payload).
> That was for cases when we needed to check e.g. dependency issues,
> but the rpms were not placed in a repository yet (i.e. no easy access
> to their metadata) and it was slow and wasteful to download the whole
> rpm just to get the header. Will the new zchunk compression still
> make it possible to retrieve just the header without accessing all
> the payload data? (It would be great to make this accessible from
> Python and not just C, but that's a plea I should direct to rpm
> maintainers, I guess).

The zchunk format supports the concept of multiple independent streams
in a single file.  A zchunk rpm would contain two streams, the rpm
header and the rpm payload.  Since downloading a zchunk file is two
steps already (downloading the zchunk header, and then downloading the
required chunks), it should be easy enough to download only the chunks
needed for the rpm header stream.
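
As a rough sketch of that two-step fetch (index first, then only the
chunks for one stream), here is a toy example in Python.  The container
layout below (a length-prefixed JSON index followed by zlib chunks) is
invented for illustration and is not the real zchunk format;
build_toy_file, RangeReader and fetch_stream are hypothetical names, and
RangeReader stands in for HTTP range requests.

```python
import json
import struct
import zlib


def build_toy_file(streams: dict[int, list[bytes]]) -> bytes:
    """Pack chunks from several streams behind an index (toy layout)."""
    index, body = [], b""
    for stream_id, chunks in streams.items():
        for chunk in chunks:
            packed = zlib.compress(chunk)
            index.append(
                {"stream": stream_id, "offset": len(body), "length": len(packed)}
            )
            body += packed
    header = json.dumps(index).encode()
    # 4-byte big-endian header length, then the index, then the chunk data.
    return struct.pack(">I", len(header)) + header + body


class RangeReader:
    """Stand-in for ranged downloads; counts how many bytes were fetched."""

    def __init__(self, blob: bytes):
        self.blob = blob
        self.bytes_read = 0

    def read(self, offset: int, length: int) -> bytes:
        self.bytes_read += length
        return self.blob[offset:offset + length]


def fetch_stream(reader: RangeReader, stream_id: int) -> bytes:
    """Fetch the index, then only the chunks that belong to stream_id."""
    # Step 1: download the index.
    (header_len,) = struct.unpack(">I", reader.read(0, 4))
    index = json.loads(reader.read(4, header_len))
    body_start = 4 + header_len
    # Step 2: download and decompress only the chunks for this stream.
    parts = [
        zlib.decompress(reader.read(body_start + e["offset"], e["length"]))
        for e in index
        if e["stream"] == stream_id
    ]
    return b"".join(parts)


if __name__ == "__main__":
    # Stream 0 plays the rpm header, stream 1 the rpm payload.
    blob = build_toy_file({0: [b"Name: demo\n"], 1: [b"payload " * 10000]})
    reader = RangeReader(blob)
    assert fetch_stream(reader, 0) == b"Name: demo\n"
    assert reader.bytes_read < len(blob)  # payload chunks were never read
```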

As for a Python API, I would love for zchunk to have that too, but I
haven't had the time to write one yet.

I hope that helps.

Jonathan