On Thu, Mar 22, 2018 at 5:49 PM, John Reiser <jreiser@xxxxxxxxxxxx> wrote:
> On 03/22/2018 01:51 PM, Nico Kadel-Garcia wrote:
>> On Thu, Mar 22, 2018 at 10:52 AM, John Reiser <jreiser@xxxxxxxxxxxx>
>> wrote:
>>> On 03/22/2018 05:40 AM, Daniel Mach wrote:
>>>>
>>>> We are pleased to announce that development of DNF 3 has started. This
>>>> version is focused on performance improvements, a new API, and
>>>> consolidating the whole software management stack.
>>>
>>> How does RPM fit into DNF's view of "the whole software management
>>> stack"? RPM is a slug (moves very slowly): no parallelism (at any point
>>> all packages with no remaining predecessors could be updated/installed
>>> in parallel), not even manually pipelined (decompress to memory,
>>> manipulate filesystem, update database.)
>>
>> Parallelizing software updates or installations would be *begging* for
>> pain. It would be difficult for me to recommend strongly enough
>> against this.
>
> Please be specific about the pain points that you fear.

RPM itself is single-threaded. %pre and %post operations would have to be
re-evaluated for parallelization; system account creation, in particular,
would have to be made thread-safe. RPM installation can fail partway
through deployment due to SELinux, disk space, or network-based mount
point failures: keeping it single-threaded makes it much safer to unravel
a failed or partial RPM installation. Unweaving a partial dependency
deployment could be quite destructive with a parallelized approach.

Daemons that need to be restarted and may have incompatible component
updates, such as httpd with its modules, are particularly vulnerable to
fascinating failures from the daemon restarting with only some of its
components updated. Avoiding that would seem to require even more
dependency management for RPM installation, rather than each individual
update triggering a restart on its own.

> The three-stage "manual" pipeline achieves 2x to 3x faster throughput
> with error states that are isomorphic to present RPM. (Consider the
> Turing machine model: if you don't write to the filesystem, then
> there is no change of external state.)

Turing machines don't have to deal with all the possible failure modes of
a live system: partial writes, interrupted scriptlets, and daemons holding
external state.

> The "parallelize everything that has no remaining predecessors" strategy
> requires parallel transactions in the database (they cannot interfere
> because that would be a predecessor constraint) and checking for
> resource exhaustion (file space, inodes, etc.) as a global
> predecessor constraint. What else?

Parallelizing the installations means losing the milestones at which one
update has succeeded and another has not. Unweaving that to find out
which update triggered a failure sounds like pain, and makes testing the
update process more difficult. It becomes difficult to know, or even
guess, what the state of the system was at the time of an RPM update,
since another RPM update may be in progress at the same time. (A toy
sketch of this strategy, and of exactly that failure mode, is at the end
of this message.)

There is an infamous quote by Donald Knuth that "premature optimization
is the root of all evil". There are systems that would benefit from the
time savings of parallelization, but for ordinary RPM installations and
system updates, I think the slow update time comes from other factors,
such as disk I/O, RPM database updates, and the download times for
repodata and packages.
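To make that concrete, here is a toy sketch of the "parallelize everything
that has no remaining predecessors" scheduling. It is plain Python with a
made-up dependency graph and an install_package() stub standing in for the
real decompress/filesystem/database work; it is a sketch of the scheduling
idea under those assumptions, not of anything RPM or DNF actually does.

    # Toy sketch of "dispatch everything with no remaining predecessors".
    # Package names and install_package() are hypothetical stand-ins.
    import concurrent.futures

    # Hypothetical dependency graph: package -> packages it requires.
    DEPS = {
        "glibc": set(),
        "openssl": {"glibc"},
        "httpd": {"glibc", "openssl"},
        "mod_ssl": {"httpd", "openssl"},
    }

    def install_package(name):
        # Stand-in for decompress + manipulate filesystem + update database.
        print("installing", name)

    def parallel_install(deps):
        remaining = {pkg: set(reqs) for pkg, reqs in deps.items()}
        with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
            running = {}

            def submit_ready():
                # Dispatch every package whose predecessors have all finished.
                for pkg in [p for p, reqs in remaining.items() if not reqs]:
                    del remaining[pkg]
                    running[pool.submit(install_package, pkg)] = pkg

            submit_ready()
            while running:
                done, _ = concurrent.futures.wait(
                    running, return_when=concurrent.futures.FIRST_COMPLETED)
                for fut in done:
                    finished = running.pop(fut)
                    # A failure here strands the rest of the graph:
                    # some packages installed, others not, nothing unwound.
                    fut.result()
                    for reqs in remaining.values():
                        reqs.discard(finished)
                submit_ready()

    parallel_install(DEPS)

Note what happens when one install raises: the scheduler stops mid-graph
with some packages installed and others not, and nothing here knows how to
unweave that. That milestone-less middle state is the pain I mean.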