On Sun, 2021-01-03 at 16:16 -0500, Colin Walters wrote:
>
> On Sat, Jan 2, 2021, at 10:03 AM, Zbigniew Jędrzejewski-Szmek wrote:
> >
> > I fail to see why this would be significantly better...
>
> I don't claim that the "separate temporary directory of unpacked
> content" is *better* - just that it's as easy to implement *and*
> doesn't require an RPM format change (with all the consequent pain)
> or support for reflinks from the underlying filesystem.
>
> > The logic to handle the split rpm contents would seem to be more
> > complicated than the rewrite with /usr/bin/rpm2extents. Other
> > comments?
>
> Hard to really say for sure I guess without trying to write
> both. Probably the biggest impediment is that changes like that
> would end up needing to be split across the librpm +
> zypper/rpm-ostree/dnf tools. It wasn't an accident really that for
> rpm-ostree /usr/bin/rpm is read-only - we effectively squash those
> layers together and can thus make deep changes as a single unit.
>
> Anyways, none of this really *requires* reflinks in any way and so
> calling the Change "RPMCoW" is misleading from that
> perspective. "DnfParallelUnpack" would probably be a better title,
> with a dependency on "RPMFormatCowReady" or something. And then my
> point is that one could do "DnfParallelUnpack" without changing the
> RPM format without much more complexity, if any.

Early on in this project I looked at creating all the files in a
temporary directory during download. It would work, and it is more
filesystem-agnostic. If moving the decompression to an earlier step
were the sole goal, it would be a reasonable approach.

The goal of RPMCoW is to write once and reuse the data multiple times.
This comes up in a number of circumstances for this proposal:

1. Reflinking allows for de-duplication of file content. Today this
   only happens within a single RPM. I am looking at changing
   rpm2extents to reuse data across (cached) rpms to achieve something
   kind of like delta rpm.
   That is: if you already have file X, you don't write it, you clone
   it from any other rpm.

2. Reflinking allows sharing of file contents without side effects
   from the installed copy. Each copy is a real, distinct file that
   can be deleted and/or modified. Only the differences cost
   something, and 99% of rpm files don't get modified. The net result
   is that the rpm cache costs very little.

3. If you can keep an rpm cache, you can reuse the data very quickly,
   either to build a new rootfs in a subdir/subvolume with the same or
   different packages, or to provide files for containers. This sounds
   similar to using snapshots, but with snapshots you're operating on
   a whole filesystem at a time, and you can only go backwards. Here
   you can decide what you want, and you get maximum reuse
   automatically.

By contrast, "DnfParallelUnpack" by itself, without CoW, is less useful
because you will need to re-fetch and re-decompress data.

Lastly, I'd like to emphasize that I'm not trying to change the "normal
rpm format". Doing so would orphan every previously built and signed
rpm, and would present a serious backward compatibility problem. I aim
to change only how rpms are downloaded and stored in the local cache,
and how rpm itself consumes them, within the confines of hosts that
(can) enable this.

- Matthew
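
For readers who want the "write once, clone many times" idea from
points 1 and 2 in concrete form, here is a minimal Python sketch. It is
an illustration only and not how rpm2extents or rpm actually work: the
ContentCache class and the clone_file helper are hypothetical names,
keying the cache by the SHA-256 of the whole file is an assumption, and
the code falls back to a plain byte copy when the Linux FICLONE ioctl
is not supported by the underlying filesystem.

```python
import fcntl
import hashlib
import os
import shutil

# FICLONE ioctl request number from <linux/fs.h>: _IOW(0x94, 9, int).
FICLONE = 0x40049409


def clone_file(src, dst):
    """Reflink src to dst if possible, otherwise copy the bytes.

    Returns "reflink" or "copy" so the caller can see which path ran.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to share src's extents with dst.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # No reflink support here (or cross-device): plain copy.
            shutil.copyfileobj(fsrc, fdst)
            return "copy"


class ContentCache:
    """Hypothetical content-addressed store: one file per unique digest."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, digest):
        return os.path.join(self.root, digest)

    def store(self, data):
        """Write data only if it is new. Returns (digest, wrote_new)."""
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if os.path.exists(path):
            return digest, False  # already have "file X": nothing to write
        with open(path, "wb") as f:
            f.write(data)
        return digest, True

    def install(self, digest, dst):
        """Materialize a cached file at dst as a distinct, writable file."""
        return clone_file(self._path(digest), dst)
```

Each installed file is independent of the cached copy either way; on a
filesystem with reflink support the clone shares extents until one of
the copies is modified, which is where the space saving comes from.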