Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)


 



On Mon, Dec 21, 2020 at 12:49 PM Colin Walters <walters@xxxxxxxxxx> wrote:
>
>
>
> On Mon, Dec 21, 2020, at 11:28 AM, Ben Cotton wrote:
> >
> >
> >
> > == Summary ==
> >
> > RPM Copy on Write provides a better experience for Fedora Users as it
> > reduces the amount of I/O and offsets CPU cost of package
> > decompression. RPM Copy on Write uses reflinking capabilities in
> > btrfs, which is the default filesystem in Fedora 33.
>
> A bunch of points here:
>
> - No, it's the default for one Edition.  Others don't default to it.  And even for Workstation we can't *require* it because it's definitely supported to use other filesystems and storage layouts.
>
> - Orthogonal to this, I'd also note that xfs supports reflinks too.
>
> Combining those I'd say instead e.g.: "Most Fedora Editions default to a filesystem that supports reflinks, e.g. btrfs or xfs" (actually I think IoT defaults to ext4 for...probably they didn't consider it?)
>

It'd be more accurate to say most Fedora variants default to Btrfs.
The only exceptions right now are Cloud, Server, and CoreOS. But yes,
Fedora Server's current default of XFS on LVM means it also supports
reflinks.
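
Since the real requirement is reflink support rather than any particular
filesystem, it can be probed directly. A minimal sketch, assuming Linux
and the FICLONE ioctl number used on x86_64/aarch64; the helper name
supports_reflink is made up for illustration and is not part of the
Change proposal:

```python
import fcntl
import tempfile

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int).
# Assumption: this is the encoding on x86_64/aarch64; some other
# architectures encode ioctl numbers differently.
FICLONE = 0x40049409


def supports_reflink(directory):
    """Return True if a whole-file clone succeeds inside `directory`."""
    with tempfile.NamedTemporaryFile(dir=directory) as src, \
         tempfile.NamedTemporaryFile(dir=directory) as dst:
        src.write(b"x" * 4096)
        src.flush()
        try:
            # Ask the filesystem to make dst share src's extents.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
        except OSError:
            # Filesystems without reflink support (ext4, tmpfs, ...)
            # reject the request; treat any failure as "no".
            return False
        return True
```

Something like supports_reflink("/var/cache/dnf") would then answer the
question for whichever filesystem actually holds the package cache,
whatever the variant's default happens to be.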

As an aside, I *really* hate this split of terminology we have among
Editions, Spins, and Labs. It's confusing to everyone. :(

> - When talking about RPMs we need to think about container images, which use overlayfs by default, which defers to the underlying filesystem for reflinks - so should be fine, but should be explicitly written down (and tested)
>
> - Generally incompatible RPM payload changes cause pain proportional to how far they're "not backported", e.g. if support for this isn't in Fedora N-1 (e.g. Fedora 32) it will be harder for current Koji/mock model.  Nowadays many more people use podman than mock, which e.g. if using a RHEL8 host will naturally avoid the dependency on an updated RPM.  But
>

Incomplete statement here?

That said, we no longer have this problem in the Koji/Mock model, as
bootstrap mode is now enabled. Additionally, Mock uses systemd-nspawn
by default in all cases except under Koji (which overrides this
because it can't handle nspawn mode at the moment).

> > # Decompression happens inline with download.
>
> rpm-ostree does this by default today BTW (rpms are unpacked into local ostree commits in parallel even).
>
> > ## Regular RPMs use a compressed .cpio based payload. In contrast,
> > extent based RPMs contain uncompressed data aligned to the fundamental
> > page size of the architecture, e.g. 4KiB on x86_64. This alignment is
> > required for <code>FICLONERANGE</code> to work. Only files are
> > represented in the payload, other directory entries like symlinks,
> > device nodes etc are constructed entirely from rpm header information.
>
> This is the core change; some interesting tradeoffs here.  Python projects in particular ship a lot of files smaller than 4k (classic example is `__init__.py` which is zero sized).  And ppc64le is 64KiB pages right?  So there will be "zero space" to align, right?  Would need some math to see how much this would add up to, although I guess the implementation could instead use holes?
>
> > Files are referenced by their digest, so identical files are
> > de-duplicated.
>
> But just inside a single RPM, right?  It's interesting to compare with ostree which does this by default; conceptually this is using reflinks inside a single RPM to do what ostree does system wide with hardlinks.
>
> BTW we learned a few things, notably zero sized files are tricky because there can be a *lot* of them - see e.g. https://github.com/ostreedev/ostree/pull/2197
> That one was too many hardlinks, but how well do filesystems like btrfs/xfs handle thousands of reflinks instead?  The Python __init__.py thing is such a pathological case...
>
> > # Disk space requirements are expected to be marginally higher than
> > before: all new packages or updates will consume their installed size
> > before installation instead of about half their size (regular rpms
> > with payloads still cost space).
>
> This won't matter much for small updates but could be quite noticeable for larger system upgrades.
>
> This all said the more I think about this, wouldn't it be way simpler to change rpm to support a "temporary root directory", e.g. `/usr/.rpmtemp` or whatever.  Then dnf/zypper/etc can do the unpack-and-download model without any format changes to RPM - instead of reflinking it'd just be rename() into place. This is effectively what rpm-ostree is doing today except with ostree commits instead of a temporary directory.

Sure, this makes some degree of sense, but it doesn't reduce the IOPS
for actually *doing* the installation. My understanding is that this
Change is intended to reduce the thrashing when doing package
transactions.
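
To make the temporary-directory idea concrete, here is a rough sketch of
the stage-then-rename flow as I read it. Everything in it is
hypothetical: stage_and_rename, the ".rpmtemp-" prefix and the
dict-of-bytes input are illustration only, not how rpm or dnf work
today.

```python
import os
import shutil
import tempfile


def stage_and_rename(root, files):
    """Write `files` ({relative path: bytes}) into a staging directory
    under `root`, then rename() each one into its final location."""
    staging = tempfile.mkdtemp(prefix=".rpmtemp-", dir=root)
    try:
        # Phase 1: unpack during download; nothing is visible in the
        # final locations yet.
        for rel, data in files.items():
            tmp = os.path.join(staging, rel)
            os.makedirs(os.path.dirname(tmp), exist_ok=True)
            with open(tmp, "wb") as f:
                f.write(data)
        # Phase 2: move each file into place. The staging directory is
        # on the same filesystem, so this is a plain rename().
        for rel in files:
            dest = os.path.join(root, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            os.replace(os.path.join(staging, rel), dest)
    finally:
        shutil.rmtree(staging, ignore_errors=True)
```

Every file is still written once during unpack and then touched again
by the rename, so the per-file I/O during the transaction does not go
away.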

This is also a flaw with RPM-OSTree, since you have to fetch
everything individually and construct the root by shifting hardlinks
or reflinks around.
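
On the alignment question quoted earlier (lots of files under 4KiB, and
64KiB pages on ppc64le), the "some math" is easy to sketch. This assumes
every non-empty file is simply rounded up to the next page boundary and
that zero-sized files take no payload space; whether the actual
implementation pads, uses holes, or does something else is not stated in
the proposal, and the function names are mine:

```python
def aligned_size(size, page):
    """Round `size` up to the next multiple of `page` (0 stays 0)."""
    return -(-size // page) * page


def padding_overhead(sizes, page):
    """Bytes of padding added when each file is page-aligned."""
    payload = sum(sizes)
    aligned = sum(aligned_size(s, page) for s in sizes)
    return aligned - payload


# Made-up file sizes: an empty __init__.py, a tiny module, an exactly
# page-sized file, and one slightly over a page.
sizes = [0, 120, 4096, 5000]
print(padding_overhead(sizes, 4096))    # 7168
print(padding_overhead(sizes, 65536))   # 187392
```

Feeding it the real file sizes from a Python-heavy package set would
show how much this adds up to on each architecture.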





--
真実はいつも一つ!/ Always, there's only one truth!
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx



