> On Mon, Dec 21, 2020, at 1:07 PM, Neal Gompa wrote:
> > Yes it does. It avoids writing the compressed data and then copying it
> > back out uncompressed, which is the same amount of savings as the
> > reflink approach.
> >
> > (It's also equally incompatible with deltarpm)
>
> No - static deltas exist, plus layered RPMs work on the wire the same.
> But this isn't really relevant here.
>
> Adding a hardlink indeed requires updating inodes proportional to the
> number of files, but that's more an implementation of the transactional
> update approach, not of the "download and unpack in parallel" part which
> is more what we're discussing here. (Though they are entangled a bit)
>
> Anyways, I'd still stand by my summary that the much lower tech "files
> in a temporary directory that get rename()d" approach would be all of
> *more* efficient on disk, simpler to implement, and much less disruptive
> than an RPM format change. (The main cost would be a new temporary
> directory path that would need cleanup as part of e.g. `yum clean` etc.)

I'm replying to a bunch of topics in the same thread (via the web UI, because I wasn't subscribed to the mailing list until today - yikes).

On editions: I wrote fedora-workstation because that's the edition that has btrfs as root by default.

On zero-byte files: I think reflinking is specifically fine here, because reflinking is about contents, not inodes. A zero-byte reflink should be a no-op at the filesystem level (I should check; if it's not, I can special-case it easily enough). The process of installing files based on reflinks involves actually opening new files, then reflinking content into them.

On small files and alignment/waste: I believe most mutable filesystems do "waste some space" this way. I call it out here because it's explicitly in the file format, the same as in .tar (without compression), and it's because FICLONERANGE and the filesystems demand block alignment. I account for it as (number of files) x (native block size) / 2 - i.e.
assume 50% usage of the tail block of every file. The block size on ppc64 is unfortunate, but I expect the same level of waste whether you're using reflinking or not.

Talking about the topic more broadly: the hardlinking approach in rpm-ostree depends on either a completely read-only system or the use of a layered filesystem like overlayfs. I think it's a completely valid approach, and to my understanding it's the technology that underpins Fedora CoreOS and Project Atomic. Those are different distro builds with specific use cases in mind. As I understand it, they also have very different management policies: they are intended to be managed in a specific way, and updates seem to require a reboot.

My hope for CoW for RPM is to bring a similar set of capabilities and benefits to Fedora, and eventually CentOS and RHEL, without requiring any changes to how the system works or is managed. The new requirements are fairly simple: a single filesystem for both the rootfs and the dnf cache, and that this filesystem supports reflinking.

Today, deduplication happens within a given RPM. Looking forward, I would like to extend the rpm2extents processor to read and re-use blocks already present in the dnf/rpm cache, which would give us full system-level deduplication.

I am really grateful for all this feedback; hopefully what I write makes sense.

- Matthew
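P.S. The "open a new file, then reflink content into it" step can be sketched with the whole-file FICLONE ioctl (the simpler cousin of FICLONERANGE). This is a minimal illustration, not the rpm2extents code: the function name is made up, and both paths must live on the same reflink-capable filesystem (btrfs, or XFS formatted with reflink=1).

```python
import fcntl

# FICLONE from <linux/fs.h>: clone the entire source file into the
# destination by sharing its extents.
FICLONE = 0x40049409

def install_via_reflink(src_path: str, dst_path: str) -> None:
    """Create dst_path and reflink the contents of src_path into it.

    Hypothetical helper for illustration. Cloning a zero-byte source
    is a data-level no-op: the new inode exists, but there are no
    extents to share.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
```

FICLONERANGE works the same way but takes a range argument, which is what forces the block alignment (and the tail padding) discussed above.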
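The tail-waste estimate above is simple arithmetic. The numbers below are purely illustrative (4 KiB is a common x86_64 filesystem block size; the file count is a made-up figure for a sizeable install), not measurements:

```python
# Illustrative inputs only - not measured values.
BLOCK_SIZE = 4096      # typical x86_64 filesystem block size, in bytes
NUM_FILES = 150_000    # hypothetical file count for a large install

# Assume the final (tail) block of each file is half full on average,
# so expected padding overhead = files * block_size / 2.
expected_waste = NUM_FILES * BLOCK_SIZE // 2
print(f"{expected_waste} bytes (~{expected_waste / 1024**2:.0f} MiB)")
```

On ppc64, where the native block size is larger, the same formula scales the estimate up proportionally.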
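For context, the "files in a temporary directory that get rename()d" approach from the quoted message can be sketched as below. This is my reading of that proposal, not anyone's actual implementation: stage each file on the same filesystem as its destination, then rely on rename() being atomic within a filesystem on POSIX.

```python
import os
import tempfile

def install_atomically(content: bytes, dest: str) -> None:
    # Stage in the destination directory so the temp file is on the
    # same filesystem, making the final rename() atomic.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # ensure data hits disk before rename
        os.rename(tmp, dest)      # atomic within one filesystem
    except BaseException:
        os.unlink(tmp)            # clean up the staged file on failure
        raise
```

The "main cost" mentioned in the quote is visible here: the staged temp files need a well-known location that tools like `yum clean` can purge.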