On Mon, Jan 24, 2022 at 11:41 AM Robert-André Mauchin <zebob.m@xxxxxxxxx> wrote:
>
> On 1/24/22 05:14, Chris Murphy wrote:
> > What file system is being used in each case?
> >
>
> Everything is btrfs.
>
> > This is a bit obscure but... cp and mv use reflink=auto. On XFS and
> > Btrfs this means it'll make reflinks (copies metadata, doesn't
> > duplicate the data extents) if it can. Falling back to a full copy
> > (metadata and data extents).
> >
>
> But both the host and the nspawn container are using btrfs?

Should be true, and if this nspawn container is running on the host
then they should share the same btrfs file system. And even if nspawn
is creating separate subvolumes for the mock build (?not sure if it
does) then because it's a nested subvolume, not mounted, there's no
mount point boundary to cross so you *do* get reflink copies between
subvolumes.

> > It might not be possible due to an obscure VFS rule that disallows
> > reflinks (for reasons I don't understand) when the copy or move
> > crosses mount point boundaries. This includes bind mounts of
> > directories. Bind mounts are also what are employed behind the scene
> > with 'mount -o subvol' mount option on Btrfs, which we use by default
> > in Fedora Workstation and Cloud Edition, and all the desktop spins.
> >
> > The nspawn container, I'm not super familiar with how it works. I
> > think on Btrfs, it will create nested subvolumes, i.e. they are not
> > mounted with the subvol mount option, hence no mount point boundary.
> > But on other file systems, I think nspawn creates a loop mounted file
> > system?
> >
>
> I've got two subvol:
>
> UUID=ee9eec69-8710-4503-b389-e16fcde8a0a5 /       btrfs
> subvol=root,compress=zstd:1 0 0
>
> UUID=d7e21336-6ac6-483a-b4f2-aaeecabd8f1f /home   btrfs
> subvol=home,compress=zstd:1 0 0
>
> but when I do my tests there is no subvol crossing, everything happens
> on the root subvol?

It might be there's a nested subvolume created by nspawn (I'm not
sure), so maybe part of it happens in some other subvolume. But there
should still be an efficient (reflink) copy.

If cp or mv aren't literally invoked, and the copy is done by some
library, then we'd need to find out what ioctl is actually being
called. For example, upstream coreutils only just recently cut a new
release, v9.0 (only in rawhide), that has the enhancement for cp to use
reflink=auto. It was previously reflink=never, which is what's used
most everywhere else other than Fedora.

$ strace cp --reflink=always A B
...
ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = 0

$ strace cp --reflink=never A B
...
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faf80f5e000
read(3, "2022/01/12 13:48:21 Starting Blu"..., 131072) = 1756
write(4, "2022/01/12 13:48:21 Starting Blu"..., 1756) = 1756

Sorry though if this is a goose chase. I can't tell if it's a factor
in what's going on. But maybe someone else will find this interesting
:D

There is a mostly reliable way to determine if a file is a reflink
copy. Before the copy, look at the file:

$ filefrag -v A
...
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      14:   10103641..  10103655:     15:             last,encoded,eof
...

The key takeaway is the physical offset, 10103641. Let's copy it
within the same directory:

$ cp A B
$ filefrag -v B
...
   0:        0..      14:   10103641..  10103655:     15:             last,encoded,shared,eof
...

Again 10103641. So the data extent location is the same, which is only
possible with a reflink copy, and is why reflinks also go by a more
technical name: shared extents. You also see "shared" in the flags
column. That flag is only there because both A and B exist. If I
remove A or B, such that there is only one file using those extents,
they're no longer shared, so the "shared" flag won't be there. Hence my
emphasis on the address.
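
The address comparison above can be scripted. What follows is only a
minimal sketch, assuming the filefrag -v column layout shown above
(extent rows start with the extent number and the third
colon-separated field is the physical range). The function names
extent_starts and same_extents are made up for illustration and are
not part of any tool.

```shell
#!/bin/sh
# Hypothetical helpers, not part of filefrag or coreutils.

# Read `filefrag -v` output on stdin and print the starting physical
# offset of every extent row, one per line. Header and "..." lines are
# skipped because they do not start with an extent number.
extent_starts() {
    awk -F: '/^ *[0-9]+:/ {
        s = $3                  # e.g. "   10103641..  10103655"
        sub(/\.\..*/, "", s)    # drop everything from ".." onward
        gsub(/ /, "", s)        # strip the column padding
        print s
    }'
}

# Succeed if two files report identical physical extent starts, which
# is what a reflink copy looks like. Needs filefrag (e2fsprogs) and a
# file system that reports extents.
same_extents() {
    starts_a=$(filefrag -v "$1" | extent_starts)
    starts_b=$(filefrag -v "$2" | extent_starts)
    [ -n "$starts_a" ] && [ "$starts_a" = "$starts_b" ]
}
```

With the files from the example, "same_extents A B && echo reflinked"
would be the way to use it. The same caveat applies as for reading the
output by eye: it compares addresses, so it is only mostly reliable.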

There *is* logical block address reuse in Btrfs, but due to COW an
address isn't going to be reused in less than about a minute.

$ cp --reflink=never A C
$ filefrag -v C
...
   0:        0..      14:   10398358..  10398372:     15:             last,encoded,eof

Different location because the data extents were duplicated, not
shared. This is the same on XFS too.

The subtle differences maybe don't matter here much. A btrfs subvolume
does have its own st_dev, so things like rsync -x and borg will not
cross subvolume boundaries.

Chris Murphy