On 2022-03-16 15:23, Dave Chinner wrote:
reflink is not dedupe. file clones simply make a copy by reference, so it doesn't duplicate the data in the first place. IOWs, it ends up with a single physical copy that has multiple references to it. dedupe is done by a different operation, which requires comparing the data in two different locations and if they are the same reducing it to a single physical copy with multiple references.
Yeah, sorry, I didn't phrase that statement right, but I understand the situation.
IIUC, you are asking whether you can run a reflink copy on the destination before you run rsync, then do a delta sync using rsync to only move the changed blocks, so that only the changed blocks are stored in the backup image? If so, then yes. This is how a reflink-based file-level backup farm would work. It is very similar to a hardlink-based farm, but instead of keeping a repository of every version of every file that is backed up in an object store and then creating the directory structure via hardlinks to the object store, it creates the new directory structure with reflink copies of the previous version and then does delta updates to the files directly.
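A minimal sketch of that clone-then-delta workflow for a single file, assuming Linux, a destination filesystem with reflink support, and rsync on the PATH. The function names (reflink_copy, rsync_delta_cmd, backup_file) are made up for illustration, and the FICLONE value is the x86/arm encoding, which may differ on other architectures.

```python
import fcntl
import subprocess

# FICLONE is _IOW(0x94, 9, int). This numeric value is the x86/arm encoding;
# treat it as an assumption on other architectures.
FICLONE = 0x40049409


def reflink_copy(src, dst):
    """Make dst share all of src's extents (fails with OSError if unsupported)."""
    with open(src, "rb") as s, open(dst, "wb") as d:
        fcntl.ioctl(d.fileno(), FICLONE, s.fileno())


def rsync_delta_cmd(src, dst):
    """Build an rsync command that rewrites only the changed blocks of dst.

    --inplace: update dst directly rather than writing a temp file and
               renaming it, which would discard the shared extents.
    --no-whole-file: force the delta algorithm even for local copies,
               where rsync otherwise sends the whole file.
    """
    return ["rsync", "-a", "--inplace", "--no-whole-file", src, dst]


def backup_file(src, prev_backup, new_backup):
    """Clone the previous backup, then delta-update the clone from src."""
    reflink_copy(prev_backup, new_backup)
    subprocess.run(rsync_delta_cmd(src, new_backup), check=True)
```

After backup_file() returns, blocks that rsync did not rewrite should still be shared between prev_backup and new_backup, so each generation costs roughly the size of its changes.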
ok thanks
I haven't confirmed anything, just made a guess same as you have.
Well, good enough for me. Thanks anyway!
That sounds more like the dedupe process searching for duplicate blocks to dedupe....
I think so too.
You can use FIEMAP (filefrag(1) or xfs_bmap(8)) to tell you if a specific extent is shared or not. But it cannot tell you how many references there are to it, nor what file those references belong to. For that, you need root permissions, ioctl_getfsmap(2) and rmapbt=1 support in your filesystem.
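For the FIEMAP half of this, here is a rough sketch of asking the kernel directly which extents of a file carry the shared flag, which is approximately what filefrag -v reports. It assumes Linux and the struct layouts and constants from <linux/fiemap.h> and <linux/fs.h>; the helper names (parse_extents, file_extents, shared_extents) are hypothetical. As noted above, it only says whether an extent is shared, not by how many files or which ones.

```python
import fcntl
import struct

# Constants from <linux/fs.h> and <linux/fiemap.h>.
FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1            # flush dirty data before mapping
FIEMAP_EXTENT_LAST = 0x1          # last extent in the file
FIEMAP_EXTENT_SHARED = 0x2000     # extent is referenced by more than one owner

# struct fiemap header (32 bytes) and struct fiemap_extent (56 bytes).
HDR = struct.Struct("=QQIIII")
EXT = struct.Struct("=QQQQQIIII")


def parse_extents(buf):
    """Decode a FIEMAP result buffer into (logical, physical, length, flags)."""
    mapped = HDR.unpack_from(buf, 0)[3]  # fm_mapped_extents
    out = []
    for i in range(mapped):
        fields = EXT.unpack_from(buf, HDR.size + i * EXT.size)
        out.append((fields[0], fields[1], fields[2], fields[5]))
    return out


def file_extents(path, batch=128):
    """Return every extent of path, querying the kernel in batches."""
    extents, start = [], 0
    with open(path, "rb") as f:
        while True:
            buf = bytearray(HDR.size + batch * EXT.size)
            HDR.pack_into(buf, 0, start, 0xFFFFFFFFFFFFFFFF - start,
                          FIEMAP_FLAG_SYNC, 0, batch, 0)
            fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)  # fills buf in place
            got = parse_extents(buf)
            extents.extend(got)
            if not got or got[-1][3] & FIEMAP_EXTENT_LAST:
                return extents
            start = got[-1][0] + got[-1][2]  # resume after the last extent


def shared_extents(extents):
    """Keep only the extents flagged as shared."""
    return [e for e in extents if e[3] & FIEMAP_EXTENT_SHARED]
```

Something like shared_extents(file_extents("/backup/new/file")) would then list the ranges still shared with another file; the path is only a placeholder.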
Sounds more complex than I would like to deal with.
Unless you have an immediate use for filesystem metadata level introspection (generally unlikely), there's no need to enable it.
OK, thanks for the info. I am leaving the list now; thanks a bunch for the replies. nate