Re: xfsdump does not support reflink copied files properly

On Thu, Nov 02, 2023 at 09:39:53AM -0700, Darrick J. Wong wrote:
> On Thu, Nov 02, 2023 at 01:42:54PM +0100, Alexander Puchmayr wrote:
> > Hi there,
> > 
> > I just encountered a problem when trying to use xfsdump on a filesystem with 
> > lots of reflink copied vm disk images, yielding a dump file much larger than 
> > expected and which I also was unable to restore from (target disk full). I 
> > created a gentoo bug item under https://bugs.gentoo.org/916704 and I got 
> > advised to report it here as well.
> > 
> > Copy from the bug report:
> > 
> > sys-fs/xfsdump-3.1.12 seems to copy reflink copied files as ordinary files, 
> > resulting in a way too big dump file. Restoring from such a dump likely 
> > yields an out-of-diskspace condition. 
> 
> Correct, xfsdump (and tar, and rsync...) does not know how to preserve
> the sharing factor of a particular space extent.  All of those tools
> walk the inodes on a filesystem, open them, and read() out the data.
> 
> Although there are ways to find out which file(s) own a piece of disk
> space, each of those tools would most likely require a thorough redesign
> to the dump file format to allow pointing to shared blocks elsewhere in
> the dump file.

I don't think that is the case. Like XFS, xfsdump encodes user data
it backs up in extent records, and it has different types of
extents. It currently understands "data" and "hole" extents as
returned by XFS_IOC_GETBMAPX, so we could extend that to encode
"shared" extents that point to an offset and length in a different
inode.
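That extension could look something like the sketch below: pick the dump
record type from getbmapx-style output, with a new SHARED type alongside
the existing data and hole types. The enum and function names are
illustrative, not xfsdump's actual structures; BMV_OF_SHARED is the flag
XFS_IOC_GETBMAPX sets on reflinked extents (redefined locally here so
the sketch is self-contained):

```c
#include <assert.h>
#include <stdint.h>

#define BMV_OF_SHARED 0x8   /* as in xfs_fs.h */

/* Hypothetical extended dump extent types */
enum dump_extent_type {
	DUMP_EXT_DATA,   /* real data follows in the dump stream */
	DUMP_EXT_HOLE,   /* no data; restore leaves a hole */
	DUMP_EXT_SHARED, /* points at {ino, off, len} elsewhere in the dump */
};

/* getbmapx reports holes with bmv_block == -1 */
enum dump_extent_type
classify_extent(int64_t bmv_block, uint32_t bmv_oflags)
{
	if (bmv_block == -1)
		return DUMP_EXT_HOLE;
	if (bmv_oflags & BMV_OF_SHARED)
		return DUMP_EXT_SHARED;
	return DUMP_EXT_DATA;
}
```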

Yes, this means during the scan we have to record all shared extents
with their underlying block number, then after the scan we need to
resolve that to the single copy we are going to keep as a normal
data extent in the dump (i.e. the first to be restored). Then we
convert all the others to the new shared extent type that points at
the {ino, off, len} that contains the actual data in the dump.
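A minimal sketch of that resolution pass, assuming an in-memory table
of scanned extent records keyed by physical start block (the struct and
function names are made up for illustration; a real implementation
would index by block rather than do the O(n^2) scan used here for
clarity):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative in-memory record for one scanned extent */
struct ext_rec {
	uint64_t ino;		/* owning inode */
	uint64_t off;		/* file offset */
	uint64_t len;		/* extent length */
	uint64_t pblock;	/* physical start block (dedup key) */
	int	 shared;	/* set when rewritten to point elsewhere */
	uint64_t src_ino;	/* target of the shared reference... */
	uint64_t src_off;	/* ...once rewritten */
};

/*
 * The first record to reference a physical extent keeps it as a
 * normal data extent; every later reference is rewritten as a shared
 * extent pointing at that first {ino, off}.
 */
void
resolve_shared(struct ext_rec *recs, int n)
{
	for (int i = 0; i < n; i++) {
		for (int j = 0; j < i; j++) {
			if (recs[j].pblock == recs[i].pblock &&
			    !recs[j].shared) {
				recs[i].shared = 1;
				recs[i].src_ino = recs[j].ino;
				recs[i].src_off = recs[j].off;
				break;
			}
		}
	}
}
```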

Now all restore needs to do is run FICLONERANGE when it comes across
a shared extent - it's got all the info it needs in the dump to
recreate the shared extent. We can use restore-side ordering to
guarantee that the data we need to clone is already on disk (e.g.
delay extent clones until after all the normal data has been
restored) so that all the shared extents we restore end up with the
correct data in them.
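The restore side could be as simple as this sketch: try FICLONERANGE,
and if the target filesystem cannot clone (no reflink support, or an
unaligned range), fall back to a plain data copy so restore still
produces correct file contents. The function name is illustrative and
error handling is minimal for brevity:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>		/* FICLONERANGE, struct file_clone_range */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int
restore_shared_extent(int dest_fd, uint64_t dest_off,
		      int src_fd, uint64_t src_off, uint64_t len)
{
	struct file_clone_range fcr = {
		.src_fd = src_fd,
		.src_offset = src_off,
		.src_length = len,
		.dest_offset = dest_off,
	};

	if (ioctl(dest_fd, FICLONERANGE, &fcr) == 0)
		return 0;	/* extent is now shared on disk */

	/* Clone failed (e.g. no reflink support): copy the bytes. */
	char buf[65536];
	while (len > 0) {
		size_t chunk = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t n = pread(src_fd, buf, chunk, src_off);
		if (n <= 0)
			return -1;
		if (pwrite(dest_fd, buf, n, dest_off) != n)
			return -1;
		src_off += n;
		dest_off += n;
		len -= n;
	}
	return 0;
}
```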

Yes, this means we need to bump the dump format version number to
support shared extents, but overall it's not a major revision of the
format or major surgery to the code base.  It doesn't require kernel
or even XFS expertise to implement - it's all userspace stuff and
fairly straightforward - it just requires time, resources and
commitment.

> Regardless, nobody's submitted code to do any of those things.  Patches
> welcome.

Yup, that is the biggest issue - there are always more things to do
than we have people to do them.

> > It may be used as a denial-of-service tool which can be used by an ordinary 
> 
> >                     ^^^^^^^^^^^^^^^^^
> Please do not file a CVE for this.

/me sighs

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


