Re: XFS reflink overhead, ioctl(FICLONE)

Suyash Mahar <smahar@xxxxxxxx> · Tue, 13 Dec 2022 20:47:03 -0800

Hi Darrick,

Thank you for the response. I have replied inline.

-Suyash

On Tue, Dec 13, 2022 at 09:18, Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> [ugh, your email never made it to the list.  I bet the email security
> standards have been tightened again.  <insert rant about dkim and dmarc
> silent failures here>] :(
>
> On Sat, Dec 10, 2022 at 09:28:36PM -0800, Suyash Mahar wrote:
> > Hi all!
> >
> > While using XFS's ioctl(FICLONE), we found that XFS seems to have
> > poor performance (the ioctl takes milliseconds for sparse files), and
> > the overhead increases with every call.
> >
> > For the demo, we are using an Optane DC-PMM configured as a
> > block device (fsdax) and running XFS (Linux v5.18.13).
>
> How are you using fsdax and reflink on a 5.18 kernel?  That combination
> of features wasn't supported until 6.0, and the data corruption problems
> won't get fixed until a pull request that's about to happen for 6.2.

We did not enable the dax option. The Optane DIMMs are configured to
appear as a block device.

$ mount | grep xfs
/dev/pmem0p4 on /mnt/pmem0p4 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)

Regardless of the block device (the plot includes results for Optane
and RamFS), the ioctl(FICLONE) call seems slow.
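
For reference, each iteration of the measurement looks roughly like the
sketch below. This is a minimal Python illustration, not the code from
the tarball: the helper names (overwrite_random_blocks, timed_clone) and
the 4 KiB block size are assumptions made here for illustration. The
FICLONE value is spelled out because Python's fcntl module does not
export it.

```python
import fcntl
import os
import random
import time

# FICLONE is _IOW(0x94, 9, int) in <linux/fs.h>.
FICLONE = 0x40049409


def overwrite_random_blocks(path, count, block_size=4096):
    """Overwrite `count` randomly chosen blocks of an existing file.

    Assumes the file is at least one block long. The 4 KiB block size is
    an illustrative choice, not a value taken from the benchmark.
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(count):
            block = random.randrange(size // block_size)
            os.pwrite(fd, os.urandom(block_size), block * block_size)
    finally:
        os.close(fd)


def timed_clone(src_path, dst_path):
    """Clone src_path into dst_path via ioctl(FICLONE); return seconds taken.

    Raises OSError if the filesystem does not support reflink.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            start = time.perf_counter()
            fcntl.ioctl(dst_fd, FICLONE, src_fd)
            return time.perf_counter() - start
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```

Calling overwrite_random_blocks() and then timed_clone() in a loop, with
a fresh destination path each time, gives one latency sample per
iteration.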

> > We create a 1 GiB dense file, then repeatedly modify a tiny random
> > fraction of it and make a clone via ioctl(FICLONE).
>
> Yay, random cow writes, that will slowly increase the number of space
> mapping records in the file metadata.
>
> > The time required for the ioctl() calls increases from large to insane
> > over the course of ~250 iterations: From roughly a millisecond for the
> > first iteration or two (which seems high, given that this is on
> > Optane and the code doesn't fsync or msync anywhere at all, ever) to 20
> > milliseconds (which seems crazy).
>
> Does the system call runtime increase with O(number_extents)?  You might
> record the number of extents in the file you're cloning by running this
> periodically:
>
> xfs_io -c stat $path | grep fsxattr.nextents

The extent count does increase linearly (just like the ioctl() call latency).
I used the xfs_bmap tool; let me know if this is not the right way. If
it is not, I'll update the microbenchmark to run xfs_io.
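
If xfs_io is preferred, the microbenchmark could record the extent count
next to each latency sample along the lines of the sketch below. It runs
the same command suggested above and assumes the output contains a line
of the form "fsxattr.nextents = N"; the function names are hypothetical
and this is not yet part of our code.

```python
import re
import subprocess


def parse_nextents(stat_output):
    """Extract the data extent count from `xfs_io -c stat` output.

    Assumes a line of the form "fsxattr.nextents = N" is present.
    """
    match = re.search(r"^fsxattr\.nextents\s*=\s*(\d+)", stat_output, re.MULTILINE)
    if match is None:
        raise ValueError("no fsxattr.nextents line in xfs_io output")
    return int(match.group(1))


def extent_count(path):
    """Run `xfs_io -c stat <path>` and return the reported extent count."""
    result = subprocess.run(
        ["xfs_io", "-c", "stat", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_nextents(result.stdout)
```

Logging extent_count(path) once per iteration would let the latency be
plotted directly against the number of extents.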

> FICLONE (at least on XFS) persists dirty pagecache data to disk, and
> then duplicates all written-space mapping records from the source file to
> the destination file.  It skips preallocated mappings created with
> fallocate.
>
> So yes, the plot is exactly what I was expecting.
>
> --D
>
> > The plot is attached to this email.
> >
> > A cursory look at the extent map suggests that it gets increasingly
> > complicated, which results in the growing overhead.
> >
> > The enclosed tarball contains our code, our results, and some other info
> > like a flame graph that might shed light on where the ioctl is spending
> > its time.
> >
> > - Suyash & Terence