On Wed, Oct 18, 2023 at 10:59:10 AM +1100, Dave Chinner wrote: > On Tue, Oct 17, 2023 at 01:12:08PM -0700, Catherine Hoang wrote: >> One of our VM cluster management products needs to snapshot KVM image >> files so that they can be restored in case of failure. Snapshotting is >> done by redirecting VM disk writes to a sidecar file and using reflink >> on the disk image, specifically the FICLONE ioctl as used by >> "cp --reflink". Reflink locks the source and destination files while it >> operates, which means that reads from the main vm disk image are blocked, >> causing the vm to stall. When an image file is heavily fragmented, the >> copy process could take several minutes. Some of the vm image files have >> 50-100 million extent records, and duplicating that much metadata locks >> the file for 30 minutes or more. Having activities suspended for such >> a long time in a cluster node could result in node eviction. >> >> Clone operations and read IO do not change any data in the source file, >> so they should be able to run concurrently. Demote the exclusive locks >> taken by FICLONE to shared locks to allow reads while cloning. While a >> clone is in progress, writes will take the IOLOCK_EXCL, so they block >> until the clone completes. >> >> Link: https://lore.kernel.org/linux-xfs/8911B94D-DD29-4D6E-B5BC-32EAF1866245@xxxxxxxxxx/ >> Signed-off-by: Catherine Hoang <catherine.hoang@xxxxxxxxxx> >> --- >> fs/xfs/xfs_file.c | 67 +++++++++++++++++++++++++++++++++++--------- >> fs/xfs/xfs_inode.c | 17 +++++++++++ >> fs/xfs/xfs_inode.h | 9 ++++++ >> fs/xfs/xfs_reflink.c | 4 +++ >> 4 files changed, 84 insertions(+), 13 deletions(-) > > All looks OK - one minor nit below. > > Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> > >> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c >> index 203700278ddb..3b9500e18f90 100644 >> --- a/fs/xfs/xfs_file.c >> +++ b/fs/xfs/xfs_file.c >> @@ -214,6 +214,47 @@ xfs_ilock_iocb( >> return 0; >> } >> >> +static int >> +xfs_ilock_iocb_for_write( >> + struct kiocb *iocb, >> + unsigned int *lock_mode) >> +{ >> + ssize_t ret; >> + struct xfs_inode *ip = XFS_I(file_inode(iocb->ki_filp)); >> + >> + ret = xfs_ilock_iocb(iocb, *lock_mode); >> + if (ret) >> + return ret; >> + >> + if (*lock_mode == XFS_IOLOCK_EXCL) >> + return 0; >> + if (!xfs_iflags_test(ip, XFS_IREMAPPING)) >> + return 0; >> + >> + xfs_iunlock(ip, *lock_mode); >> + *lock_mode = XFS_IOLOCK_EXCL; >> + ret = xfs_ilock_iocb(iocb, *lock_mode); >> + if (ret) >> + return ret; >> + >> + return 0; > > This last bit could simply be: > > xfs_iunlock(ip, *lock_mode); > *lock_mode = XFS_IOLOCK_EXCL; > return xfs_ilock_iocb(iocb, *lock_mode); > Catherine, I have made the modifications suggested above and I have committed the patch onto my local Git tree. -- Chandan