Re: Reflink (cow) copy of busy files

On 26-02-2018 18:26, Darrick J. Wong wrote:
> The way reflink is supposed to work wrt consistency is:
>
> 1. lock out all new io/fallocate activity on both inodes (iolock/mmaplock)
> 2. wait for all directio to complete
> 3. fsync both files (write all the dirty pagecache to disk)
> 4. lock both inodes (ilock)
> 5. clone each extent atomically
> 6. unlock ilock
> 7. unlock iolock/mmaplock
>
> So at least in theory the cloned file will match whatever the host saw
> on disk and page cache at the time the reflink call was initiated.
> I say 'in theory' because there could be bugs.

Great! CoW will be a great addition to XFS once it is considered stable.
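
For reference, here is a minimal sketch of how userspace might ask the kernel for such a clone, assuming Linux and Python 3. FICLONE is the ioctl request from <linux/fs.h> that cp --reflink relies on; the helper name clone_file and the fallback to a plain copy are only my illustration, not anything taken from the XFS code. The locking and flushing steps listed above all happen inside the kernel once the ioctl is issued.

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>: _IOW(0x94, 9, int)
FICLONE = 0x40049409


def clone_file(src, dst):
    """Try to reflink src into dst; fall back to a plain byte copy.

    Returns "reflink" or "copy" so the caller knows which path was taken.
    (Hypothetical helper, for illustration only.)
    """
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            # Ask the filesystem to share all of src's extents with dst.
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return "reflink"
        except OSError:
            # The filesystem or kernel refused (no reflink support,
            # different filesystems, ...): copy the data instead.
            shutil.copyfileobj(s, d)
            return "copy"
```

On a filesystem without reflink support the function simply degrades to an ordinary copy, so the caller should check the return value if shared extents matter.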

> Whatever dirty state is in the guest VM stays in that VM, which means
> that if you only cp --reflink on the host, the clone you get will
> reflect the virtual disk state as if you'd kill -9'd the VM, cloned the
> VM disk, and restarted the VM.  Upon restart the log recovers whatever
> metadata made it out of the VM.

Sure, that is what I mean by "crash-consistent".

> However, if you tell the guest to freeze the fs before cloning (as Dave
> suggested earlier) the guest will flush all its state to the upper level
> (the host) and the host will push all that out to disk before cloning.
> The snapshot you create should be cleaner because you're effectively
> prepaying the recovery costs by flushing everything before taking the
> snapshot.

True, and this is "application-level consistency" (which requires a guest agent and possibly even an application-specific agent).
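
A rough sketch of that freeze, clone, thaw ordering, assuming a libvirt host whose guest runs the QEMU guest agent. virsh domfsfreeze and virsh domfsthaw are existing virsh subcommands; the domain name, the image paths and the frozen_guest helper are hypothetical.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen_guest(domain, run=subprocess.run):
    """Freeze the guest's filesystems for the duration of the block.

    `run` is injectable so the ordering can be exercised without libvirt.
    (Hypothetical helper, for illustration only.)
    """
    run(["virsh", "domfsfreeze", domain], check=True)
    try:
        yield
    finally:
        # Always thaw, even if the clone fails, so the guest is not
        # left with its filesystems frozen.
        run(["virsh", "domfsthaw", domain], check=True)


# Possible usage (domain name and paths are made up):
#
# with frozen_guest("vm1"):
#     subprocess.run(
#         ["cp", "--reflink=always", "/vms/vm1.img", "/vms/vm1.snap.img"],
#         check=True,
#     )
```

The guest stays frozen only for as long as the clone takes, which should be short since no data blocks are copied.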

> Also note that if the host goes down before returning from the syscall,
> the log will continue on with whichever extent was being cloned at the
> time in order to preserve metadata integrity, but the destination file
> will reflect a partial copy.

Thanks for pointing that out, and for your extremely clear explanation!
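
Given that a crash can leave a partial destination file, one userspace pattern I would consider is to clone into a temporary name and rename it into place only after the clone has returned. This is a sketch of my own under that assumption, not something prescribed in this thread; clone_then_publish is a hypothetical helper, and the clone argument defaults to a plain shutil.copyfile where a real setup would plug in a reflink-capable function.

```python
import os
import shutil
import tempfile


def clone_then_publish(src, dst, clone=shutil.copyfile):
    """Clone src to a temporary file, then atomically rename it to dst.

    A crash in the middle leaves at most a stray temporary file; dst is
    either absent/old or complete. (Hypothetical helper.)
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # The temporary file must live on the same filesystem as dst so the
    # final rename is atomic.
    fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".clone-")
    os.close(fd)
    try:
        clone(src, tmp)
        # Flush the temporary file before exposing it under its final name.
        with open(tmp, "r+b") as f:
            os.fsync(f.fileno())
        os.replace(tmp, dst)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # Persist the rename itself by syncing the containing directory.
    dir_fd = os.open(dst_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A leftover .clone-* file after a host crash can simply be deleted, since the final name never pointed at it.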


--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8