On Mon, Feb 26, 2018 at 10:23:45PM +0100, Gionatan Danti wrote:
> On 26-02-2018 18:26 Darrick J. Wong wrote:
> >The way reflink is supposed to work wrt consistency is:
> >
> >1. lock out all new io/fallocate activity on both inodes (iolock/mmaplock)
> >2. wait for all directio to complete
> >3. fsync both files (write all the dirty pagecache to disk)
> >4. lock both inodes (ilock)
> >5. clone each extent atomically
> >6. unlock ilock
> >7. unlock iolock/mmaplock
> >
> >So at least in theory the cloned file will match whatever the host saw
> >on disk and page cache at the time the reflink call was initiated.
> >I say 'in theory' because there could be bugs.
>
> Great! CoW will be a great addition for XFS once it is considered
> stable.
>
> >Whatever dirty state is in the guest VM stays in that VM, which means
> >that if you only cp --reflink on the host, the clone you get will
> >reflect the virtual disk state as if you'd kill -9'd the VM, cloned the
> >VM disk, and restarted the VM. Upon restart the log recovers whatever
> >metadata made it out of the VM.
>
> Sure, that is what I mean by "crash-consistent".
>
> >However, if you tell the guest to freeze the fs before cloning (as Dave
> >suggested earlier) the guest will flush all its state to the upper level
> >(the host) and the host will push all that out to disk before cloning.
> >The snapshot you create should be cleaner because you're effectively
> >prepaying the recovery costs by flushing everything before taking the
> >snapshot.
>
> True, and this is "application-level consistency" (which requires a guest
> agent and possibly even an application-specific agent)

I believe qemu-ga takes care of guest fs freeze inside the guest, and
you can invoke it from the host via 'virsh domfsfreeze' or the --quiesce
argument to snapshot-create... but you ought to confirm that for
yourself.
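
For illustration only, here is a rough Python sketch of the freeze-then-clone sequence discussed above, under several assumptions: the helper names reflink_or_copy and clone_frozen are made up for this example, the guest is assumed to run qemu-ga so that 'virsh domfsfreeze' and 'virsh domfsthaw' work, and the clone is requested through the FICLONE ioctl (the call behind cp --reflink). The locking, flushing and extent cloning in steps 1-7 all happen inside the kernel; userspace only issues the one ioctl. Note that the fallback byte copy does not carry the consistency guarantees described for reflink.

```python
import fcntl
import shutil
import subprocess

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int).
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Clone src into dst; return "reflink", or "copy" if cloning failed."""
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            # Ask the filesystem to share src's extents with dst.
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return "reflink"
        except OSError:
            # Similar in spirit to cp --reflink=auto: fall back to an
            # ordinary byte copy (not atomic, no shared extents).
            shutil.copyfileobj(s, d)
            return "copy"


def clone_frozen(domain, src, dst, run=subprocess.run):
    """Freeze the guest filesystems, clone the disk image, always thaw."""
    # Hypothetical wrapper; 'run' is injectable so it can be tested
    # without libvirt installed.
    run(["virsh", "domfsfreeze", domain], check=True)
    try:
        return reflink_or_copy(src, dst)
    finally:
        # Thaw even if the clone raised, so the guest is not left frozen.
        run(["virsh", "domfsthaw", domain], check=True)
```

Something like clone_frozen("guest1", "/path/to/disk.img", "/path/to/clone.img") would be the intended call, with the domain name and paths being placeholders. Skipping the freeze and calling reflink_or_copy directly gives the crash-consistent variant.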

--D

> >Also note that if the host goes down before returning from the syscall,
> >the log will continue on with whichever extent was being cloned at the
> >time in order to preserve metadata integrity, but the destination file
> >will reflect a partial copy.
>
> Thanks for pointing that out, and for your extremely clear explanation!
>
> --
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
> GPG public key ID: FF5F32A8
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html