Re: Reflink (cow) copy of busy files

On Mon, Feb 26, 2018 at 10:23:45PM +0100, Gionatan Danti wrote:
> On 26-02-2018 18:26 Darrick J. Wong wrote:
> >The way reflink is supposed to work wrt consistency is:
> >
> >1. lock out all new io/fallocate activity on both inodes (iolock/mmaplock)
> >2. wait for all directio to complete
> >3. fsync both files (write all the dirty pagecache to disk)
> >4. lock both inodes (ilock)
> >5. clone each extent atomically
> >6. unlock ilock
> >7. unlock iolock/mmaplock
> >
> >So at least in theory the cloned file will match whatever the host saw
> >on disk and page cache at the time the reflink call was initiated.
> >I say 'in theory' because there could be bugs.
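
For reference, a minimal sketch of how userspace can ask for such a clone,
which is roughly what cp --reflink does. Assumptions on my part: the
FICLONE value is hardcoded from <linux/fs.h> (correct on x86 and arm as far
as I know), and reflink_or_copy is a made-up helper name. The fallback is
an ordinary copy and gets none of the locking described above.

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int).
# Hardcoding the number is an assumption; it holds on x86 and arm.
FICLONE = 0x40049409


def reflink_or_copy(src_path, dst_path):
    """Try to reflink src into dst.

    Returns True if the kernel cloned the extents, False if we fell back
    to a plain byte-for-byte copy (hypothetical helper, for illustration).
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # The ioctl is issued on the destination fd, with the source
            # fd passed as the argument.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            # Filesystem without reflink support, or src and dst live on
            # different filesystems: do an ordinary copy instead.
            shutil.copyfileobj(src, dst)
            return False
```

Call it as reflink_or_copy("vm.img", "vm-clone.img") and check the return
value if you need to know whether the extents were really shared.
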
> 
> Great! CoW will be a great addition to XFS once it is considered
> stable.
> 
> >Whatever dirty state is in the guest VM stays in that VM, which means
> >that if you only cp --reflink on the host, the clone you get will
> >reflect the virtual disk state as if you'd kill -9'd the VM, cloned the
> >VM disk, and restarted the VM.  Upon restart the log recovers whatever
> >metadata made it out of the VM.
> 
> Sure, that is what I mean by "crash-consistent".
> 
> >However, if you tell the guest to freeze the fs before cloning (as Dave
> >suggested earlier) the guest will flush all its state to the upper level
> >(the host) and the host will push all that out to disk before cloning.
> >The snapshot you create should be cleaner because you're effectively
> >prepaying the recovery costs by flushing everything before taking the
> >snapshot.
> 
> True, and this is "application-level consistency" (which requires a guest
> agent and possibly even an application-specific agent)

I believe qemu-ga takes care of guest fs freeze inside the guest,
and you can invoke it from the host via 'virsh domfsfreeze' or the
--quiesce argument to snapshot-create... but you ought to confirm that
for yourself.
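
Something like the following is what I have in mind: an untested sketch
that freezes the guest filesystems, reflinks the disk image on the host,
and always thaws afterwards. The function name, the domain name and the
image paths are made up for illustration, and it assumes qemu-ga is
running in the guest and that the image sits on a reflink-capable
filesystem.

```python
import subprocess


def clone_with_guest_freeze(domain, src_image, dst_image, run=None):
    """Freeze guest filesystems, reflink the image, then thaw.

    'run' is any callable taking an argv list; by default it executes
    the command and raises if it exits non-zero.
    """
    if run is None:
        def run(cmd):
            subprocess.run(cmd, check=True)

    # Ask the guest agent to flush and freeze the guest filesystems.
    run(["virsh", "domfsfreeze", domain])
    try:
        # Fail rather than silently fall back to a full copy.
        run(["cp", "--reflink=always", src_image, dst_image])
    finally:
        # Thaw even if the clone failed, so the guest is not left frozen.
        run(["virsh", "domfsthaw", domain])
```

e.g. clone_with_guest_freeze("guest1", "guest1.img", "guest1-snap.img").
If the freeze itself fails, nothing is cloned and no thaw is attempted.
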

--D

> >Also note that if the host goes down before returning from the syscall,
> >the log will continue on with whichever extent was being cloned at the
> >time in order to preserve metadata integrity, but the destination file
> >will reflect a partial copy.
> 
> Thanks for pointing that out, and for your extremely clear explanation!
> 
> 
> -- 
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
> GPG public key ID: FF5F32A8
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html