Re: Reflink (cow) copy of busy files

Gionatan Danti <g.danti@xxxxxxxxxx> · Mon, 26 Feb 2018 09:26:14 +0100

Hi Amir,

On 26-02-2018 08:58 Amir Goldstein wrote:

Gionatan,

First of all, the answer to your question is "just" faster copy.
Reflinking a file is much faster than copying it, but it is not O(1).
I believe cp --reflink can result in cloning only part of the file if the
system crashes mid-operation, so in any case, the operation is not *atomic*
in that sense.

But your question about quiescing the filesystem and your question
about the *atomic* nature of the clone operation are two very different
questions.

Can this result in out-of-order writes from the cloned file's point of
view? I mean:
- take a 10-extent file;
- a vm/db/whatever is writing to the file;
- a cp --reflink is executed;
- extents are cloned one by one, with extents 1-4 already cloned and 5 in
progress;
- the vm/db writes to extent 1 - this write will *not* be present in
the cloned file;
- the application writes to extent 6, which will be cloned shortly;
- the cloned file ends up with the later write to extent 6 but not the
earlier one to extent 1;
- bad things happen!

If the above is true, then cp --reflink can't be used even for
relaxed-consistency backups/clones.
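
To make the interleaving in the list above concrete, here is a minimal
Python model of it. It is only a sketch of the *hypothetical* scenario
being asked about (a clone that proceeds extent by extent with no
exclusion against writers); it says nothing about what XFS actually does.
The function name and the event format are made up for illustration.

```python
def clone_with_concurrent_writes(n_extents, events):
    """Replay clone/write events against a toy file made of extents.

    events is a list, in execution order, of either
      ("clone", index)         copy one extent from source to clone
      ("write", index, value)  application write to one source extent
    Returns (source, clone) as lists of per-extent contents.
    """
    source = [0] * n_extents      # 0 stands for the original data
    clone = [None] * n_extents    # None means "not cloned yet"
    for event in events:
        if event[0] == "clone":
            clone[event[1]] = source[event[1]]
        else:
            _, index, value = event
            source[index] = value
    return source, clone


# Indices are 0-based: "extent 1" in the list is index 0, "extent 6" is 5.
events = [("clone", i) for i in range(4)]        # extents 1-4 already cloned
events.append(("write", 0, "A"))                 # earlier write, extent 1
events.append(("write", 5, "B"))                 # later write, extent 6
events += [("clone", i) for i in range(4, 10)]   # remaining extents cloned

source, clone = clone_with_concurrent_writes(10, events)
```

In this model the clone ends up holding the later write "B" in extent 6
while extent 1 still has the original data, which is exactly the
reordering being asked about.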

What you seem to *think* xfs reflink does, it does not actually do.
xfs reflink does NOT reflink the file in-memory data.
xfs reflink "only" reflinks the file on-disk data.
Right now, if you write a large file without fsync and clone it, you
might well get a clone of an unallocated or partly fallocated file with
zero or stale data.

Oh, I absolutely do not expect reflink/clone to work on in-memory
data. I *surely* expect dirty, uncommitted data to be lost: this is
the very reason I wrote about crash-consistent backup.

In short: is cloning/reflinking the same as "pulling the plug" for the
cloned file? I mean:
- a successful clone (so, a non-interrupted/non-crashed one) is akin to an
atomic process for the cloned file;
- async writes/dirty data are lost;
- fsynced writes are preserved;
- writes are not reordered/committed out of order.

Maybe the entire discussion is skewed by the fact that, in some cases, I 
am willing to relax my consistency model to include a crash-consistent 
backup option. Fact is, in the virtualization world there are many 
backup utilities/applications which *use* this model, and I wondered if 
a cp --reflink would give similar results without the hassle.

Maybe the entire crash-vs-application consistency discussion is out of
place on a filesystem mailing list, where you (rightfully!!!) strive for
perfect/maximum data consistency (and I *really* appreciate that).
However, given the recent reflink work on XFS, I wonder if I can
put this to "good use" when it is considered stable.

Going forward, I think there is an intention to "clone" the file
in-memory data as well by sharing the READONLY cache pages between
cloned files, but I don't think dirty pages are going to be shared
between clones anyway, so you are back to square one - need to get the
data on-disk before cloning the file.
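
As a rough illustration of "get the data on-disk before cloning", the
sketch below fsyncs the source and then asks the kernel for a whole-file
clone through the FICLONE ioctl. It assumes Linux and takes the statement
above (dirty pages are not part of the clone) at face value. The function
name flush_and_clone is hypothetical, and the fallback to a plain copy is
only there so the example still works on filesystems without reflink
support.

```python
import errno
import fcntl
import os
import shutil

# FICLONE from <linux/fs.h>: _IOW(0x94, 9, int)
FICLONE = 0x40049409


def flush_and_clone(src_path, dst_path):
    """fsync src_path, then reflink it to dst_path.

    Returns True if the kernel performed a clone, False if the
    filesystem could not reflink and a regular copy was made instead.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Push the source file's dirty pages to disk before cloning.
        os.fsync(src.fileno())
        try:
            # Whole-file clone: dst shares the extents of src.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError as exc:
            unsupported = (errno.EOPNOTSUPP, errno.ENOTTY,
                           errno.EXDEV, errno.EINVAL)
            if exc.errno not in unsupported:
                raise
            # No reflink support here: fall back to copying the bytes.
            shutil.copyfileobj(src, dst)
            return False
```

Note that the fsync only covers data written before it returns; it does
nothing about writes the application issues while the clone is running.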

Great - I think this would do wonders for cache efficiency...

Cheers,
Amir.

Thanks.

PS: sorry if I rephrase the question in different terms. English is not 
my primary language, please bear with me :p

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8