Re: [PATCH v5 8/9] vfs: Add vfs_copy_file_range() support for pagecache copies

On 2015-10-16 09:12, Christoph Hellwig wrote:
> On Fri, Oct 16, 2015 at 08:50:41AM -0400, Austin S Hemmelgarn wrote:
>> Certain parts of userspace do try to reflink things instead of copying (for
>> example, coreutils recently started doing so in mv and has had the option to
>> do so with cp for a while now), but a properly designed general purpose
>> filesystem does not and should not do this without the user telling it to do
>> so.

> But they do.  Get out of your narrow local Linux file system view.
> Every all-flash array or hyperconverged hypervisor will dedup the hell
> out of your data, heck some SSDs even do it on the device.  Your NFS or
> CIFS server already does or soon will do dedup and reflinks behind the
> scenes, that's the whole point of adding these features to the protocol.
Unless things have significantly changed on Windows and OS X, NTFS and HFS+ do not do automatic data deduplication (I'm not sure whether either even supports reflinks, although NTFS is at least partly COW), and I know for certain that FAT, UDF, Minix, BeFS, and Venti do not. NFS and CIFS/SMB both have support in the protocol, but they don't use it unless either the client asks for it specifically or the server is manually configured to do it automatically (current versions of Windows Server might do it by default, but if they do, it isn't documented anywhere I've seen). 9P has no provisions for reflinks or deduplication. AFS/Coda/Ceph/Lustre/GFS2 might do deduplication, but I'm pretty certain they don't by default, and even then they really don't fit the 'general purpose' bit in my statement above. So, overall, my statement still holds for any widely used filesystem technology that is actually 'general purpose'.
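
For illustration, this is roughly what "the user telling it to do so" looks like from userspace. The following is a minimal sketch of an explicit reflink request via the FICLONE ioctl (the generalized form of the older btrfs-specific clone ioctl, and essentially what cp --reflink=always does); the file names are made up and error handling is minimal:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* FICLONE */

int main(void)
{
	int src = open("data.img", O_RDONLY);
	int dst = open("clone.img", O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (src < 0 || dst < 0) {
		perror("open");
		return EXIT_FAILURE;
	}

	/* Explicitly ask the filesystem to share extents with the source
	 * instead of copying the bytes.  Fails (e.g. EOPNOTSUPP) if the
	 * filesystem cannot reflink, at which point the caller would fall
	 * back to an ordinary copy. */
	if (ioctl(dst, FICLONE, src) < 0) {
		perror("FICLONE");
		return EXIT_FAILURE;
	}
	return EXIT_SUCCESS;
}

The point being: nothing here happens behind the application's back; the extents are shared only because the program asked for it by name.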

Furthermore, if you actually read my statement, you will notice that I only said that _filesystems_ should not do it without being told to, and (intentionally) said absolutely nothing about storage devices or virtualization. Ideally, SSDs really shouldn't do it either unless they have a 100% guarantee that the loss of an entire block will not render the data unrecoverable (most do in fact use ECC internally, but it typically corrects only two or three bad bits per byte). As for hypervisors, a good storage hypervisor should be providing some guarantee of reliability, which means it is either already storing multiple copies of _everything_ or using some form of erasure coding so that it can recover from issues with the underlying storage devices without causing problems for higher levels; that makes deduplication in that context safe for all practical purposes.
> And except for the odd fear of COW or dedup, and the ENOSPC issue for
> which we have a flag with a very well defined meaning I've still not
> heard any good arguments against it.
Most people I know who demonstrate this fear are just fine with COW; it's the deduplication that they're terrified of, and TBH that's largely because they've only ever seen it used in unsafe ways. My main argument (which I admittedly have not stated properly at all during this discussion) is that almost everyone is likely to jump on this, it _will_ change long-established semantics in many of the things that switch to it, and there will almost certainly be serious backlash from that.
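
To make the flag/fallback question concrete, here is a rough sketch of how a caller might drive a copy_file_range()-style interface and fall back to a plain read/write copy when the kernel or the filesystem can't offload it. The wrapper signature below is an assumption about the eventual userspace API, copy_range() is just a made-up helper name, and I'm deliberately not guessing at the exact flag names from this series (flags is simply passed as 0):

/* Sketch: copy up to 'len' bytes from in_fd to out_fd.  Tries the
 * offloaded copy first (which a filesystem may choose to satisfy with
 * a reflink), then falls back to an ordinary buffered copy that never
 * shares extents. */
#define _GNU_SOURCE
#include <unistd.h>
#include <errno.h>

ssize_t copy_range(int in_fd, int out_fd, size_t len)
{
	/* flags == 0: no special behaviour requested */
	ssize_t ret = copy_file_range(in_fd, NULL, out_fd, NULL, len, 0);
	if (ret >= 0)
		return ret;
	if (errno != ENOSYS && errno != EXDEV && errno != EOPNOTSUPP)
		return -1;

	/* Fallback: plain read/write copy. */
	char buf[64 * 1024];
	ssize_t total = 0;
	while ((size_t)total < len) {
		size_t want = len - (size_t)total;
		ssize_t n = read(in_fd, buf, want > sizeof(buf) ? sizeof(buf) : want);
		if (n <= 0)
			return total ? total : n;
		for (char *p = buf; n > 0; ) {
			ssize_t w = write(out_fd, p, (size_t)n);
			if (w < 0)
				return total ? total : -1;
			p += w;
			n -= w;
			total += w;
		}
	}
	return total;
}

Whether the kernel satisfies that first call with a reflink, an offloaded copy, or a pagecache copy is exactly the policy question being argued here; the sketch only shows that the caller has an obvious place to express a preference via the flags argument.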
