Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes

On Wed, Aug 25, 2021 at 05:49:45PM -0700, Neeraj Singh wrote:
> Unfortunately my perusal of the man pages and documentation I could find doesn't
> give me this level of confidence on typical Linux filesystems. For instance, the
> notion of having to fsync the parent directory in order to render an inode's link
> findable eliminates a lot of the advantage of this change, though we could batch
> those and would have to do at most 256.
> 
> This thread is somewhat instructive, but inconclusive:
> https://lwn.net/ml/linux-fsdevel/1552418820-18102-1-git-send-email-jaya@xxxxxxxxxxxxx/.

fsync/fdatasync only guarantee consistency for the file handle they
are called on.  The first linked document mentioned an implementation
artifact: file systems with metadata logging tend to force their log
out up to the last modified transaction, and thus also force out
metadata changes done earlier.  This won't help with actual data writes
at all, as for them writing back the data will often generate new
metadata changes, and in general it is not a property to rely on if you
care about data integrity.  It is nice to optimize the order of the
fsync calls for metadata-only workloads, as then the later fsync calls
on earlier modified file handles will often be no-ops.

> One conclusion from reviewing that thread is that as of then, sync_file_ranges
> isn't actually enough to make a hard guarantee about writeout occurring. See
> https://lore.kernel.org/linux-fsdevel/20190319204330.GY26298@dastard/.
> My hope is that the Linux FS developers have rectified that shortcoming by now.

I'm not sure what shortcoming you mean.  sync_file_range is a system
call that only causes data writeback.  It never writes back metadata
and thus is not an integrity operation at all.  That is also very
clearly documented in the man page.

> I think my updated version of the documentation for "= false" is accurate and
> more helpful from a user perspective ("up to OS policy when your data becomes
> durable in the event of an unclean shutdown").  "= true" also has a reasonable
> description, though I might add some verbiage indicating that this setting could
> be costly.

Your version is much better.  In fact it is almost still too nice, as
in general the data will not be durable and you do end up with a
corrupted repository in that case.  Note that even with bad old ext3
that was usually the case.
