Re: RFC: A configuration design for future-proofing fsync() configuration

On Thu, Nov 11, 2021 at 9:54 PM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Wed, Nov 10, 2021 at 04:47:24PM -0800, Neeraj Singh wrote:
> > It would be nice to loop in some Linux fs developers to find out what can be
> > done on current implementations to get the durability without terrible
> > performance. From reading the docs and mailing threads it looks like the
> > sync_file_range + bulk fsync approach should actually work on the current XFS
> > implementation.
>
> If you want more than just my advice linux-fsdevel@xxxxxxxxxxxxxxx is
> a good place to find a wide range of opinions.
>
> Anyway, I think syncfs is the biggest bang for the buck as it will give
> you very efficient syncing with very little overhead in git, but it does
> have a huge noisy neighbor problem that might make it unattractive
> for multi-tenant file systems or git hosting.

To summarize where we are at for linux-fsdevel:
We're working on making Git preserve data added to the repo even if
the system crashes or loses power at some point soon after a Git
command completes. The default behavior of git-for-windows is to set
core.fsyncobjectfiles=true, which at least ensures durability for
loose object files.

The current implementation of core.fsyncobjectfiles inserts an fsync
between writing each new object to a temp name and renaming it to its
final hash-based name. This approach is slow when adding hundreds of
files to the repo [1]. The main cost on the hardware we tested is
actually the CACHE_FLUSH request sent down to the storage hardware.
There is also work in flight by Patrick Steinhardt to sync ref files [2].
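
For readers on linux-fsdevel who have not looked at the Git side, the
per-object sequence described above is roughly the following. This is a
minimal sketch and not Git's actual code: the helper name and the paths
are hypothetical, and EINTR handling is left out for brevity.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one object to a temp name, fsync it, then
 * rename it to its final name. Every call pays for a full fsync, which
 * is where the per-file CACHE_FLUSH cost comes from.
 * Returns 0 on success, -1 on error.
 */
static int write_object_durably(const char *tmp_path, const char *final_path,
                                const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    /* write() may be short, so loop until everything is written. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            goto fail;
        p += n;
        len -= (size_t)n;
    }

    /* Harden the file data and metadata before the final name exists. */
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Only now publish the object under its final name. */
    if (rename(tmp_path, final_path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp_path);
    return -1;
}
```

Adding N objects this way means N fsync calls, one per file.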

In a patch series at [3], I implemented a batch mode that issues
pagecache writeback for each object file as it is written. Then,
before any of the files are renamed to their final destinations, we
fsync a dummy file on the same filesystem.  On Linux, the pagecache
writeback is done with sync_file_range(fd, 0, 0,
SYNC_FILE_RANGE_WRITE_AND_WAIT).  According to Amir's thread at [4], this
flag combo should actually trigger the desired writeback. The
expectation is that the fsync of the dummy file should trigger a log
writeback and one or more CACHE_FLUSH commands to harden the block
mapping metadata and directory entries such that the data would be
retrievable after the fsync completes.
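
To make the batch sequence concrete, here is a minimal Linux-only
sketch of it. It is not the code from the patch series: the struct,
the function names, and the fallback to a plain fsync when
sync_file_range fails are assumptions made for illustration. The
fallback #define covers older headers that lack the combined flag.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef SYNC_FILE_RANGE_WRITE_AND_WAIT
#define SYNC_FILE_RANGE_WRITE_AND_WAIT \
    (SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE | \
     SYNC_FILE_RANGE_WAIT_AFTER)
#endif

/* Hypothetical description of one object waiting to be published. */
struct pending_object {
    const char *tmp_path;
    const char *final_path;
    const char *data; /* NUL-terminated payload, for simplicity */
};

/* Step 1: write the temp file and push its pages to the device. */
static int write_and_start_writeback(const struct pending_object *obj)
{
    const char *p = obj->data;
    size_t len = strlen(obj->data);
    int fd = open(obj->tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    /* Data writeback only; assumed fallback is a full fsync. */
    if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE_AND_WAIT) < 0 &&
        fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Step 2: one fsync of a dummy file to force the log and flush caches. */
static int flush_via_dummy_file(const char *dummy_path)
{
    int rc;
    int fd = open(dummy_path, O_WRONLY | O_CREAT, 0644);

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    if (close(fd) < 0)
        rc = -1;
    unlink(dummy_path);
    return rc;
}

/* Steps 1-3: writeback for every object, one flush, then the renames. */
static int batch_commit(const struct pending_object *objs, size_t count,
                        const char *dummy_path)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (write_and_start_writeback(&objs[i]) < 0)
            return -1;
    if (flush_via_dummy_file(dummy_path) < 0)
        return -1;
    for (i = 0; i < count; i++)
        if (rename(objs[i].tmp_path, objs[i].final_path) < 0)
            return -1;
    return 0;
}
```

Whether the single fsync in step 2 really hardens the metadata of the
other files is filesystem-dependent; the sketch only shows the order
of calls.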

The equivalent sequence is specified to work on the common Windows
filesystems [5]. The question I have for the Linux community is
whether the same sequence will work on any of the common extant Linux
filesystems such that it can provide value to Git users on Linux. My
understanding from Christoph Hellwig's comments is that on XFS at
least the sync_file_range, fsync, and rename sequence would allow us
to guarantee that the complete written contents of the file would be
visible if the new name is visible.  I also expect that an additional
fsync to a dummy file after the renames would force the log again,
which should ensure that all of the renames are visible before a ref
file could be written that points at one of the object names.

I wasn't able to find any clear semantics about the ext4 filesystem,
and I gather from what I've read that the btrfs filesystem does not
support the desired semantics.  Christoph mentioned that syncfs would
efficiently provide a batched CACHE_FLUSH, at the cost of also picking
up dirty cached data unrelated to Git.
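
For comparison, the syncfs alternative would look roughly like the
sketch below. The helper name is hypothetical, and opening a directory
to get a file descriptor on the target filesystem is an assumption
about how a caller might use it (any open fd on that filesystem
works).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the whole filesystem that contains
 * dir_path. This writes back every dirty page on that filesystem,
 * including data that has nothing to do with Git.
 * Returns 0 on success, -1 on error.
 */
static int sync_containing_filesystem(const char *dir_path)
{
    int rc;
    int fd = open(dir_path, O_RDONLY | O_DIRECTORY);

    if (fd < 0)
        return -1;
    rc = syncfs(fd);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

A caller would write all temp files, call this once on the object
directory, and then do the renames.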

Are there any opinions on the Linux side about what APIs we should use
to provide durability across multiple Git files while not completely
tanking performance by adding one CACHE_FLUSH per file modified?  What
are the semantics of the ext4 log (when it is enabled) with regard to
creating a temp file, populating its contents and then renaming it?
Are they similar enough to XFS's 'log force' such that our batch mode
would work there?

Thanks,
Neeraj
Windows Core Filesystem Dev

[1] https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ/edit#gid=1898936117
[2] https://lore.kernel.org/git/cover.1636544377.git.ps@xxxxxx/
[3] https://lore.kernel.org/git/b9d3d87443266767f00e77c967bd77357fe50484.1633366667.git.gitgitgadget@xxxxxxxxx/
[4] https://lore.kernel.org/linux-fsdevel/20190419072938.31320-1-amir73il@xxxxxxxxx/
[5] See FLUSH_FLAGS_NO_SYNC -
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex


