On Thu, Nov 11, 2021 at 9:54 PM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Wed, Nov 10, 2021 at 04:47:24PM -0800, Neeraj Singh wrote:
> > It would be nice to loop in some Linux fs developers to find out what can be
> > done on current implementations to get the durability without terrible
> > performance. From reading the docs and mailing threads it looks like the
> > sync_file_range + bulk fsync approach should actually work on the current XFS
> > implementation.
>
> If you want more than just my advice linux-fsdevel@xxxxxxxxxxxxxxx is
> a good place to find a wide range of opinions.
>
> Anyway, I think syncfs is the biggest band for the buck as it will give
> you very efficient syncing with very little overhead in git, but it does
> have a huge noisy neighbor problem that might make it unattractive
> for multi-tenant file systems or git hosting.

To summarize where we are at for linux-fsdevel:

We're working on making Git preserve data added to the repo even if the
system crashes or loses power soon after a Git command completes. The
default behavior of git-for-windows is to set
core.fsyncobjectfiles=true, which at least ensures durability for loose
object files.

The current implementation of core.fsyncobjectfiles inserts an fsync
between writing each new object to a temp name and renaming it to its
final hash-based name. This approach is slow when adding hundreds of
files to the repo [1]. The main cost on the hardware we tested is
actually the CACHE_FLUSH request sent down to the storage hardware.
There is also work in flight by Patrick Steinhardt to sync ref files
[2].

In a patch series at [3], I implemented a batch mode that issues
pagecache writeback for each object file as it is written. Then, before
any of the files are renamed to their final destination, we do an fsync
to a dummy file on the same filesystem. On Linux, this uses
sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE_AND_WAIT) to do the
pagecache writeback.
According to Amir's thread at [4], this flag combo should actually
trigger the desired writeback. The expectation is that the fsync of the
dummy file should trigger a log writeback and one or more CACHE_FLUSH
commands to harden the block mapping metadata and directory entries,
such that the data would be retrievable after the fsync completes. The
equivalent sequence is specified to work on the common Windows
filesystems [5].

The question I have for the Linux community is whether the same
sequence will work on any of the common extant Linux filesystems, such
that it can provide value to Git users on Linux.

My understanding from Christoph Hellwig's comments is that, on XFS at
least, the sync_file_range, fsync, and rename sequence would allow us to
guarantee that the complete written contents of the file would be
visible if the new name is visible. I also expect that an additional
fsync to a dummy file after the renames would ensure that the log is
forced again, which should ensure that all of the renames are visible
before a ref file could be written that points at one of the object
names.

I wasn't able to find any clear semantics for the ext4 filesystem, and
I gather from what I've read that the btrfs filesystem does not support
the desired semantics. Christoph mentioned that syncfs would
efficiently provide a batched CACHE_FLUSH, at the cost of picking up
dirty cached data unrelated to Git.

Are there any opinions on the Linux side about what APIs we should use
to provide durability across multiple Git files while not completely
tanking performance by adding one CACHE_FLUSH per file modified? What
are the semantics of the ext4 log (when it is enabled) with regard to
creating a temp file, populating its contents, and then renaming it?
Are they similar enough to XFS's 'log force' that our batch mode would
work there?
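
For readers on linux-fsdevel who have not looked at the series, the
batch sequence described above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not the code from [3]:
the names write_and_writeback, flush_via_dummy, batch_commit and struct
pending_rename are hypothetical, the dummy file is created with
mkstemp() purely for convenience, and the combined
SYNC_FILE_RANGE_WRITE_AND_WAIT flag is spelled out as its three
underlying flags in case older headers do not define it. Falling back
to a plain fsync where sync_file_range is unavailable or fails is also
an assumption of this sketch. Whether the sequence is actually durable
on a given filesystem is exactly the open question.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for sync_file_range() on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical bookkeeping: one object written under a temporary name. */
struct pending_rename {
    const char *tmp_path;
    const char *final_path;
};

/* Write buf to a new temp file and push its pages to the device without
 * asking for a CACHE_FLUSH. Returns 0 on success, -1 on error. */
int write_and_writeback(const char *tmp_path, const void *buf, size_t len)
{
    const char *p = buf;
    int rc;
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0444);

    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
#ifdef SYNC_FILE_RANGE_WRITE
    /* Start writeback of the whole file and wait for it to finish. */
    rc = sync_file_range(fd, 0, 0,
                         SYNC_FILE_RANGE_WAIT_BEFORE |
                         SYNC_FILE_RANGE_WRITE |
                         SYNC_FILE_RANGE_WAIT_AFTER);
    if (rc < 0)
        rc = fsync(fd);        /* assumed fallback: full fsync */
#else
    rc = fsync(fd);            /* no sync_file_range on this platform */
#endif
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* fsync a throwaway file in dir; the hope is that this forces the log
 * and issues the CACHE_FLUSH once for the whole batch. */
int flush_via_dummy(const char *dir)
{
    char path[4096];
    int fd, rc;
    int n = snprintf(path, sizeof(path), "%s/batch_fsync_XXXXXX", dir);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    rc = fsync(fd);
    if (close(fd) < 0)
        rc = -1;
    unlink(path);
    return rc;
}

/* Flush once, rename every temp file to its final name, flush again. */
int batch_commit(const char *dir, const struct pending_rename *files,
                 size_t n_files)
{
    size_t i;

    assert(n_files == 0 || files != NULL);
    if (flush_via_dummy(dir) < 0)
        return -1;
    for (i = 0; i < n_files; i++)
        if (rename(files[i].tmp_path, files[i].final_path) < 0)
            return -1;
    return flush_via_dummy(dir);   /* harden the renames before refs */
}
```

A caller would invoke write_and_writeback() once per object and then
batch_commit() once for the whole set, so the number of CACHE_FLUSH
requests stays constant instead of growing with the number of objects.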

Thanks,
Neeraj
Windows Core Filesystem Dev

[1] https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ/edit#gid=1898936117
[2] https://lore.kernel.org/git/cover.1636544377.git.ps@xxxxxx/
[3] https://lore.kernel.org/git/b9d3d87443266767f00e77c967bd77357fe50484.1633366667.git.gitgitgadget@xxxxxxxxx/
[4] https://lore.kernel.org/linux-fsdevel/20190419072938.31320-1-amir73il@xxxxxxxxx/
[5] See FLUSH_FLAGS_NO_SYNC - https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex