On Wed, Jan 17 2018, Junio C. Hamano jotted: > Christoph Hellwig <hch@xxxxxx> writes: > >> fsync is required for data integrity as there is no gurantee that >> data makes it to disk at any specified time without it. Even for >> ext3 with data=ordered mode the file system will only commit all >> data at some point in time that is not guaranteed. > > It comes from this one: > > commit aafe9fbaf4f1d1f27a6f6e3eb3e246fff81240ef > Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > Date: Wed Jun 18 15:18:44 2008 -0700 > > Add config option to enable 'fsync()' of object files > > As explained in the documentation[*] this is totally useless on > filesystems that do ordered/journalled data writes, but it can be a > useful safety feature on filesystems like HFS+ that only journal the > metadata, not the actual file contents. > > It defaults to off, although we could presumably in theory some day > auto-enable it on a per-filesystem basis. > > [*] Yes, I updated the docs for the thing. Hell really _has_ frozen > over, and the four horsemen are probably just beyond the horizon. > EVERYBODY PANIC! > >> diff --git a/Documentation/config.txt b/Documentation/config.txt >> index 0e25b2c92..9a1cec5c8 100644 >> --- a/Documentation/config.txt >> +++ b/Documentation/config.txt >> @@ -866,10 +866,8 @@ core.whitespace:: >> core.fsyncObjectFiles:: >> This boolean will enable 'fsync()' when writing object files. >> + >> -This is a total waste of time and effort on a filesystem that orders >> -data writes properly, but can be useful for filesystems that do not use >> -journalling (traditional UNIX filesystems) or that only journal metadata >> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback"). >> +This option is enabled by default and ensures actual data integrity >> +by calling fsync after writing object files. > > I am somewhat sympathetic to the desire to flip the default to > "safe" and allow those who know they are already safe to tweak the > knob for performance, and it also makes sense to document that the > default is "true" here. But I do not see the point of removing the > four lines from this paragraph; the sole effect of the removal is to > rob information from readers that they can use to decide if they > want to disable the configuration, no? [CC'd the author of the current behavior] Some points/questions: a) Is there some reliable way to test whether this is needed from userspace? I'm thinking something like `git update-index --test-untracked-cache` but for fsync(). b) On the filesystems that don't need this, what's the performance impact? I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered on the tests I thought might do a lot of loose object writes: $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh [...] Test fsync-on~ fsync-on ------------------------------------------------------------------------------------------------------- 3400.2: rebase on top of a lot of unrelated changes 1.45(1.30+0.17) 1.45(1.28+0.20) +0.0% 3400.4: rebase a lot of unrelated changes without split-index 4.34(3.71+0.66) 4.33(3.69+0.66) -0.2% 3400.6: rebase a lot of unrelated changes with split-index 3.38(2.94+0.47) 3.38(2.93+0.47) +0.0% 0007.2: write_locked_index 3 times (3214 files) 0.01(0.00+0.00) 0.01(0.00+0.00) +0.0% No impact. However I did my own test of running the test suite 10% times with/without this patch, and it runs 9% slower: fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95 fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88 Test script at the end of this E-Mail. c) What sort of guarantees in this regard do NFS-mounted filesystems commonly make? Test script: use v5.10.0; use strict; use warnings; use Time::HiRes qw(time); use List::Util qw(sum); use Data::Dumper; my %time; for my $ref (@ARGV) { system "git checkout $ref"; system qq[make -j56 CFLAGS="-O3 -g" NO_OPENSSL=Y all]; for (1..10) { my $t0 = -time(); system "(cd t && NO_SVN_TESTS=1 GIT_TEST_HTTPD=0 prove -j56 --state=slow,save t[0-9]*.sh)"; $t0 += time(); push @{$time{$ref}} => $t0; } } for my $ref (sort keys %time) { printf "%20s: avg:%.2f %s\n", $ref, sum(@{$time{$ref}})/@{$time{$ref}}, join(" ", map { sprintf "%.02f", $_ } sort { $a <=> $b } @{$time{$ref}}); }