On Thu, Oct 28 2021, Eric Wong wrote:

> "core.fsync" and the "GIT_FSYNC" environment variable now exist
> for disabling fsync() even on packfiles and other critical data.

First off, I think this would be useful to add, even as a non-test
thing. There's plenty of use-cases where users don't care about
fsync(), and aside from data reliability it also has implications for
say laptop battery use.

I care about my E-Mail, but my E-Mail syncing thingy (mbsync) has
fsync=off in its config. I'm pretty confident that I won't lose power
on that *nix laptop, and even if I did I have other recovery methods.

I wouldn't use this in general, but might for some operations, e.g.
when I'm doing some batch operation with git-annex or whatever.

> Running "make test -j8 NO_SVN_TESTS=1" on a noisy 8-core system
> on an HDD, adding "GIT_FSYNC=0" to the invocation brings my test
> runtime from ~4 minutes down to ~3 minutes.

On a related topic: have you tried having it use an empty template
directory? I have some unsubmitted patches for that, and if you retain
all trash dirs you'll find that we accumulate IIRC 100-200MB of just
*.sample etc. hooks, out of ~1GB total for the tests.

> +core.fsync::
> +	A boolean to control 'fsync()' on all files.
> ++
> +Setting this to false can speed up writes of unimportant data.
> +Disabling fsync will lead to data loss on power failure. If set
> +to false, this also overrides core.fsyncObjectFiles. Defaults to true.
> +

Per the recent threads about fsync semantics this should really be
worded to be more scary, i.e. it's not guaranteed that your data is
on-disk at all, so subsequent git operations might see repo corruption,
if we're being POSIX-pedantic.

So some more "all bets are off" wording would be better here, and any
extra promises could be e.g.:

    On commonly deployed OS's such as Linux, OSX etc. a lack of fsync()
    still gives you extra guarantees that are non-standard, primarily
    that a subsequently running process that accesses the data will see
    the updated version, as the VFS will serve up the new version even
    if it's not on-disk yet.

Even these light guarantees may not be something you're getting from
your OS. Here be dragons!

> [...]
> --- a/git-cvsserver.perl
> +++ b/git-cvsserver.perl
> @@ -3607,6 +3607,25 @@ package GITCVS::updater;
>  use strict;
>  use warnings;
>  use DBI;
> +our $_use_fsync;

s/our/my/?

> +# n.b. consider using Git.pm
> +sub use_fsync {
> +	if (!defined($_use_fsync)) {
> +		my $x = $ENV{GIT_FSYNC};
> +		if (defined $x) {
> +			local $ENV{GIT_CONFIG};
> +			delete $ENV{GIT_CONFIG};
> +			my $v = ::safe_pipe_capture('git', '-c', "core.fsync=$x",
> +				qw(config --type=bool core.fsync));
> +			$_use_fsync = defined($v) ? ($v eq "true\n") : 1;
> +		} else {
> +			my $v = `git config --type=bool core.fsync`;
> +			$_use_fsync = $v eq "false\n" ? 0 : 1;
> +		}
> +	}
> +	$_use_fsync;
> +}

I wonder most of all if we really need these perl changes. Does it
really matter for the overall performance win you're after, or was it
just to get all fsync() uses out of the test suite?

This is a more general comment, but so much of scaffolding like that in
the *.perl scripts could go away if we taught git.c some "not quite a
builtin, but it's ours" mode, and had git.c set the various data that
say git-send-email, git-svn etc. need in the environment before we
invoke them.
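Just to sketch that idea (nothing here exists today; the helper name is
made up, and GIT_FOR_CVSSERVER_FSYNC is only the example name used
below), git.c could resolve the config once and export it right before
exec-ing the script, roughly:

    /*
     * Hypothetical helper in git.c, run just before we exec a "not
     * quite a builtin" script such as git-cvsserver.  It resolves
     * core.fsync once, lets GIT_FSYNC override it (mirroring the
     * use_fsync() helper quoted above), and exports the result for
     * the script to pick up.
     */
    static void export_cvsserver_env(void)
    {
            int fsync_cfg = 1; /* default: do fsync() */

            git_config_get_bool("core.fsync", &fsync_cfg);
            fsync_cfg = git_env_bool("GIT_FSYNC", fsync_cfg);
            setenv("GIT_FOR_CVSSERVER_FSYNC", fsync_cfg ? "1" : "0", 1);
    }

That would also give git-svn, git-send-email etc. one obvious place to
pick up future knobs from, instead of each script growing its own "git
config" plumbing.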
Then this would just be say:

    our $use_fsync = $ENV{GIT_FOR_CVSSERVER_FSYNC};

> +	if ($self->{dbdriver} eq 'SQLite' && !use_fsync()) {
> +		$self->{dbh}->do('PRAGMA synchronous = OFF');
> +	}

...in particular if we're doing this we should update the docs to say
that we'll also tweak third-party software's use of fsync() in some
cases. Even if we "own" that sqlite DB, a user might not expect a git
option to go so far as to be the cause of their on-disk sqlite DB's
corruption.

> [...]
>  	my $sync;
>  	# both of these options make our .rev_db file very, very important
>  	# and we can't afford to lose it because rebuild() won't work
> -	if ($self->use_svm_props || $self->no_metadata) {
> +	if (($self->use_svm_props || $self->no_metadata) && use_fsync()) {
>  		require File::Copy;
>  		$sync = 1;
>  		File::Copy::copy($db, $db_lock) or die "rev_map_set(@_): ",

This doesn't just impact $io->sync, but also $io->flush, which is a
different thing: PerlIO flushing to the OS != fsync(). So even on OS's
that make the above noted guarantees you'd get nothing, since we're now
at the mercy of perl not crashing in the interim, not kernel or stdlib
behavior.

> @@ -57,7 +58,9 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
>  
>  void fsync_or_die(int fd, const char *msg)
>  {
> -	while (fsync(fd) < 0) {
> +	if (use_fsync < 0)
> +		use_fsync = git_env_bool("GIT_FSYNC", 1);
> +	while (use_fsync && fsync(fd) < 0) {
>  		if (errno != EINTR)
>  			die_errno("fsync error on '%s'", msg);
>  	}

This patch direction looks good to me. Aside from the above, we should
really update any other fsync() docs we have, maybe with a
cross-reference in core.fsyncObjectFiles?

I'm not sure offhand what the state of the other fsync patches that
Neeraj Singh has been submitting is, but let's make sure that whatever
docs/config knobs etc. we have all play nicely together, and are
somewhat future-proof if possible.