Re: [PATCH] refs: sync loose refs to disk before committing them

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 05 Nov 2021 09:35:24 +0100

On Thu, Nov 04 2021, Neeraj Singh wrote:

> On Thu, Nov 04, 2021 at 02:24:26PM -0700, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>> 
>> > I think it would probably be best to create a git_fsync_fd() function
>> > which is non-fatal and has that config/while loop, and have
>> > fsync_or_die() be a "..or die()" wrapper around that, then you could
>> > call that git_fsync_fd() here.
>> 
>> Adding git_fsync_fd() smells like a poor taste, as git_fsync() takes
>> an fd already.  How about making the current one into a static helper
>> 
>> 	-int git_fsync(int fd, enum fsync_action action)
>> 	+static int git_fsync_once(int fd, enum fsync_action action)
>> 	 ...
>> 
>> and then hide the looping behavior behind git_fsync() proper?
>> 
>>         int git_fsync(int fd, enum fsync_action action)
>>         {
>>                 while (git_fsync_once(fd, action) < 0)
>>                         if (errno != EINTR)
>>                                 return -1;
>>                 return 0;
>>         }
>> 
>> fsync_or_die() can be simplified by getting rid of its loop.
>> 
>> None of that needs to block Patrick's work, I think.  A version that
>> uses raw fsync() and punts on EINTR can graduate first, which makes
>> the situation better than the status quo, and all the ongoing work
>> that deal with fsync can be extended with an eye to make it also
>> usable to replace the fsync() call Patrick's fix adds.
>
> Is there some reason we shouldn't die if writing the ref fails? We are
> already accustomed to dying if fsyncing a packfile or the index fails.
>
> I assume the number of refs updated is not that high on any given git
> operation, so it's not worth having a batch mode for this eventually.

Others have mostly touched on this, but usually (always?) we're about to
die anyway, but by not calling die() here we'll die with a less crappy
message.

I.e. this bit:

    strbuf_addf(err, [...]

Is effectively a deferred die() in most (all?) cases, and be passed up
to the ref update code. because instead of a nondescript message like:

    fsync: failed somewhere

We want:

    couldn't update ref xyz to OID abc: fsync failed

Or similar. 

Hrm, actually having written the above there's a really good reason not
to die, which is that fsync can return ENOSPC.

So if we're in the middle of a transaction and have created and written
the lockfile we might only notice that the disk has is full when we do
the fsync().

In that case we'll (or should, I didn't check just now) unroll the ref
transaction, and delete the *.lock files we created, which presumably
will succeed in that scenario.

So calling die() at this level is the difference between leaving the
repo in an inconsistent state due to a disk error, and something like
"git fetch --atomic" gracefully failing.