On Thu, Nov 04 2021, Neeraj Singh wrote: > On Thu, Nov 04, 2021 at 02:24:26PM -0700, Junio C Hamano wrote: >> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes: >> >> > I think it would probably be best to create a git_fsync_fd() function >> > which is non-fatal and has that config/while loop, and have >> > fsync_or_die() be a "..or die()" wrapper around that, then you could >> > call that git_fsync_fd() here. >> >> Adding git_fsync_fd() smells like a poor taste, as git_fsync() takes >> an fd already. How about making the current one into a static helper >> >> -int git_fsync(int fd, enum fsync_action action) >> +static int git_fsync_once(int fd, enum fsync_action action) >> ... >> >> and then hide the looping behavior behind git_fsync() proper? >> >> int git_fsync(int fd, enum fsync_action action) >> { >> while (git_fsync_once(fd, action) < 0) >> if (errno != EINTR) >> return -1; >> return 0; >> } >> >> fsync_or_die() can be simplified by getting rid of its loop. >> >> None of that needs to block Patrick's work, I think. A version that >> uses raw fsync() and punts on EINTR can graduate first, which makes >> the situation better than the status quo, and all the ongoing work >> that deal with fsync can be extended with an eye to make it also >> usable to replace the fsync() call Patrick's fix adds. > > Is there some reason we shouldn't die if writing the ref fails? We are > already accustomed to dying if fsyncing a packfile or the index fails. > > I assume the number of refs updated is not that high on any given git > operation, so it's not worth having a batch mode for this eventually. Others have mostly touched on this, but usually (always?) we're about to die anyway, but by not calling die() here we'll die with a less crappy message. I.e. this bit: strbuf_addf(err, [...] Is effectively a deferred die() in most (all?) cases, and be passed up to the ref update code. because instead of a nondescript message like: fsync: failed somewhere We want: couldn't update ref xyz to OID abc: fsync failed Or similar. Hrm, actually having written the above there's a really good reason not to die, which is that fsync can return ENOSPC. So if we're in the middle of a transaction and have created and written the lockfile we might only notice that the disk has is full when we do the fsync(). In that case we'll (or should, I didn't check just now) unroll the ref transaction, and delete the *.lock files we created, which presumably will succeed in that scenario. So calling die() at this level is the difference between leaving the repo in an inconsistent state due to a disk error, and something like "git fetch --atomic" gracefully failing.