On Thu, Nov 04, 2021 at 01:38:22PM +0100, Patrick Steinhardt wrote:

> When writing loose refs, we first create a lockfile, write the new ref
> into that lockfile, close it and then rename the lockfile into place
> such that the actual update is atomic for that single ref. While this
> works as intended under normal circumstances, at GitLab we infrequently
> encounter corrupt loose refs in repositories after a machine encountered
> a hard reset. The corruption is always of the same type: the ref has
> been committed into place, but it is completely empty.

Yes, I've seen this quite a bit, too, and I agree that more fsync()-ing
will probably help. There are two things that have made me hesitate,
though:

  1. This doesn't quite make the write durable. Doing that would also
     require fsyncing the surrounding directory (and so on up the
     tree). This is a much trickier problem, because we often don't
     even have open descriptors for those.

     I think we can ignore that, though. Doing an fsync() here makes
     things strictly better with respect to robustness (especially if
     you care less about "has my ref update been committed to disk" and
     more about "does this end up corrupt after a power failure").

  2. It's not clear what the performance implications will be,
     especially on a busy server doing a lot of ref updates, or on a
     filesystem where fsync() ends up syncing everything, not just the
     one file (my impression is ext3 is such a system, but not ext4).
     Another option would be relying on the filesystem to journal data
     and metadata writes in order, rather than worrying about making
     each write durable ourselves.

     I suspect for small updates (say, a push of one or two refs), this
     will have little impact. We'd generally fsync the incoming
     packfile and its idx anyway, so we're adding maybe one or two
     fsyncs on top of that. But if you're pushing 100 refs, that will
     be 100 sequential fsyncs, which may add up to quite a bit of
     latency. It would be nice if we could batch these somehow (e.g.,
     by opening up all of the lockfiles, writing and fsyncing them, and
     then renaming them one by one; there's a rough sketch of that at
     the end of this mail).

So I think we want to at least add a config knob here. That would make
it easier to experiment, and provides an escape hatch.

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 151b0056fe..06a3f0bdea 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1749,6 +1749,7 @@ static int write_ref_to_lockfile(struct ref_lock *lock,
>  	fd = get_lock_file_fd(&lock->lk);
>  	if (write_in_full(fd, oid_to_hex(oid), the_hash_algo->hexsz) < 0 ||
>  	    write_in_full(fd, &term, 1) < 0 ||
> +	    fsync(fd) < 0 ||
>  	    close_ref_gently(lock) < 0) {
>  		strbuf_addf(err,
>  			    "couldn't write '%s'", get_lock_file_path(&lock->lk));

I agree with the sentiment elsewhere that this would probably make
sense for any lockfile. That does probably make batching a bit more
difficult, though it could be layered on top (i.e., we could still
fsync in the ref code, and the lockfile fsync-ing should then become a
noop).

-Peff
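
P.S. To make the batching idea from point 2 concrete, here is a rough
sketch in plain POSIX C. It is not git's lockfile API; the ref_update
struct and commit_ref_updates() are made-up names for illustration. The
shape is: write and fsync every lockfile in a first pass, then rename
them into place in a second pass. It deliberately skips the directory
fsync from point 1.

  /*
   * Rough sketch only; plain POSIX, not git's lockfile code. The
   * ref_update struct and commit_ref_updates() are invented names.
   * Error paths just bail; real code would roll the lockfiles back.
   */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  struct ref_update {
          const char *lock_path; /* e.g. "refs/heads/topic.lock" */
          const char *ref_path;  /* e.g. "refs/heads/topic" */
          const char *hex_oid;   /* hex object id, without newline */
  };

  static int commit_ref_updates(const struct ref_update *updates, size_t nr)
  {
          size_t i;

          /*
           * Phase 1: write and fsync every lockfile. Keeping all of
           * the flushes together is what would let a later change
           * coalesce them, instead of paying one synchronous flush
           * per ref interleaved with the renames.
           */
          for (i = 0; i < nr; i++) {
                  int ok;
                  int fd = open(updates[i].lock_path,
                                O_WRONLY | O_CREAT | O_EXCL, 0666);
                  if (fd < 0)
                          return -1;
                  ok = dprintf(fd, "%s\n", updates[i].hex_oid) > 0 &&
                       fsync(fd) == 0;
                  if (close(fd) != 0 || !ok)
                          return -1;
          }

          /*
           * Phase 2: rename each lockfile into place. As in the patch
           * above, the surrounding directories are not fsynced.
           */
          for (i = 0; i < nr; i++)
                  if (rename(updates[i].lock_path, updates[i].ref_path))
                          return -1;

          return 0;
  }

With that shape, swapping the per-file fsync() for something that
flushes the whole batch at once would be a local change to phase 1, and
a lockfile-level fsync could still sit underneath as a noop, as
mentioned above.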