On Fri, Mar 11 2022, Patrick Steinhardt wrote: > [[PGP Signed Part:Undecided]] > On Thu, Mar 10, 2022 at 10:40:07PM -0800, Junio C Hamano wrote: >> Patrick Steinhardt <ps@xxxxxx> writes: >> >> > @@ -1504,6 +1513,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store, >> > oidcpy(&lock->old_oid, &orig_oid); >> > >> > if (write_ref_to_lockfile(lock, &orig_oid, 0, &err) || >> > + files_sync_loose_ref(lock, &err) || >> > commit_ref_update(refs, lock, &orig_oid, logmsg, &err)) { >> > error("unable to write current sha1 into %s: %s", newrefname, err.buf); >> > strbuf_release(&err); >> >> Given that write_ref_to_lockfile() on the success code path does this: >> >> fd = get_lock_file_fd(&lock->lk); >> if (write_in_full(fd, oid_to_hex(oid), the_hash_algo->hexsz) < 0 || >> write_in_full(fd, &term, 1) < 0 || >> close_ref_gently(lock) < 0) { >> strbuf_addf(err, >> "couldn't write '%s'", get_lock_file_path(&lock->lk)); >> unlock_ref(lock); >> return -1; >> } >> return 0; >> >> the above unfortunately does not work. By the time the new call to >> files_sync_loose_ref() is made, lock->fd is closed by the call to >> close_lock_file_gently() made in close_ref_gently(), and because of >> that, you'll get an error like this: >> >> Writing objects: 100% (3/3), 279 bytes | 279.00 KiB/s, done. >> Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 >> remote: error: could not sync loose ref 'refs/heads/client_branch': >> Bad file descriptor >> >> when running "make test" (the above is from t5702 but I wouldn't be >> surprised if this broke ALL ref updates). >> >> Just before write_ref_to_lockfile() calls close_ref_gently() would >> be a good place to make the fsync_loose_ref() call, perhaps? >> >> >> Thanks. > > Yeah, that thought indeed occurred to me this night, too. I was hoping > that I could fix this before anybody noticed ;) > > It's a bit unfortunate that we can't just defer this to a later place to > hopefully implement this more efficiently, but so be it. The alternative > would be to re-open all locked loose refs and then sync them to disk, > but this would likely be a lot more painful than just syncing them to > disk before closing it. Aside: is open/write/close followed by open/fsync/close on the same file portably guaranteed to yield the same end result as a single open/write/fsync/close? I think in practice nobody would be insane enough to implement a system to do otherwise, but on the other hand I've seen some really insane behavior :) I could see it being different e.g. in some NFS cases/configurations where the fsync() for an open FD syncs to the remote storage, and the second open() might therefore get the old version and noop-sync that. Most implementations would guard against that in the common case by having a local cache of outstanding data to flush, but if you're talking to some sharded storage array for each request... Anyway, I *think* it should be OK, just an aside to check the assumption for any future work... :)