On Mon, Mar 21 2022, Neeraj Singh wrote:

[Don't have time for a full reply, sorry, just something quick]

> On Mon, Mar 21, 2022 at 9:52 AM Ævar Arnfjörð Bjarmason
> [...]
>> So, my question is still why the temporary object dir migration part of
>> this is needed.
>>
>> We are writing N loose object files, and we write those to temporary
>> names already.
>>
>> AFAICT we could do all of this by doing the same
>> tmp/rename/sync_file_range dance on the main object store.
>
> Why not the main object store? We want to maintain the invariant that
> any name in the main object store refers to a file that durably has the
> correct contents.
>
> If we do sync_file_range and then rename, and then crash, we now have a
> file in the main object store with some SHA name, whose contents may or
> may not match the SHA. However, if we ensure an fsync happens before
> the rename, a crash at any point will leave us either with no file in
> the main object store or with a file that is durable on the disk.

Ah, I see. Why does that matter? If the "bulk" mode works as advertised
we might have such a corrupt loose or pack file, but we won't have
anything referring to it as far as reachability goes.

I'm aware that the various code paths that handle OID writing don't deal
too well with it in practice, to say the least, which one can try with
say:

    $ echo foo | git hash-object -w --stdin
    45b983be36b73c0788dc9cbcb76cbb80fc7bb057
    $ echo | sudo tee .git/objects/45/b983be36b73c0788dc9cbcb76cbb80fc7bb057

I.e. "fsck", "show" etc. will all scream bloody murder, and re-running
that hash-object command even reports success (we see the object is
there already, and think it's OK).

But in any case, I think it would be much easier to both review and
reason about this code if these concerns were split up. I.e.
things that want no fsync at all (I'd think especially so) might want to
have such updates serialized in this manner, and as Junio pointed out,
making these things inseparable as you've done creates API concerns &
fallout that's got nothing to do with what we need for the performance
gains of the bulk checkin fsyncing technique, e.g. concurrent
"update-index" consumers not being able to assume reported objects exist
as soon as they're reported.

>> Then instead of the "bulk_fsync" cookie file don't close() the last
>> object file we write until we issue the fsync on it.
>>
>> But maybe this is all needed, I just can't understand from the commit
>> message why the "bulk checkin" part is being done.
>>
>> I think since we've been over this a few times without any success it
>> would really help to have some example of the smallest set of syscalls
>> to write a file like this safely. I.e. this is doing (pseudocode):
>>
>>     /* first the bulk path */
>>     open("bulk/x.tmp");
>>     write("bulk/x.tmp");
>>     sync_file_range("bulk/x.tmp");
>>     close("bulk/x.tmp");
>>     rename("bulk/x.tmp", "bulk/x");
>>     open("bulk/y.tmp");
>>     write("bulk/y.tmp");
>>     sync_file_range("bulk/y.tmp");
>>     close("bulk/y.tmp");
>>     rename("bulk/y.tmp", "bulk/y");
>>     /* Rename to "real" */
>>     rename("bulk/x", "x");
>>     rename("bulk/y", "y");
>>     /* sync a cookie */
>>     fsync("cookie");
>
> The '/* Rename to "real" */' and '/* sync a cookie */' steps are
> reversed in your above sequence. It should be

Sorry.

> 1: (for each file)
>    a) open
>    b) write
>    c) sync_file_range
>    d) close
>    e) rename in tmp_objdir -- The rename step is not required for
>       bulk-fsync. An earlier version of this series didn't do it, but
>       Jeff King pointed out that it was required for concurrency:
>       https://lore.kernel.org/all/YVOrikAl%2Fu5%2FVi61@xxxxxxxxxxxxxxxxxxxxxxx/

Yes, we definitely need the rename; I was wondering about why we needed
it 2x for each file, but that was answered above.
>> And I'm asking why it's not:
>>
>>     /* Rename to "real" as we go */
>>     open("x.tmp");
>>     write("x.tmp");
>>     sync_file_range("x.tmp");
>>     close("x.tmp");
>>     rename("x.tmp", "x");
>>     last_fd = open("y.tmp"); /* don't close() the last one yet */
>>     write("y.tmp");
>>     sync_file_range("y.tmp");
>>     rename("y.tmp", "y");
>>     /* sync the last fd instead of a cookie */
>>     fsync(last_fd);
>>
>> Which I guess is two questions:
>>
>> A. do we need the cookie, or can we re-use the fd of the last thing
>>    we write?
>
> We can re-use the FD of the last thing we write, but that results in a
> trickier API which is more intrusive on callers. I was originally
> using a lockfile, but found a usage where there was no lockfile in
> unpack-objects.

Ok, so it's something we could do, but passing down 2-3 functions to
object-file.c was a hassle. I tried to hack that up earlier and found
that it wasn't *too bad*. I.e. we'd pass some "flags" about our intent,
and amend various functions to take "don't close this one" and pass up
the fd (or even do that as a global).

In any case, having the commit message clearly document what's needed
for what, what's essential, and what's just a shortcut taken for the
convenience of the current implementation would be really useful. Then
we can always e.g. change this later to just do the fsync() on the last
of N we write.

[Ran out of time, sorry]