On Wed, Mar 16, 2022 at 12:31 AM Patrick Steinhardt <ps@xxxxxx> wrote: > > On Tue, Mar 15, 2022 at 09:30:54PM +0000, Neeraj Singh via GitGitGadget wrote: > > From: Neeraj Singh <neerajsi@xxxxxxxxxxxxx> > > diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt > > index 062e5259905..c041ed33801 100644 > > --- a/Documentation/config/core.txt > > +++ b/Documentation/config/core.txt > > @@ -628,6 +628,11 @@ core.fsyncMethod:: > > * `writeout-only` issues pagecache writeback requests, but depending on the > > filesystem and storage hardware, data added to the repository may not be > > durable in the event of a system crash. This is the default mode on macOS. > > +* `batch` enables a mode that uses writeout-only flushes to stage multiple > > + updates in the disk writeback cache and then a single full fsync to trigger > > + the disk cache flush at the end of the operation. This mode is expected to > > + be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems > > + and on Windows for repos stored on NTFS or ReFS filesystems. > > This mode will not be supported by all parts of our stack that use our > new fsync infra. So I think we should both document that some parts of > the stack don't support batching, and say what the fallback behaviour is > for those that don't. > Can do. I'm hoping that you'll revive your batch-mode refs change too so that we get batching across the ODB and Refs, which are the two data stores that may receive many updates in a single Git command. This documentation comment will read: ``` * `batch` enables a mode that uses writeout-only flushes to stage multiple updates in the disk writeback cache and then does a single full fsync of a dummy file to trigger the disk cache flush at the end of the operation. Currently `batch` mode only applies to loose-object files. Other repository data is made durable as if `fsync` was specified. This mode is expected to be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or ReFS filesystems. ``` > > core.fsyncObjectFiles:: > > This boolean will enable 'fsync()' when writing object files. > > diff --git a/bulk-checkin.c b/bulk-checkin.c > > index 93b1dc5138a..5c13fe17802 100644 > > --- a/bulk-checkin.c > > +++ b/bulk-checkin.c > > @@ -3,14 +3,20 @@ > > */ > > #include "cache.h" > > #include "bulk-checkin.h" > > +#include "lockfile.h" > > #include "repository.h" > > #include "csum-file.h" > > #include "pack.h" > > #include "strbuf.h" > > +#include "string-list.h" > > +#include "tmp-objdir.h" > > #include "packfile.h" > > #include "object-store.h" > > > > static int bulk_checkin_plugged; > > +static int needs_batch_fsync; > > + > > +static struct tmp_objdir *bulk_fsync_objdir; > > > > static struct bulk_checkin_state { > > char *pack_tmp_name; > > @@ -80,6 +86,34 @@ clear_exit: > > reprepare_packed_git(the_repository); > > } > > > > +/* > > + * Cleanup after batch-mode fsync_object_files. > > + */ > > +static void do_batch_fsync(void) > > +{ > > + /* > > + * Issue a full hardware flush against a temporary file to ensure > > + * that all objects are durable before any renames occur. The code in > > + * fsync_loose_object_bulk_checkin has already issued a writeout > > + * request, but it has not flushed any writeback cache in the storage > > + * hardware. > > + */ > > + > > + if (needs_batch_fsync) { > > + struct strbuf temp_path = STRBUF_INIT; > > + struct tempfile *temp; > > + > > + strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory()); > > + temp = xmks_tempfile(temp_path.buf); > > + fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp)); > > + delete_tempfile(&temp); > > + strbuf_release(&temp_path); > > + } > > + > > + if (bulk_fsync_objdir) > > + tmp_objdir_migrate(bulk_fsync_objdir); > > +} > > + > > We never unset `bulk_fsync_objdir` anywhere. Shouldn't we be doing that > when we unplug this infrastructure? > Will Fix. Thanks, Neeraj