Re: [PATCH 2/7] core.fsyncmethod: batched disk flushes for loose-objects

Neeraj Singh <nksingh85@xxxxxxxxx> · Wed, 16 Mar 2022 11:21:56 -0700

On Wed, Mar 16, 2022 at 12:31 AM Patrick Steinhardt <ps@xxxxxx> wrote:
>
> On Tue, Mar 15, 2022 at 09:30:54PM +0000, Neeraj Singh via GitGitGadget wrote:
> > From: Neeraj Singh <neerajsi@xxxxxxxxxxxxx>
> > diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> > index 062e5259905..c041ed33801 100644
> > --- a/Documentation/config/core.txt
> > +++ b/Documentation/config/core.txt
> > @@ -628,6 +628,11 @@ core.fsyncMethod::
> >  * `writeout-only` issues pagecache writeback requests, but depending on the
> >    filesystem and storage hardware, data added to the repository may not be
> >    durable in the event of a system crash. This is the default mode on macOS.
> > +* `batch` enables a mode that uses writeout-only flushes to stage multiple
> > +  updates in the disk writeback cache and then a single full fsync to trigger
> > +  the disk cache flush at the end of the operation. This mode is expected to
> > +  be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems
> > +  and on Windows for repos stored on NTFS or ReFS filesystems.
>
> This mode will not be supported by all parts of our stack that use our
> new fsync infra. So I think we should both document that some parts of
> the stack don't support batching, and say what the fallback behaviour is
> for those that don't.
>

Can do. I'm hoping that you'll revive your batch-mode refs change too so that
we get batching across the ODB and Refs, which are the two data stores that
may receive many updates in a single Git command.  This documentation
comment will read:
```
* `batch` enables a mode that uses writeout-only flushes to stage multiple
  updates in the disk writeback cache and then does a single full fsync of
  a dummy file to trigger the disk cache flush at the end of the operation.
  Currently `batch` mode only applies to loose-object files. Other repository
  data is made durable as if `fsync` was specified. This mode is expected to
  be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems
  and on Windows for repos stored on NTFS or ReFS filesystems.
```

> >  core.fsyncObjectFiles::
> >       This boolean will enable 'fsync()' when writing object files.
> > diff --git a/bulk-checkin.c b/bulk-checkin.c
> > index 93b1dc5138a..5c13fe17802 100644
> > --- a/bulk-checkin.c
> > +++ b/bulk-checkin.c
> > @@ -3,14 +3,20 @@
> >   */
> >  #include "cache.h"
> >  #include "bulk-checkin.h"
> > +#include "lockfile.h"
> >  #include "repository.h"
> >  #include "csum-file.h"
> >  #include "pack.h"
> >  #include "strbuf.h"
> > +#include "string-list.h"
> > +#include "tmp-objdir.h"
> >  #include "packfile.h"
> >  #include "object-store.h"
> >
> >  static int bulk_checkin_plugged;
> > +static int needs_batch_fsync;
> > +
> > +static struct tmp_objdir *bulk_fsync_objdir;
> >
> >  static struct bulk_checkin_state {
> >       char *pack_tmp_name;
> > @@ -80,6 +86,34 @@ clear_exit:
> >       reprepare_packed_git(the_repository);
> >  }
> >
> > +/*
> > + * Cleanup after batch-mode fsync_object_files.
> > + */
> > +static void do_batch_fsync(void)
> > +{
> > +     /*
> > +      * Issue a full hardware flush against a temporary file to ensure
> > +      * that all objects are durable before any renames occur.  The code in
> > +      * fsync_loose_object_bulk_checkin has already issued a writeout
> > +      * request, but it has not flushed any writeback cache in the storage
> > +      * hardware.
> > +      */
> > +
> > +     if (needs_batch_fsync) {
> > +             struct strbuf temp_path = STRBUF_INIT;
> > +             struct tempfile *temp;
> > +
> > +             strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
> > +             temp = xmks_tempfile(temp_path.buf);
> > +             fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
> > +             delete_tempfile(&temp);
> > +             strbuf_release(&temp_path);
> > +     }
> > +
> > +     if (bulk_fsync_objdir)
> > +             tmp_objdir_migrate(bulk_fsync_objdir);
> > +}
> > +
>
> We never unset `bulk_fsync_objdir` anywhere. Shouldn't we be doing that
> when we unplug this infrastructure?
>

Will Fix.

Thanks,
Neeraj