Re: [PATCH v5 04/14] core.fsyncmethod: batched disk flushes for loose-objects

"Neeraj Singh via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 9da3e5d88f6..3c90ba0b395 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -596,6 +596,14 @@ core.fsyncMethod::
>  * `writeout-only` issues pagecache writeback requests, but depending on the
>    filesystem and storage hardware, data added to the repository may not be
>    durable in the event of a system crash. This is the default mode on macOS.
> +* `batch` enables a mode that uses writeout-only flushes to stage multiple
> +  updates in the disk writeback cache and then does a single full fsync of
> +  a dummy file to trigger the disk cache flush at the end of the operation.
> ++
> +  Currently `batch` mode only applies to loose-object files. Other repository
> +  data is made durable as if `fsync` was specified. This mode is expected to
> +  be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems
> +  and on Windows for repos stored on NTFS or ReFS filesystems.

Does this format correctly?  I was under the impression that the
second and subsequent paragraphs, connected by a line with a single
"+" on it, have to be flush left, without indentation.
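
For reference, the existing multi-paragraph entries in this file put
the continuation paragraph flush left after the bare "+" line.  The
hunk above would then look something like this (the wording is the
patch's own; only the indentation differs):

```
* `batch` enables a mode that uses writeout-only flushes to stage multiple
  updates in the disk writeback cache and then does a single full fsync of
  a dummy file to trigger the disk cache flush at the end of the operation.
+
Currently `batch` mode only applies to loose-object files. Other repository
data is made durable as if `fsync` was specified. This mode is expected to
be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems
and on Windows for repos stored on NTFS or ReFS filesystems.
```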

> diff --git a/bulk-checkin.c b/bulk-checkin.c
> index 8b0fd5c7723..9799d247cad 100644
> --- a/bulk-checkin.c
> +++ b/bulk-checkin.c
> @@ -3,15 +3,20 @@
>   */
>  #include "cache.h"
>  #include "bulk-checkin.h"
> +#include "lockfile.h"
>  #include "repository.h"
>  #include "csum-file.h"
>  #include "pack.h"
>  #include "strbuf.h"
> +#include "string-list.h"
> +#include "tmp-objdir.h"
>  #include "packfile.h"
>  #include "object-store.h"
>  
>  static int odb_transaction_nesting;
>  
> +static struct tmp_objdir *bulk_fsync_objdir;

I wonder if this should be added to the bulk_checkin_state structure
as a new member, especially if we fix the erroneous call to
finish_bulk_checkin() as a preliminary fix-up of a bug that existed
even before this series.

> +/*
> + * Cleanup after batch-mode fsync_object_files.
> + */
> +static void do_batch_fsync(void)
> +{
> +	struct strbuf temp_path = STRBUF_INIT;
> +	struct tempfile *temp;
> +
> +	if (!bulk_fsync_objdir)
> +		return;
> +
> +	/*
> +	 * Issue a full hardware flush against a temporary file to ensure
> +	 * that all objects are durable before any renames occur. The code in
> +	 * fsync_loose_object_bulk_checkin has already issued a writeout
> +	 * request, but it has not flushed any writeback cache in the storage
> +	 * hardware or any filesystem logs. This fsync call acts as a barrier
> +	 * to ensure that the data in each new object file is durable before
> +	 * the final name is visible.
> +	 */
> +	strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
> +	temp = xmks_tempfile(temp_path.buf);
> +	fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
> +	delete_tempfile(&temp);
> +	strbuf_release(&temp_path);
> +
> +	/*
> +	 * Make the object files visible in the primary ODB after their data is
> +	 * fully durable.
> +	 */
> +	tmp_objdir_migrate(bulk_fsync_objdir);
> +	bulk_fsync_objdir = NULL;
> +}

OK.

> +void prepare_loose_object_bulk_checkin(void)
> +{
> +	/*
> +	 * We lazily create the temporary object directory
> +	 * the first time an object might be added, since
> +	 * callers may not know whether any objects will be
> +	 * added at the time they call begin_odb_transaction.
> +	 */
> +	if (!odb_transaction_nesting || bulk_fsync_objdir)
> +		return;
> +
> +	bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
> +	if (bulk_fsync_objdir)
> +		tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
> +}

OK.  If we got a failure from tmp_objdir_create(), then we don't
swap and end up creating a new loose object file in the primary
object store.  I wonder if we at least want to note that fact for
later use at "unplug" time.  We may create a few loose objects in
the primary object store without any fsync, and then a later call
may successfully create a temporary object directory, so we'd create
more loose objects in the temporary one, which are flushed with the
"create a dummy and fsync" trick and migrated.  But do we need to do
something to the ones we created in the primary object store before
all that happens?

> +void fsync_loose_object_bulk_checkin(int fd, const char *filename)
> +{
> +	/*
> +	 * If we have an active ODB transaction, we issue a call that
> +	 * cleans the filesystem page cache but avoids a hardware flush
> +	 * command. Later on we will issue a single hardware flush
> +	 * as part of do_batch_fsync.
> +	 */
> +	if (!bulk_fsync_objdir ||
> +	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0) {
> +		fsync_or_die(fd, filename);
> +	}
> +}

Ah, if we have successfully created the temporary directory, we
don't do a full fsync but just a writeout-only one, so there is no
need for the worry I mentioned in the previous paragraph.  OK.

> @@ -301,4 +370,6 @@ void end_odb_transaction(void)
>  
>  	if (bulk_checkin_state.f)
>  		finish_bulk_checkin(&bulk_checkin_state);
> +
> +	do_batch_fsync();
>  }

OK.

> @@ -1961,6 +1963,9 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>  	static struct strbuf tmp_file = STRBUF_INIT;
>  	static struct strbuf filename = STRBUF_INIT;
>  
> +	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
> +		prepare_loose_object_bulk_checkin();
> +
>  	loose_object_path(the_repository, &filename, oid);
>  
>  	fd = create_tmpfile(&tmp_file, filename.buf);

The necessary change to the "workhorse" code path is surprisingly
small, which is pleasing to see.

Thanks.
