Re: [RFC PATCH] object-file.c: batched disk flushes for stream_loose_object()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Thu, 9 Jun 2022, Han Xin wrote:

> Neeraj Singh[1] pointed out that if batch fsync is enabled, we should still
> call prepare_loose_object_bulk_checkin() to potentially create the bulk checkin
> objdir.
>
> 1. https://lore.kernel.org/git/7ba4858a-d1cc-a4eb-b6d6-4c04a5dd6ce7@xxxxxxxxx/
>
> Signed-off-by: Han Xin <chiyutianyi@xxxxxxxxx>

I like a good commit message that is concise and yet has all the necessary
information. Well done!

> ---
>  object-file.c                   |  3 +++
>  t/t5351-unpack-large-objects.sh | 15 ++++++++++++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/object-file.c b/object-file.c
> index 2dd828b45b..3a1be74775 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2131,6 +2131,9 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
>  	char hdr[MAX_HEADER_LEN];
>  	int hdrlen;
>
> +	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
> +		prepare_loose_object_bulk_checkin();
> +

Makes sense.

>  	/* Since oid is not determined, save tmp file to odb path. */
>  	strbuf_addf(&filename, "%s/", get_object_directory());
>  	hdrlen = format_object_header(hdr, sizeof(hdr), OBJ_BLOB, len);
> diff --git a/t/t5351-unpack-large-objects.sh b/t/t5351-unpack-large-objects.sh
> index 461ca060b2..a66a51f7df 100755
> --- a/t/t5351-unpack-large-objects.sh
> +++ b/t/t5351-unpack-large-objects.sh
> @@ -18,7 +18,10 @@ test_expect_success "create large objects (1.5 MB) and PACK" '
>  	test_commit --append foo big-blob &&
>  	test-tool genrandom bar 1500000 >big-blob &&
>  	test_commit --append bar big-blob &&
> -	PACK=$(echo HEAD | git pack-objects --revs pack)
> +	PACK=$(echo HEAD | git pack-objects --revs pack) &&
> +	git verify-pack -v pack-$PACK.pack |
> +	    grep -E "commit|tree|blob" |
> +		sed -n -e "s/^\([0-9a-f]*\).*/\1/p" >obj-list

Here, I would recommend avoiding the pipe, to ensure that we would catch
problems in the `verify-pack` invocation, and I think we can avoid the
`grep` altogether:

	git verify-pack -v pack-$PACK.pack >out &&
	sed -n 's/^\([0-9a-f][0-9a-f]*\).*\(commit\|tree\|blob\)/\1/p' \
		<out >obj-list

>  '
>
>  test_expect_success 'set memory limitation to 1MB' '
> @@ -45,6 +48,16 @@ test_expect_success 'unpack big object in stream' '
>  	test_dir_is_empty dest.git/objects/pack
>  '
>
> +BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch'
> +
> +test_expect_success 'unpack big object in stream (core.fsyncmethod=batch)' '
> +	prepare_dest 1m &&
> +	git $BATCH_CONFIGURATION -C dest.git unpack-objects <pack-$PACK.pack &&

I think the canonical way would be to use `test_config core.fsync ...`,
but the presented way works, too.

> +	test_dir_is_empty dest.git/objects/pack &&
> +	git -C dest.git cat-file --batch-check="%(objectname)" <obj-list >current &&

Good. The `--batch-check="%(objectname)"` part forces `cat-file` to read
the actual object.

> +	cmp obj-list current
> +'

My main question about this test case is whether it _actually_ verifies
that the batch-mode `fsync()`ing took place.

I kind of had expected to see Trace2 enabled and a `grep` for
`fsync/hardware-flush`. Do you think that would still make sense to add?

Thank you for working on the `fsync()` aspects of Git!
Dscho

> +
>  test_expect_success 'do not unpack existing large objects' '
>  	prepare_dest 1m &&
>  	git -C dest.git index-pack --stdin <pack-$PACK.pack &&
> --
> 2.36.1
>
>




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux