Re: [PATCH v2 08/10] commit-graph: handle mixed generation commit chains

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/8/2020 10:53 PM, Abhishek Kumar via GitGitGadget wrote:
> From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> 
> As corrected commit dates and topological levels cannot be compared
> directly, we must handle commit graph chains with mixed generation
> number definitions.
> 
> While reading a commit graph file, we disable generation numbers if the
> chain contains mixed generation numbers.
> 
> While writing to commit graph chain, we write generation data chunk only
> if the previous tip of chain had a generation data chunk. Using
> `--split=replace` overwrites the existing chain and writes generation
> data chunk regardless of previous tip.
> 
> In t5324-split-commit-graph, we set up a repo with twelve commits and
> write a base commit graph file with no generation data chunk. When add
> three commits and write to chain again, Git does not write generation
> data chunk even without setting GIT_TEST_COMMIT_GRAPH_NO_GDAT=1. Then,
> as we replace the existing chain, Git writes a commit graph file with
> generation data chunk.
> 
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> ---
>  commit-graph.c                | 14 ++++++++
>  t/t5324-split-commit-graph.sh | 66 +++++++++++++++++++++++++++++++++++
>  2 files changed, 80 insertions(+)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index d0f977852b..c6b6111adf 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -674,6 +674,14 @@ int generation_numbers_enabled(struct repository *r)
>  	if (!g->num_commits)
>  		return 0;
>  
> +	/* We cannot compare topological levels and corrected commit dates */
> +	while (g->base_graph) {
> +		warning(_("commit-graph-chain contains mixed generation versions"));

This warning is premature. It will add a warning whenever we have
a split commit-graph, regardless of an incorrect chain.

> +		if ((g->chunk_generation_data == NULL) ^ (g->base_graph->chunk_generation_data == NULL))

Hm. A bit-wise XOR here? That seems unfortunate. I think that it
is easier to focus on the 



> +			return 0;
> +		g = g->base_graph;
> +	}
> +

Hm. So this scenario actually disables generation numbers completely
in the event that anything in the chain disagrees. I think this is
not the right way to approach the situation, as it will significantly
punish users in this state with slow performance.

The patch I sent [1] is probably better: it uses generation number
v1 if the tip of the chain does not have a GDAT chunk.

[1] https://lore.kernel.org/git/a3910f82-ab2e-bf35-ac43-c30d77f3c96b@xxxxxxxxx/

>  	first_generation = get_be32(g->chunk_commit_data +
>  				    g->hash_len + 8) >> 2;
>  
> @@ -2186,6 +2194,9 @@ int write_commit_graph(struct object_directory *odb,
>  
>  		g = ctx->r->objects->commit_graph;
>  
> +		if (g && !g->chunk_generation_data)
> +			ctx->write_generation_data = 0;
> +
>  		while (g) {
>  			ctx->num_commit_graphs_before++;
>  			g = g->base_graph;
> @@ -2204,6 +2215,9 @@ int write_commit_graph(struct object_directory *odb,
>  
>  		if (ctx->split_opts)
>  			replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE;
> +
> +		if (replace)
> +			ctx->write_generation_data = 1;
>  	}

Please make a point to move the line that checks GIT_TEST_COMMIT_GRAPH_NO_GDAT
from its current location to after this line. We want to make sure that the
environment variable is checked _last_. The best location is likely the start
of the implementation of compute_generation_numbers(), or immediately before
the call to the method.

> +test_expect_success 'setup repo for mixed generation commit-graph-chain' '
> +	mkdir mixed &&
> +	graphdir=".git/objects/info/commit-graphs" &&
> +	cd "$TRASH_DIRECTORY/mixed" &&
> +	git init &&
> +	git config core.commitGraph true &&
> +	git config gc.writeCommitGraph false &&
> +	for i in $(test_seq 3)
> +	do
> +		test_commit $i &&
> +		git branch commits/$i || return 1
> +	done &&
> +	git reset --hard commits/1 &&
> +	for i in $(test_seq 4 5)
> +	do
> +		test_commit $i &&
> +		git branch commits/$i || return 1
> +	done &&
> +	git reset --hard commits/2 &&
> +	for i in $(test_seq 6 10)
> +	do
> +		test_commit $i &&
> +		git branch commits/$i || return 1
> +	done &&
> +	git reset --hard commits/2 &&
> +	git merge commits/4 &&
> +	git branch merge/1 &&
> +	git reset --hard commits/4 &&
> +	git merge commits/6 &&
> +	git branch merge/2 &&
> +	GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split &&
> +	test-tool read-graph >output &&
> +	cat >expect <<-EOF &&
> +	header: 43475048 1 1 3 0
> +	num_commits: 12
> +	chunks: oid_fanout oid_lookup commit_metadata
> +	EOF
> +	test_cmp expect output
> +'
> +
> +test_expect_success 'does not write generation data chunk if not present on existing tip' '
> +	cd "$TRASH_DIRECTORY/mixed" &&
> +	git reset --hard commits/3 &&
> +	git merge merge/1 &&
> +	git merge commits/5 &&
> +	git merge merge/2 &&
> +	git branch merge/3 &&
> +	git commit-graph write --reachable --split &&
> +	test-tool read-graph >output &&
> +	cat >expect <<-EOF &&
> +	header: 43475048 1 1 4 1
> +	num_commits: 3
> +	chunks: oid_fanout oid_lookup commit_metadata
> +	EOF
> +	test_cmp expect output
> +'
> +
> +test_expect_success 'writes generation data chunk when commit-graph chain is replaced' '
> +	cd "$TRASH_DIRECTORY/mixed" &&
> +	git commit-graph write --reachable --split='replace' &&
> +	test_path_is_file $graphdir/commit-graph-chain &&
> +	test_line_count = 1 $graphdir/commit-graph-chain &&
> +	verify_chain_files_exist $graphdir &&
> +	graph_read_expect 15
> +'

It would be valuable to double-check here that the values in the GDAT chunk
are correct. I'm concerned about the possibility that the 'generation'
member of struct commit_graph_data gets filled with topological level during
parsing and then that is written as an offset into the CDAT chunk.

Perhaps this is best left for a follow-up series that updates the 'verify'
subcommand to check the GDAT chunk.

Thanks,
-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux