On 8/8/2020 10:53 PM, Abhishek Kumar via GitGitGadget wrote:
> From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
>
> As corrected commit dates and topological levels cannot be compared
> directly, we must handle commit graph chains with mixed generation
> number definitions.
>
> While reading a commit graph file, we disable generation numbers if the
> chain contains mixed generation numbers.
>
> While writing to commit graph chain, we write generation data chunk only
> if the previous tip of chain had a generation data chunk. Using
> `--split=replace` overwrites the existing chain and writes generation
> data chunk regardless of previous tip.
>
> In t5324-split-commit-graph, we set up a repo with twelve commits and
> write a base commit graph file with no generation data chunk. When we add
> three commits and write to chain again, Git does not write generation
> data chunk even without setting GIT_TEST_COMMIT_GRAPH_NO_GDAT=1. Then,
> as we replace the existing chain, Git writes a commit graph file with
> generation data chunk.
>
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> ---
>  commit-graph.c                | 14 ++++++++
>  t/t5324-split-commit-graph.sh | 66 +++++++++++++++++++++++++++++++++++
>  2 files changed, 80 insertions(+)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index d0f977852b..c6b6111adf 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -674,6 +674,14 @@ int generation_numbers_enabled(struct repository *r)
>  	if (!g->num_commits)
>  		return 0;
>
> +	/* We cannot compare topological levels and corrected commit dates */
> +	while (g->base_graph) {
> +		warning(_("commit-graph-chain contains mixed generation versions"));

This warning is premature: it is emitted whenever we have a split
commit-graph, regardless of whether the chain actually mixes generation
versions.

> +		if ((g->chunk_generation_data == NULL) ^ (g->base_graph->chunk_generation_data == NULL))

Hm. A bit-wise XOR here? That seems unfortunate.
I think that it is easier to focus on the

> +			return 0;
> +		g = g->base_graph;
> +	}
> +

Hm. So this scenario actually disables generation numbers completely if
anything in the chain disagrees. I don't think this is the right way to
approach the situation, as it will significantly punish users in this
state with slow performance.

The patch I sent [1] is probably better: it uses generation number v1 if
the tip of the chain does not have a GDAT chunk.

[1] https://lore.kernel.org/git/a3910f82-ab2e-bf35-ac43-c30d77f3c96b@xxxxxxxxx/

>  	first_generation = get_be32(g->chunk_commit_data +
>  				    g->hash_len + 8) >> 2;
>
> @@ -2186,6 +2194,9 @@ int write_commit_graph(struct object_directory *odb,
>
>  		g = ctx->r->objects->commit_graph;
>
> +		if (g && !g->chunk_generation_data)
> +			ctx->write_generation_data = 0;
> +
>  		while (g) {
>  			ctx->num_commit_graphs_before++;
>  			g = g->base_graph;
> @@ -2204,6 +2215,9 @@ int write_commit_graph(struct object_directory *odb,
>
>  		if (ctx->split_opts)
>  			replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE;
> +
> +		if (replace)
> +			ctx->write_generation_data = 1;
>  	}

Please make a point to move the line that checks
GIT_TEST_COMMIT_GRAPH_NO_GDAT from its current location to after this
line. We want to make sure that the environment variable is checked
_last_. The best location is likely the start of the implementation of
compute_generation_numbers(), or immediately before the call to that
function.
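Below is a minimal sketch of the decisions discussed above. It rests on stated assumptions: `struct graph_layer`, `chain_is_mixed()`, `read_generation_version()` and `should_write_gdat()` are hypothetical names over a simplified model of a split commit-graph chain, not Git's actual API. The `GIT_TEST_COMMIT_GRAPH_NO_GDAT` override is modeled as a plain flag rather than read from the environment. The sketch does three things:

- It detects a mixed chain by comparing normalized booleans instead of using a bit-wise XOR.
- It picks the generation version from the tip alone, falling back to v1 rather than disabling generation numbers.
- It applies the test override last when deciding whether to write GDAT.

```c
#include <stddef.h>

/* Hypothetical, simplified model of one layer of a split commit-graph
 * chain; Git's real struct commit_graph has many more members. */
struct graph_layer {
	int has_gdat;                   /* nonzero if the layer has GDAT */
	const struct graph_layer *base; /* next layer toward the base */
};

/* Nonzero if some adjacent pair of layers disagrees about GDAT.
 * Normalizing with ! and comparing with != avoids a bit-wise XOR. */
int chain_is_mixed(const struct graph_layer *g)
{
	for (; g && g->base; g = g->base)
		if (!g->has_gdat != !g->base->has_gdat)
			return 1;
	return 0;
}

/* Read side: pick the generation version from the tip alone. A tip
 * without GDAT falls back to v1 (topological levels) instead of
 * disabling generation numbers entirely. */
int read_generation_version(const struct graph_layer *tip)
{
	return (tip && tip->has_gdat) ? 2 : 1;
}

/* Write side: decide whether to write GDAT. `test_no_gdat` stands in
 * for GIT_TEST_COMMIT_GRAPH_NO_GDAT and is applied last, so no later
 * rule can turn GDAT back on. */
int should_write_gdat(const struct graph_layer *tip, int replace,
		      int test_no_gdat)
{
	int write_gdat = 1;

	if (tip && !tip->has_gdat)
		write_gdat = 0; /* do not start mixing versions in a chain */
	if (replace)
		write_gdat = 1; /* --split=replace rewrites every layer */
	if (test_no_gdat)
		write_gdat = 0; /* test override is checked last */
	return write_gdat;
}
```

Consider a chain whose layers were all written without GDAT. For it, `read_generation_version()` returns 1. `should_write_gdat()` returns 0 for a plain split write, 1 with `replace`, and 0 again when the test override is also set.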
> +test_expect_success 'setup repo for mixed generation commit-graph-chain' '
> +	mkdir mixed &&
> +	graphdir=".git/objects/info/commit-graphs" &&
> +	cd "$TRASH_DIRECTORY/mixed" &&
> +	git init &&
> +	git config core.commitGraph true &&
> +	git config gc.writeCommitGraph false &&
> +	for i in $(test_seq 3)
> +	do
> +		test_commit $i &&
> +		git branch commits/$i || return 1
> +	done &&
> +	git reset --hard commits/1 &&
> +	for i in $(test_seq 4 5)
> +	do
> +		test_commit $i &&
> +		git branch commits/$i || return 1
> +	done &&
> +	git reset --hard commits/2 &&
> +	for i in $(test_seq 6 10)
> +	do
> +		test_commit $i &&
> +		git branch commits/$i || return 1
> +	done &&
> +	git reset --hard commits/2 &&
> +	git merge commits/4 &&
> +	git branch merge/1 &&
> +	git reset --hard commits/4 &&
> +	git merge commits/6 &&
> +	git branch merge/2 &&
> +	GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split &&
> +	test-tool read-graph >output &&
> +	cat >expect <<-EOF &&
> +	header: 43475048 1 1 3 0
> +	num_commits: 12
> +	chunks: oid_fanout oid_lookup commit_metadata
> +	EOF
> +	test_cmp expect output
> +'
> +
> +test_expect_success 'does not write generation data chunk if not present on existing tip' '
> +	cd "$TRASH_DIRECTORY/mixed" &&
> +	git reset --hard commits/3 &&
> +	git merge merge/1 &&
> +	git merge commits/5 &&
> +	git merge merge/2 &&
> +	git branch merge/3 &&
> +	git commit-graph write --reachable --split &&
> +	test-tool read-graph >output &&
> +	cat >expect <<-EOF &&
> +	header: 43475048 1 1 4 1
> +	num_commits: 3
> +	chunks: oid_fanout oid_lookup commit_metadata
> +	EOF
> +	test_cmp expect output
> +'
> +
> +test_expect_success 'writes generation data chunk when commit-graph chain is replaced' '
> +	cd "$TRASH_DIRECTORY/mixed" &&
> +	git commit-graph write --reachable --split='replace' &&
> +	test_path_is_file $graphdir/commit-graph-chain &&
> +	test_line_count = 1 $graphdir/commit-graph-chain &&
> +	verify_chain_files_exist $graphdir &&
> +	graph_read_expect 15
> +'

It would be valuable to double-check here that the values in the GDAT
chunk are correct. I'm concerned about the possibility that the
'generation' member of struct commit_graph_data gets filled with the
topological level during parsing, and that this value is then written as
an offset into the CDAT chunk.

Perhaps this is best left for a follow-up series that updates the
'verify' subcommand to check the GDAT chunk.

Thanks,
-Stolee
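One way a follow-up 'verify' check might validate the GDAT values is sketched below. It assumes a hypothetical in-memory array of `struct commit_rec`, ordered so that parents come before children and limited to at most two parents each, plus a hypothetical `verify_gdat()` helper; neither exists in Git.

The sketch recomputes each corrected commit date as the maximum of the commit date and one more than every parent's corrected date. It then compares that value with the commit date plus the stored offset. A topological level mistakenly written as an offset would usually fail this comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one commit in a layer: the committer date
 * (as stored in CDAT) and the offset stored in GDAT. Commits must be
 * ordered so that every parent index is smaller than its child's. */
struct commit_rec {
	uint64_t date;
	uint32_t gdat_offset;
	int nparents;   /* 0, 1 or 2 in this simplified model */
	int parents[2]; /* indices into the same array */
};

/*
 * Recomputes corrected commit dates into the caller-provided `corrected`
 * array (n entries) and returns the index of the first commit whose
 * date + GDAT offset disagrees, or -1 if every offset matches.
 */
int verify_gdat(const struct commit_rec *c, size_t n, uint64_t *corrected)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t want = c[i].date;

		/* corrected date = max(date, corrected(parent) + 1) */
		for (int j = 0; j < c[i].nparents; j++) {
			uint64_t from_parent = corrected[c[i].parents[j]] + 1;
			if (from_parent > want)
				want = from_parent;
		}
		corrected[i] = want;

		if (c[i].date + c[i].gdat_offset != want)
			return (int)i;
	}
	return -1;
}
```

For instance, take a commit dated 90 whose only parent has corrected date 100. It must store offset 11. A stored value of 2, which looks like a topological level, is reported at that commit's index.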