On 7/28/2020 5:13 AM, Abhishek Kumar via GitGitGadget wrote: > From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> > > With preparations done,... I feel like this commit could have been made smaller by doing the uint32_t -> timestamp_t conversion in a separate patch. That would make it easier to focus on the changes to the generation number v2 logic. > let's implement corrected commit date offset. > We add a new commit-slab to store topological levels while writing It's important to add: we store topological levels to ensure that older versions of Git will still have the performance benefits from generation number v1. > commit graph and upgrade number of struct commit_graph_data to 64-bits. Do you mean "update the generation member in struct commit_graph_data to a 64-bit timestamp"? The struct itself also has the 32-bit graph_pos member. > We have to touch many files, upgrading generation number from uint32_t > to timestamp_t. Yes, that's why I recommend doing that in a different step. > We drop 'detect incorrect generation number' from t5318-commit-graph.sh, > which tests if verify can detect if a commit graph have > GENERATION_NUMBER_ZERO for a commit, followed by a non-zero generation. > With corrected commit dates, GENERATION_NUMBER_ZERO is possible only if > one of dates is Unix epoch zero. What about the topological levels? Are we caring about verifying the data that we start to ignore in this new version? I'm hesitant to drop this right now, but I'm open to it if we really don't see it as a valuable test. > Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> > --- > blame.c | 2 +- > commit-graph.c | 109 ++++++++++++++++++++++------------------ > commit-graph.h | 4 +- > commit-reach.c | 32 ++++++------ > commit-reach.h | 2 +- > commit.h | 3 ++ > revision.c | 14 +++--- > t/t5318-commit-graph.sh | 2 +- > upload-pack.c | 2 +- > 9 files changed, 93 insertions(+), 77 deletions(-) > > diff --git a/blame.c b/blame.c > index 82fa16d658..48aa632461 100644 > --- a/blame.c > +++ b/blame.c > @@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r, > if (!bd) > return 1; > > - if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY) > + if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_V2_INFINITY) > return 1; I don't see value in changing the name of this macro. It is only used as the default value for a commit not in the commit-graph. Changing its value to 0xFFFFFFFF works for both versions when the type is updated to timestamp_t. The actually-important change in this patch (not just the type change) is here: > -static void compute_generation_numbers(struct write_commit_graph_context *ctx) > +static void compute_corrected_commit_date_offsets(struct write_commit_graph_context *ctx) > { > int i; > struct commit_list *list = NULL; > @@ -1326,11 +1334,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > _("Computing commit graph generation numbers"), > ctx->commits.nr); > for (i = 0; i < ctx->commits.nr; i++) { > - uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation; > + uint32_t topo_level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]); > > display_progress(ctx->progress, i + 1); > - if (generation != GENERATION_NUMBER_INFINITY && > - generation != GENERATION_NUMBER_ZERO) > + if (topo_level != GENERATION_NUMBER_INFINITY && > + topo_level != GENERATION_NUMBER_ZERO) > continue; Here, our "skip" condition is that the topo_level has been computed. This should be fine, as we are never reading that out of the commit-graph. We will never be in a mode where topo_level is computed but corrected commit-date is not. > commit_list_insert(ctx->commits.list[i], &list); > @@ -1338,29 +1346,38 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > struct commit *current = list->item; > struct commit_list *parent; > int all_parents_computed = 1; > - uint32_t max_generation = 0; > + uint32_t max_level = 0; > + timestamp_t max_corrected_commit_date = current->date; Later you assign data->generation to be "max_corrected_commit_date + 1", which made me think this should be "current->date - 1". Is that so? Or, do we want most offsets to be one instead of zero? Is there value there? > > for (parent = current->parents; parent; parent = parent->next) { > - generation = commit_graph_data_at(parent->item)->generation; > + topo_level = *topo_level_slab_at(ctx->topo_levels, parent->item); > > - if (generation == GENERATION_NUMBER_INFINITY || > - generation == GENERATION_NUMBER_ZERO) { > + if (topo_level == GENERATION_NUMBER_INFINITY || > + topo_level == GENERATION_NUMBER_ZERO) { > all_parents_computed = 0; > commit_list_insert(parent->item, &list); > break; > - } else if (generation > max_generation) { > - max_generation = generation; > + } else { > + struct commit_graph_data *data = commit_graph_data_at(parent->item); > + > + if (topo_level > max_level) > + max_level = topo_level; > + > + if (data->generation > max_corrected_commit_date) > + max_corrected_commit_date = data->generation; > } > } > > if (all_parents_computed) { > struct commit_graph_data *data = commit_graph_data_at(current); > > - data->generation = max_generation + 1; > - pop_commit(&list); > + if (max_level > GENERATION_NUMBER_MAX - 1) > + max_level = GENERATION_NUMBER_MAX - 1; > + > + *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; > + data->generation = max_corrected_commit_date + 1; > > - if (data->generation > GENERATION_NUMBER_MAX) > - data->generation = GENERATION_NUMBER_MAX; > + pop_commit(&list); > } > } > } This looks correct, and I've done a tiny bit of perf tests locally. > @@ -2085,6 +2102,7 @@ int write_commit_graph(struct object_directory *odb, > uint32_t i, count_distinct = 0; > int res = 0; > int replace = 0; > + struct topo_level_slab topo_levels; > > if (!commit_graph_compatible(the_repository)) > return 0; > @@ -2099,6 +2117,9 @@ int write_commit_graph(struct object_directory *odb, > ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0; > ctx->total_bloom_filter_data_size = 0; > > + init_topo_level_slab(&topo_levels); > + ctx->topo_levels = &topo_levels; > + > if (ctx->split) { > struct commit_graph *g; > prepare_commit_graph(ctx->r); > @@ -2197,7 +2218,7 @@ int write_commit_graph(struct object_directory *odb, > } else > ctx->num_commit_graphs_after = 1; > > - compute_generation_numbers(ctx); > + compute_corrected_commit_date_offsets(ctx); This rename might not be necessary. You are computing both versions (v1 and v2) so the name change is actually less accurate than the old name. Thanks, -Stolee