On Mon, Feb 01, 2021 at 05:15:06PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > When upgrading to a commit-graph with corrected commit dates from > one without, there are a few things that need to be considered. > > When computing generation numbers for the new commit-graph file that > expects to add the generation_data chunk with corrected commit > dates, we need to ensure that the 'generation' member of the > commit_graph_data struct is set to zero for these commits. > > Unfortunately, the fallback to use topological level for generation > number when corrected commit dates are not available are causing us > harm here: parsing commits notices that read_generation_data is > false and populates 'generation' with the topological level. > > The solution is to iterate through the commits, parse the commits > to populate initial values, then reset the generation values to > zero to trigger recalculation. This loop only occurs when the > existing commit-graph data has no corrected commit dates. > > While this improves our situation somewhat, we have not completely > solved the issue for correctly computing generation numbers for mixes > layers. That follows in the next change. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > --- > commit-graph.c | 32 +++++++++++++++++++++++--------- > 1 file changed, 23 insertions(+), 9 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index 13992137dd0..08148dd17f1 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -1033,7 +1033,8 @@ struct write_commit_graph_context { > split:1, > changed_paths:1, > order_by_pack:1, > - write_generation_data:1; > + write_generation_data:1, > + trust_generation_numbers:1; > > struct topo_level_slab *topo_levels; > const struct commit_graph_opts *opts; > @@ -1452,6 +1453,15 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > ctx->progress = start_delayed_progress( > _("Computing commit graph generation numbers"), > ctx->commits.nr); > + > + if (ctx->write_generation_data && !ctx->trust_generation_numbers) { > + for (i = 0; i < ctx->commits.nr; i++) { > + struct commit *c = ctx->commits.list[i]; > + repo_parse_commit(ctx->r, c); > + commit_graph_data_at(c)->generation = GENERATION_NUMBER_ZERO; > + } > + } > + This took me a while to figure out since I spent quite a lot of time thinking that you were setting the topological level to zero, _not_ the corrected committer date. Now that I understand which is which, I agree that this is the right way to go forward. That said, I do find it unnecessarily complex that we compute both the generation number and the topological level in the same loops in compute_generation_numbers()... > for (i = 0; i < ctx->commits.nr; i++) { > struct commit *c = ctx->commits.list[i]; > uint32_t level; > @@ -1480,7 +1490,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > corrected_commit_date = commit_graph_data_at(parent->item)->generation; > > if (level == GENERATION_NUMBER_ZERO || > - corrected_commit_date == GENERATION_NUMBER_ZERO) { > + (ctx->write_generation_data && > + corrected_commit_date == GENERATION_NUMBER_ZERO)) { ...for exactly reasons like this. It does make sense that they could be computed together since their computation is indeed quite similar. But in practice I think you end up spending a lot of time reasoning around complex conditionals like these. So, I feel a little bit like we should spend some effort to split these up. I'm OK with a little bit of code duplication (though if we can factor out some common routine, that would also be nice). But I think there's a tradeoff between DRY-ness and understandability, and that we might be on the wrong side of it here. > all_parents_computed = 0; > commit_list_insert(parent->item, &list); > break; > @@ -1500,12 +1511,15 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) > max_level = GENERATION_NUMBER_V1_MAX - 1; > *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; > > - if (current->date && current->date > max_corrected_commit_date) > - max_corrected_commit_date = current->date - 1; > - commit_graph_data_at(current)->generation = max_corrected_commit_date + 1; > - > - if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX) > - ctx->num_generation_data_overflows++; > + if (ctx->write_generation_data) { > + timestamp_t cur_g; > + if (current->date && current->date > max_corrected_commit_date) > + max_corrected_commit_date = current->date - 1; > + cur_g = commit_graph_data_at(current)->generation > + = max_corrected_commit_date + 1; > + if (cur_g - current->date > GENERATION_NUMBER_V2_OFFSET_MAX) > + ctx->num_generation_data_overflows++; > + } Looks like two things happened here: - A new local variable was introduced to store the value of 'commit_graph_data_at(current)->generation' (now called 'cur_g'), and - All of this was guarded by a conditional on 'ctx->write_generation_data'. The first one is a readability improvement, and the second is the substantive one, no? > } > } > } > @@ -2396,7 +2410,7 @@ int write_commit_graph(struct object_directory *odb, > } else > ctx->num_commit_graphs_after = 1; > > - validate_mixed_generation_chain(ctx->r->objects->commit_graph); > + ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph); > > compute_generation_numbers(ctx); Makes sense. Thanks, Taylor