Re: [PATCH 4/5] commit-graph: be extra careful about mixed generations

On Mon, Feb 01, 2021 at 05:15:06PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>
> When upgrading to a commit-graph with corrected commit dates from
> one without, there are a few things that need to be considered.
>
> When computing generation numbers for the new commit-graph file that
> expects to add the generation_data chunk with corrected commit
> dates, we need to ensure that the 'generation' member of the
> commit_graph_data struct is set to zero for these commits.
>
> Unfortunately, the fallback to use topological level for generation
> number when corrected commit dates are not available is causing us
> harm here: parsing commits notices that read_generation_data is
> false and populates 'generation' with the topological level.
>
> The solution is to iterate through the commits, parse the commits
> to populate initial values, then reset the generation values to
> zero to trigger recalculation. This loop only occurs when the
> existing commit-graph data has no corrected commit dates.
>
> While this improves our situation somewhat, we have not completely
> solved the issue for correctly computing generation numbers for mixed
> layers. That follows in the next change.
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  commit-graph.c | 32 +++++++++++++++++++++++---------
>  1 file changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 13992137dd0..08148dd17f1 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1033,7 +1033,8 @@ struct write_commit_graph_context {
>  		 split:1,
>  		 changed_paths:1,
>  		 order_by_pack:1,
> -		 write_generation_data:1;
> +		 write_generation_data:1,
> +		 trust_generation_numbers:1;
>
>  	struct topo_level_slab *topo_levels;
>  	const struct commit_graph_opts *opts;
> @@ -1452,6 +1453,15 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  		ctx->progress = start_delayed_progress(
>  					_("Computing commit graph generation numbers"),
>  					ctx->commits.nr);
> +
> +	if (ctx->write_generation_data && !ctx->trust_generation_numbers) {
> +		for (i = 0; i < ctx->commits.nr; i++) {
> +			struct commit *c = ctx->commits.list[i];
> +			repo_parse_commit(ctx->r, c);
> +			commit_graph_data_at(c)->generation = GENERATION_NUMBER_ZERO;
> +		}
> +	}
> +

This took me a while to figure out since I spent quite a lot of time
thinking that you were setting the topological level to zero, _not_ the
corrected committer date.

Now that I understand which is which, I agree that this is the right way
to go forward.

That said, I do find it unnecessarily complex that we compute both the
generation number and the topological level in the same loops in
compute_generation_numbers()...

>  	for (i = 0; i < ctx->commits.nr; i++) {
>  		struct commit *c = ctx->commits.list[i];
>  		uint32_t level;
> @@ -1480,7 +1490,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  				corrected_commit_date = commit_graph_data_at(parent->item)->generation;
>
>  				if (level == GENERATION_NUMBER_ZERO ||
> -				    corrected_commit_date == GENERATION_NUMBER_ZERO) {
> +				    (ctx->write_generation_data &&
> +				     corrected_commit_date == GENERATION_NUMBER_ZERO)) {

...for exactly this kind of reason. It does make sense that they could
be computed together, since the two computations are quite similar. But
in practice I think you end up spending a lot of time reasoning about
complex conditionals like these.

So, I feel a little bit like we should spend some effort to split these
up. I'm OK with a little bit of code duplication (though if we can
factor out some common routine, that would also be nice). But I think
there's a tradeoff between DRY-ness and understandability, and that we
might be on the wrong side of it here.
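
To make the shape of such a split concrete, here is a rough sketch on a toy
model rather than on commit-graph.c itself. Everything in it is hypothetical:
'struct toy_commit', 'max_over_parents()', 'compute_topo_levels()' and
'compute_corrected_dates()' are illustrative names, not existing Git
functions. It also assumes the commits are already listed with parents
before children, so it skips the explicit stack that
compute_generation_numbers() uses to walk uncomputed parents.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PARENTS 2

/* Hypothetical toy commit for illustration; not Git's struct commit. */
struct toy_commit {
	uint64_t date;            /* committer date */
	int parents[MAX_PARENTS]; /* indexes into the commit array, -1 if unused */
};

/* Shared routine: largest value among the parents of 'c', or 0 if none. */
static uint64_t max_over_parents(const struct toy_commit *c,
				 const uint64_t *values)
{
	uint64_t max = 0;
	int i;

	for (i = 0; i < MAX_PARENTS; i++) {
		int p = c->parents[i];
		if (p >= 0 && values[p] > max)
			max = values[p];
	}
	return max;
}

/*
 * Pass 1: topological level is one more than the largest parent level.
 * Assumes parents appear before children in 'commits'.
 */
static void compute_topo_levels(const struct toy_commit *commits, size_t nr,
				uint64_t *levels)
{
	size_t i;

	for (i = 0; i < nr; i++)
		levels[i] = max_over_parents(&commits[i], levels) + 1;
}

/*
 * Pass 2: corrected commit date is the larger of the commit's own date
 * and one more than the largest corrected date among its parents.
 * Same ordering assumption as above.
 */
static void compute_corrected_dates(const struct toy_commit *commits,
				    size_t nr, uint64_t *dates)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		uint64_t floor = max_over_parents(&commits[i], dates) + 1;

		dates[i] = commits[i].date > floor ? commits[i].date : floor;
	}
}
```

With the passes separated like this, the caller could simply skip the second
one when no generation data is being written, instead of threading
'ctx->write_generation_data' through each conditional. The cost is walking
the commits twice, which is the duplication tradeoff mentioned above.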

>  					all_parents_computed = 0;
>  					commit_list_insert(parent->item, &list);
>  					break;
> @@ -1500,12 +1511,15 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  					max_level = GENERATION_NUMBER_V1_MAX - 1;
>  				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
>
> -				if (current->date && current->date > max_corrected_commit_date)
> -					max_corrected_commit_date = current->date - 1;
> -				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
> -
> -				if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
> -					ctx->num_generation_data_overflows++;
> +				if (ctx->write_generation_data) {
> +					timestamp_t cur_g;
> +					if (current->date && current->date > max_corrected_commit_date)
> +						max_corrected_commit_date = current->date - 1;
> +					cur_g = commit_graph_data_at(current)->generation
> +					      = max_corrected_commit_date + 1;
> +					if (cur_g - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
> +						ctx->num_generation_data_overflows++;
> +				}

Looks like two things happened here:

  - A new local variable was introduced to store the value of
    'commit_graph_data_at(current)->generation' (now called 'cur_g'),
    and

  - All of this was guarded by a conditional on
    'ctx->write_generation_data'.

The first one is a readability improvement, and the second is the
substantive one, no?

>  			}
>  		}
>  	}
> @@ -2396,7 +2410,7 @@ int write_commit_graph(struct object_directory *odb,
>  	} else
>  		ctx->num_commit_graphs_after = 1;
>
> -	validate_mixed_generation_chain(ctx->r->objects->commit_graph);
> +	ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph);
>
>  	compute_generation_numbers(ctx);

Makes sense.

Thanks,
Taylor


