Re: [PATCH 6/6] commit-graph: implement corrected commit date offset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/28/2020 5:13 AM, Abhishek Kumar via GitGitGadget wrote:
> From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> 
> With preparations done,...

I feel like this commit could have been made smaller by doing the
uint32_t -> timestamp_t conversion in a separate patch. That would
make it easier to focus on the changes to the generation number v2
logic.

> let's implement corrected commit date offset.
> We add a new commit-slab to store topological levels while writing

It's important to add: we store topological levels to ensure that older
versions of Git will still have the performance benefits from generation
number v1.

> commit graph and upgrade number of struct commit_graph_data to 64-bits.

Do you mean "update the generation member in struct commit_graph_data
to a 64-bit timestamp"? The struct itself also has the 32-bit graph_pos
member.

> We have to touch many files, upgrading generation number from uint32_t
> to timestamp_t.

Yes, that's why I recommend doing that in a different step.

> We drop 'detect incorrect generation number' from t5318-commit-graph.sh,
> which tests if verify can detect if a commit graph have
> GENERATION_NUMBER_ZERO for a commit, followed by a non-zero generation.
> With corrected commit dates, GENERATION_NUMBER_ZERO is possible only if
> one of dates is Unix epoch zero.

What about the topological levels? Are we caring about verifying the data
that we start to ignore in this new version? I'm hesitant to drop this
right now, but I'm open to it if we really don't see it as a valuable test.

> Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> ---
>  blame.c                 |   2 +-
>  commit-graph.c          | 109 ++++++++++++++++++++++------------------
>  commit-graph.h          |   4 +-
>  commit-reach.c          |  32 ++++++------
>  commit-reach.h          |   2 +-
>  commit.h                |   3 ++
>  revision.c              |  14 +++---
>  t/t5318-commit-graph.sh |   2 +-
>  upload-pack.c           |   2 +-
>  9 files changed, 93 insertions(+), 77 deletions(-)
> 
> diff --git a/blame.c b/blame.c
> index 82fa16d658..48aa632461 100644
> --- a/blame.c
> +++ b/blame.c
> @@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r,
>  	if (!bd)
>  		return 1;
>  
> -	if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY)
> +	if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_V2_INFINITY)
>  		return 1;

I don't see value in changing the name of this macro. It
is only used as the default value for a commit not in the
commit-graph. Changing its value to 0xFFFFFFFF works for
both versions when the type is updated to timestamp_t.

The actually-important change in this patch (not just the
type change) is here:

> -static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> +static void compute_corrected_commit_date_offsets(struct write_commit_graph_context *ctx)
>  {
>  	int i;
>  	struct commit_list *list = NULL;
> @@ -1326,11 +1334,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  					_("Computing commit graph generation numbers"),
>  					ctx->commits.nr);
>  	for (i = 0; i < ctx->commits.nr; i++) {
> -		uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
> +		uint32_t topo_level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]);
>  
>  		display_progress(ctx->progress, i + 1);
> -		if (generation != GENERATION_NUMBER_INFINITY &&
> -		    generation != GENERATION_NUMBER_ZERO)
> +		if (topo_level != GENERATION_NUMBER_INFINITY &&
> +		    topo_level != GENERATION_NUMBER_ZERO)
>  			continue;

Here, our "skip" condition is that the topo_level has been computed.
This should be fine, as we are never reading that out of the commit-graph.
We will never be in a mode where topo_level is computed but corrected
commit-date is not.

>  		commit_list_insert(ctx->commits.list[i], &list);
> @@ -1338,29 +1346,38 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  			struct commit *current = list->item;
>  			struct commit_list *parent;
>  			int all_parents_computed = 1;
> -			uint32_t max_generation = 0;
> +			uint32_t max_level = 0;
> +			timestamp_t max_corrected_commit_date = current->date;

Later you assign data->generation to be "max_corrected_commit_date + 1",
which made me think this should be "current->date - 1". Is that so? Or,
do we want most offsets to be one instead of zero? Is there value there?

>  
>  			for (parent = current->parents; parent; parent = parent->next) {
> -				generation = commit_graph_data_at(parent->item)->generation;
> +				topo_level = *topo_level_slab_at(ctx->topo_levels, parent->item);
>  
> -				if (generation == GENERATION_NUMBER_INFINITY ||
> -				    generation == GENERATION_NUMBER_ZERO) {
> +				if (topo_level == GENERATION_NUMBER_INFINITY ||
> +				    topo_level == GENERATION_NUMBER_ZERO) {
>  					all_parents_computed = 0;
>  					commit_list_insert(parent->item, &list);
>  					break;
> -				} else if (generation > max_generation) {
> -					max_generation = generation;
> +				} else {
> +					struct commit_graph_data *data = commit_graph_data_at(parent->item);
> +
> +					if (topo_level > max_level)
> +						max_level = topo_level;
> +
> +					if (data->generation > max_corrected_commit_date)
> +						max_corrected_commit_date = data->generation;
>  				}
>  			}
>  
>  			if (all_parents_computed) {
>  				struct commit_graph_data *data = commit_graph_data_at(current);
>  
> -				data->generation = max_generation + 1;
> -				pop_commit(&list);
> +				if (max_level > GENERATION_NUMBER_MAX - 1)
> +					max_level = GENERATION_NUMBER_MAX - 1;
> +
> +				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
> +				data->generation = max_corrected_commit_date + 1;
>  
> -				if (data->generation > GENERATION_NUMBER_MAX)
> -					data->generation = GENERATION_NUMBER_MAX;
> +				pop_commit(&list);
>  			}
>  		}
>  	}

This looks correct, and I've done a tiny bit of perf tests locally.

> @@ -2085,6 +2102,7 @@ int write_commit_graph(struct object_directory *odb,
>  	uint32_t i, count_distinct = 0;
>  	int res = 0;
>  	int replace = 0;
> +	struct topo_level_slab topo_levels;
>  
>  	if (!commit_graph_compatible(the_repository))
>  		return 0;
> @@ -2099,6 +2117,9 @@ int write_commit_graph(struct object_directory *odb,
>  	ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0;
>  	ctx->total_bloom_filter_data_size = 0;
>  
> +	init_topo_level_slab(&topo_levels);
> +	ctx->topo_levels = &topo_levels;
> +
>  	if (ctx->split) {
>  		struct commit_graph *g;
>  		prepare_commit_graph(ctx->r);
> @@ -2197,7 +2218,7 @@ int write_commit_graph(struct object_directory *odb,
>  	} else
>  		ctx->num_commit_graphs_after = 1;
>  
> -	compute_generation_numbers(ctx);
> +	compute_corrected_commit_date_offsets(ctx);

This rename might not be necessary. You are computing both
versions (v1 and v2) so the name change is actually less
accurate than the old name.

Thanks,
-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux