Re: [PATCH 6/6] commit-graph: implement corrected commit date offset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 28, 2020 at 11:55:12AM -0400, Derrick Stolee wrote:
> On 7/28/2020 5:13 AM, Abhishek Kumar via GitGitGadget wrote:
> > From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> >
> > With preparations done,...
>
> I feel like this commit could have been made smaller by doing the
> uint32_t -> timestamp_t conversion in a separate patch. That would
> make it easier to focus on the changes to the generation number v2
> logic.

Yep, agreed.

> > let's implement corrected commit date offset.
> > We add a new commit-slab to store topological levels while writing
>
> It's important to add: we store topological levels to ensure that older
> versions of Git will still have the performance benefits from generation
> number v1.
>
> > commit graph and upgrade number of struct commit_graph_data to 64-bits.
>
> Do you mean "update the generation member in struct commit_graph_data
> to a 64-bit timestamp"? The struct itself also has the 32-bit graph_pos
> member.
>
> > We have to touch many files, upgrading generation number from uint32_t
> > to timestamp_t.
>
> Yes, that's why I recommend doing that in a different step.
>
> > We drop 'detect incorrect generation number' from t5318-commit-graph.sh,
> > which tests if verify can detect if a commit graph have
> > GENERATION_NUMBER_ZERO for a commit, followed by a non-zero generation.
> > With corrected commit dates, GENERATION_NUMBER_ZERO is possible only if
> > one of dates is Unix epoch zero.
>
> What about the topological levels? Are we caring about verifying the data
> that we start to ignore in this new version? I'm hesitant to drop this
> right now, but I'm open to it if we really don't see it as a valuable test.
>
> > Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> > ---
> >  blame.c                 |   2 +-
> >  commit-graph.c          | 109 ++++++++++++++++++++++------------------
> >  commit-graph.h          |   4 +-
> >  commit-reach.c          |  32 ++++++------
> >  commit-reach.h          |   2 +-
> >  commit.h                |   3 ++
> >  revision.c              |  14 +++---
> >  t/t5318-commit-graph.sh |   2 +-
> >  upload-pack.c           |   2 +-
> >  9 files changed, 93 insertions(+), 77 deletions(-)
> >
> > diff --git a/blame.c b/blame.c
> > index 82fa16d658..48aa632461 100644
> > --- a/blame.c
> > +++ b/blame.c
> > @@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r,
> >  	if (!bd)
> >  		return 1;
> >
> > -	if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY)
> > +	if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_V2_INFINITY)
> >  		return 1;
>
> I don't see value in changing the name of this macro. It
> is only used as the default value for a commit not in the
> commit-graph. Changing its value to 0xFFFFFFFF works for
> both versions when the type is updated to timestamp_t.
>
> The actually-important change in this patch (not just the
> type change) is here:
>
> > -static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> > +static void compute_corrected_commit_date_offsets(struct write_commit_graph_context *ctx)
> >  {
> >  	int i;
> >  	struct commit_list *list = NULL;
> > @@ -1326,11 +1334,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> >  					_("Computing commit graph generation numbers"),
> >  					ctx->commits.nr);
> >  	for (i = 0; i < ctx->commits.nr; i++) {
> > -		uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
> > +		uint32_t topo_level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]);
> >
> >  		display_progress(ctx->progress, i + 1);
> > -		if (generation != GENERATION_NUMBER_INFINITY &&
> > -		    generation != GENERATION_NUMBER_ZERO)
> > +		if (topo_level != GENERATION_NUMBER_INFINITY &&
> > +		    topo_level != GENERATION_NUMBER_ZERO)
> >  			continue;
>
> Here, our "skip" condition is that the topo_level has been computed.
> This should be fine, as we are never reading that out of the commit-graph.
> We will never be in a mode where topo_level is computed but corrected
> commit-date is not.
>
> >  		commit_list_insert(ctx->commits.list[i], &list);
> > @@ -1338,29 +1346,38 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> >  			struct commit *current = list->item;
> >  			struct commit_list *parent;
> >  			int all_parents_computed = 1;
> > -			uint32_t max_generation = 0;
> > +			uint32_t max_level = 0;
> > +			timestamp_t max_corrected_commit_date = current->date;
>
> Later you assign data->generation to be "max_corrected_commit_date + 1",
> which made me think this should be "current->date - 1". Is that so? Or,
> do we want most offsets to be one instead of zero? Is there value there?
>
> >
> >  			for (parent = current->parents; parent; parent = parent->next) {
> > -				generation = commit_graph_data_at(parent->item)->generation;
> > +				topo_level = *topo_level_slab_at(ctx->topo_levels, parent->item);
> >
> > -				if (generation == GENERATION_NUMBER_INFINITY ||
> > -				    generation == GENERATION_NUMBER_ZERO) {
> > +				if (topo_level == GENERATION_NUMBER_INFINITY ||
> > +				    topo_level == GENERATION_NUMBER_ZERO) {
> >  					all_parents_computed = 0;
> >  					commit_list_insert(parent->item, &list);
> >  					break;
> > -				} else if (generation > max_generation) {
> > -					max_generation = generation;
> > +				} else {
> > +					struct commit_graph_data *data = commit_graph_data_at(parent->item);
> > +
> > +					if (topo_level > max_level)
> > +						max_level = topo_level;
> > +
> > +					if (data->generation > max_corrected_commit_date)
> > +						max_corrected_commit_date = data->generation;
> >  				}
> >  			}
> >
> >  			if (all_parents_computed) {
> >  				struct commit_graph_data *data = commit_graph_data_at(current);
> >
> > -				data->generation = max_generation + 1;
> > -				pop_commit(&list);
> > +				if (max_level > GENERATION_NUMBER_MAX - 1)
> > +					max_level = GENERATION_NUMBER_MAX - 1;
> > +
> > +				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
> > +				data->generation = max_corrected_commit_date + 1;
> >
> > -				if (data->generation > GENERATION_NUMBER_MAX)
> > -					data->generation = GENERATION_NUMBER_MAX;
> > +				pop_commit(&list);
> >  			}
> >  		}
> >  	}
>
> This looks correct, and I've done a tiny bit of perf tests locally.
>
> > @@ -2085,6 +2102,7 @@ int write_commit_graph(struct object_directory *odb,
> >  	uint32_t i, count_distinct = 0;
> >  	int res = 0;
> >  	int replace = 0;
> > +	struct topo_level_slab topo_levels;
> >
> >  	if (!commit_graph_compatible(the_repository))
> >  		return 0;
> > @@ -2099,6 +2117,9 @@ int write_commit_graph(struct object_directory *odb,
> >  	ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0;
> >  	ctx->total_bloom_filter_data_size = 0;
> >
> > +	init_topo_level_slab(&topo_levels);
> > +	ctx->topo_levels = &topo_levels;
> > +
> >  	if (ctx->split) {
> >  		struct commit_graph *g;
> >  		prepare_commit_graph(ctx->r);
> > @@ -2197,7 +2218,7 @@ int write_commit_graph(struct object_directory *odb,
> >  	} else
> >  		ctx->num_commit_graphs_after = 1;
> >
> > -	compute_generation_numbers(ctx);
> > +	compute_corrected_commit_date_offsets(ctx);
>
> This rename might not be necessary. You are computing both
> versions (v1 and v2) so the name change is actually less
> accurate than the old name.
>
> Thanks,
> -Stolee

I don't have anything to add that Stolee hasn't already pointed out.
Thanks for your work on this series, and I'm looking forward to another
reroll.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux