Re: [PATCH v4 06/10] commit-graph: implement corrected commit date

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 27, 2020 at 07:53:23PM +0100, Jakub Narębski wrote:
> "Abhishek Kumar via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> 
> > From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> > ...
> > Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> 
> Somewhere in the commit message we should also describe that this commit
> changes how commit-graph is verified: from checking that the generation
> number agrees with _topological level definition_, that is that for a
> given commit it is 1 more than maximum of its parents (with the caveat
> that we need to handle GENERATION_NUMBER_V1_MAX values correctly), to
> checking that slightly weaker condition fulfilled by both topological
> levels (generation number v1) and by corrected commit date (generation
> number v2) that for a given commit its generation number is 1 more than
> maximum of its parents or larger.

Sure, that makes sense. Will add.

> 
> But, as far as I understand it, current code does not handle correctly
> GENERATION_NUMBER_V1_MAX case (if we use generation number v1).
> 
> On the other hand we could have simpy use functional check, that
> generation number used (which can be v1 or v2, or any similar other)
> fulfills the reachability condition for each edge, which can be
> simplified to checking that generation(parents) <= generation(commit).
> If the reachability condition is true for each edge, then it is true for
> each path, and for each commit.
> 
> > ---
> >  commit-graph.c | 43 +++++++++++++++++++++++--------------------
> >  1 file changed, 23 insertions(+), 20 deletions(-)
> >
> > diff --git a/commit-graph.c b/commit-graph.c
> > index cedd311024..03948adfce 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -154,11 +154,6 @@ static int commit_gen_cmp(const void *va, const void *vb)
> >  	else if (generation_a > generation_b)
> >  		return 1;
> >  
> > -	/* use date as a heuristic when generations are equal */
> > -	if (a->date < b->date)
> > -		return -1;
> > -	else if (a->date > b->date)
> > -		return 1;
> 
> Why this change?  It is not described in the commit message.
> 
> Note that while this tie-breaking fallback doesn't make much sense for
> corrected committer date generation number v2, this tie-breaking helps
> if we have to use topological levels (generation number v2).
> 

Right, I should have mentioned this change (and it's not something that
makes a difference either way).

We call commit_gen_cmp() only when we are sorting commits by generation
to speed up computation of Bloom filters i.e. while writing a commit
graph (either split commit-graph or a simple commit-graph).

Since we are always computing and storing corrected commit date when we
are writing (whether we write a GDAT chunk or not), using date as
heuristic is longer required.

> >  	return 0;
> >  }
> >  
> > @@ -1357,10 +1352,14 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> >  					ctx->commits.nr);
> >  	for (i = 0; i < ctx->commits.nr; i++) {
> >  		timestamp_t level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]);
> 
> Sidenote: I haven't noticed it earlier, but here 'uint32_t' might be
> enough; no need for 'timestamp_t' for 'level' variable.
> 
> > +		timestamp_t corrected_commit_date = commit_graph_data_at(ctx->commits.list[i])->generation;
> >

We need the 'timestamp_t' as we are comparing level with the now 64-bits
GENERATION_NUMBER_INFINITY. I thought uint32_t would be promoted to
timestamp_t. I have a hunch that since we are explicitly using a fixed
width data type, compiler is unwilling to type coerce into broader data
types.

Advice on this appreciated.

> 
> All right, we compute both generation numbers: topological levels and
> corrected commit date.
> 
> I guess we use 'corrected_commit_date' instead of simply 'generation' to
> make it asier to remember which is which.
> 
> >  		display_progress(ctx->progress, i + 1);
> >  		if (level != GENERATION_NUMBER_INFINITY &&
> > -		    level != GENERATION_NUMBER_ZERO)
> > +		    level != GENERATION_NUMBER_ZERO &&
> > +		    corrected_commit_date != GENERATION_NUMBER_INFINITY &&
> > +		    corrected_commit_date != GENERATION_NUMBER_ZERO
> 
> Straightforward addition.
> 
> > +		    )
> 
> Why this closing parenthesis is now in separated line?
> 
> >  			continue;
> >  
> >  		commit_list_insert(ctx->commits.list[i], &list);
> > @@ -1369,17 +1368,25 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> >  			struct commit_list *parent;
> >  			int all_parents_computed = 1;
> >  			uint32_t max_level = 0;
> > +			timestamp_t max_corrected_commit_date = 0;
> 
> All right, straightforward addition.
> 
> >  
> >  			for (parent = current->parents; parent; parent = parent->next) {
> >  				level = *topo_level_slab_at(ctx->topo_levels, parent->item);
> > -
> 
> Why we have removed this empty line?
> 
> > +				corrected_commit_date = commit_graph_data_at(parent->item)->generation;
> 
> All right.
> 
> >  				if (level == GENERATION_NUMBER_INFINITY ||
> > -				    level == GENERATION_NUMBER_ZERO) {
> > +				    level == GENERATION_NUMBER_ZERO ||
> > +				    corrected_commit_date == GENERATION_NUMBER_INFINITY ||
> > +				    corrected_commit_date == GENERATION_NUMBER_ZERO
> > +				    ) {
> 
> All right, same as above.
> 
> >  					all_parents_computed = 0;
> >  					commit_list_insert(parent->item, &list);
> >  					break;
> > -				} else if (level > max_level) {
> > -					max_level = level;
> > +				} else {
> > +					if (level > max_level)
> > +						max_level = level;
> > +
> > +					if (corrected_commit_date > max_corrected_commit_date)
> > +						max_corrected_commit_date = corrected_commit_date;
> >  				}
> 
> All right, reasonable and straightforward.
> 
> >  			}
> >  
> > @@ -1389,6 +1396,10 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> >  				if (max_level > GENERATION_NUMBER_V1_MAX - 1)
> >  					max_level = GENERATION_NUMBER_V1_MAX - 1;
> >  				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
> > +
> > +				if (current->date && current->date > max_corrected_commit_date)
> > +					max_corrected_commit_date = current->date - 1;
> > +				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
> 
> All right.
> 
> Here we use the same trick as in previous commit (and as above) to avoid
> any possible overflow, to minimize number of conditionals.  The fact
> that max_corrected_commit_date might store incorrect value doesn't
> matter, as it is reset at beginning of this loop.
> 
> >  			}
> >  		}
> >  	}
> > @@ -2485,17 +2496,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
> >  		if (generation_zero == GENERATION_ZERO_EXISTS)
> >  			continue;
> >  
> > -		/*
> > -		 * If one of our parents has generation GENERATION_NUMBER_V1_MAX, then
> > -		 * our generation is also GENERATION_NUMBER_V1_MAX. Decrement to avoid
> > -		 * extra logic in the following condition.
> > -		 */
> > -		if (max_generation == GENERATION_NUMBER_V1_MAX)
> > -			max_generation--;
> > -
> 
> Perhaps in the future we should check that both topological levels, and
> also corrected committer date (if it exists) for correctness according
> to their definition.  Then the above removed part would be restored (but
> with s/max_generation/max_level/).
> 
> >  		generation = commit_graph_generation(graph_commit);
> > -		if (generation != max_generation + 1)
> > -			graph_report(_("commit-graph generation for commit %s is %u != %u"),
> > +		if (generation < max_generation + 1)
> > +			graph_report(_("commit-graph generation for commit %s is %"PRItime" < %"PRItime),
> 
> All right, so we relaxed the check so that it will be fulfilled by
> generation number v2 (and also by generation number v1, as it implies
> the more strict check for v1).
> 
> What would happen however if generation holds topological levels, and it
> is GENERATION_NUMBER_V1_MAX for at least one parent, which means it is
> GENERATION_NUMBER_V1_MAX for a commit?  As you can check, the condition
> would be true: GENERATION_NUMBER_V1_MAX < GENERATION_NUMBER_V1_MAX + 1,
> so the `git commit-graph verify` would incorrectly say that there is
> a problem with generation number, while there isn't one (false positive
> detection of error).

Alright, so the above block still makes sense if we are working with
topological levels but not with corrected commit dates. Instead of
removing it, I will modify the condition to check that one of our parents
has GENERATION_NUMBER_V1_MAX and the graph uses topological levels.

Suprised that no test breaks by this change.

I have also moved changes in the verify function to the next patch, as
we cannot write or read corrected commit dates yet - so little sense in
modifying verify.

> 
> Sidenote: I think we don't have to worry about having to introduce
> GENERATION_NUMBER_V2_MAX, as the in-memory size (of reconstructed from
> disck representation) corrected commiter date is the same as of commiter
> date itself, plus some, and I don't see us coming close to 64-bit limit
> of timestamp_t for commit dates.
> 
> >  				     oid_to_hex(&cur_oid),
> >  				     generation,
> >  				     max_generation + 1);
> 
> Best,
> -- 
> Jakub Narębski

Thanks
- Abhishek



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux