Re: [PATCH v4 09/10] commit-reach: use corrected commit dates in paint_down_to_common()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 03, 2020 at 06:59:03PM +0100, Jakub Narębski wrote:
> "Abhishek Kumar via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> 
> > From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> >
> > With corrected commit dates implemented, we no longer have to rely on
> > commit date as a heuristic in paint_down_to_common().
> >
> > While using corrected commit dates Git walks nearly the same number of
> > commits as commit date, the process is slower as for each comparision we
> > have to access a commit-slab (for corrected committer date) instead of
> > accessing struct member (for committer date).
> 
> Something for the future: I wonder if it would be worth it to bring back
> generation number from the commit-slab into `struct commit`.
> 
> >
> > For example, the command `git merge-base v4.8 v4.9` on the linux
> > repository walks 167468 commits, taking 0.135s for committer date and
> > 167496 commits, taking 0.157s for corrected committer date respectively.
> 
> I think it would be good idea to explicitly refer to the commit that
> changed paint_down_to_common() to *not* use generation numbers v1
> (topological levels) in the cases such as this, namely 091f4cf3 (commit:
> don't use generation numbers if not needed).  In this commit we have the
> following:
> ...
>

I have re-arranged the first half of commit message: 

  091f4cf3 (commit: don't use generation numbers if not needed,
  2018-08-30) changed paint_down_to_common() to use commit dates instead
  of generation numbers v1 (topological levels) as the performance
  regressed on certain topologies. With generation number v2 (corrected
  commit dates) implemented, we no longer have to rely on commit dates and
  can use generation numbers.
  
  For example, the command `git merge-base v4.8 v4.9` on the Linux
  repository walks 167468 commits, taking 0.135s for committer date and
  167496 commits, taking 0.157s for corrected committer date respectively.
  
  While using corrected commit dates Git walks nearly the same number of
  commits as commit date, the process is slower as for each comparision we
  have to access a commit-slab (for corrected committer date) instead of
  accessing struct member (for committer date).

> 
> The times you report (0.135s and 0.157s) are close to 0.122s / 0.127s
> reported in 091f4cf3 - that is most probably because of the differences
> in the system performance (hardware, operating system, load, etc.).
> Numbers of commits walked for the committed date heuristics, that is
> 167,468 agrees with your results; 167,496 (+28) for corrected commit
> date (generation number v2) is significantly smaller (-468,083) than
> 635,579 reported for topological levels (generation number v1).
> 
> I suspect that there are cases (with date skew) where corrected commit
> date gives better performance than committer date heuristics, and I am
> quite sure that generation number v2 can give better performance in case
> where paint_down_to_common() uses generation numbers.
> 
> .................................................................
> 
> Here begins separate second change, which is not put into separate
> commit because it is fairly tightly connected to the change described
> above.  It would be good idea, in my opinion, to add a sentence that
> explicitely marks this switch, for example:
> 
>   This change accidentally broke fragile t6404-recursive-merge test.
>   t6404-recursive-merge setups a unique repository...
> 
> Maybe with s/accidentaly/incidentally/.
> 

Thanks, will add.

> Or add some other way of connection those two parts of the commit
> messages.
> ...
> >  
> > +int corrected_commit_dates_enabled(struct repository *r)
> > +{
> > +	struct commit_graph *g;
> > +	if (!prepare_commit_graph(r))
> > +		return 0;
> > +
> > +	g = r->objects->commit_graph;
> > +
> > +	if (!g->num_commits)
> > +		return 0;
> > +
> > +	return g->read_generation_data;
> > +}
> 
> Very nice abstraction.
> 
> Minor issue: I wonder if it would be better to use _available() or
> "_present()" rather than _enabled() suffix.
> 

We could, but that breaks conformity with `generation_numbers_enabled()`.

I see both functions to be similar in nature, to answer whether the
commit-graph has X? X could be topological levels or corrected commit
dates.

> > +
> >  struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r)
> >  {
> >  	struct commit_graph *g = r->objects->commit_graph;
> > diff --git a/commit-graph.h b/commit-graph.h
> > index ad52130883..d2c048dc64 100644
> > --- a/commit-graph.h
> > +++ b/commit-graph.h
> > @@ -89,13 +89,19 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
> >  struct commit_graph *parse_commit_graph(struct repository *r,
> >  					void *graph_map, size_t graph_size);
> >  
> > +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r);
> > +
> >  /*
> >   * Return 1 if and only if the repository has a commit-graph
> >   * file and generation numbers are computed in that file.
> >   */
> >  int generation_numbers_enabled(struct repository *r);
> >  
> > -struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r);
> 
> This moving get_bloom_filter_settings() before generation_numbers_enabled() 
> looks like accidental change.  If not, why it is here?

Right, that's an accidental change. I wanted to group
generation_numbers_enabled() and corrected_commit_dates_enabled()
together.

> 
> ...
> 
> Best,
> -- 
> Jakub Narębski

Thanks
- Abhishek



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux