Re: [PATCH v3 14/14] builtin/commit-graph.c: introduce '--max-new-filters=<n>'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 19, 2020 at 12:23:29AM +0200, SZEDER Gábor wrote:
> On Tue, Aug 11, 2020 at 04:52:14PM -0400, Taylor Blau wrote:
> > Introduce a command-line flag and configuration variable to fill in the
> > 'max_new_filters' variable introduced by the previous patch.
>
> 'max_new_filters' is introduced in this patch.
>
> > The command-line option '--max-new-filters' takes precedence over
> > 'commitGraph.maxNewFilters', which is the default value.
>
> What "filters"?  While misnamed, the '--changed-paths' options did a
> good job at hiding the implementation detail that we use Bloom filters
> to speed up pathspec-limited revision walks; as far as I remember this
> was a conscious design decision.  Including "filter" in the name of
> the option and corresponding config variable goes against this
> decision.  Furthermore, by not being specific we might end up in abad
> situation when adding some other filters to the commit-graph format.
> Unfortunately, I can't offhand propose a better option name, all my
> ideas were horrible.

I don't disagree, but I can't come up with a better name either.
maxNewChangedPaths? maxNewSomething? I don't know. Usually I think
exposing this sort of detail is smelly, but I'm not so bothered by it
here. Similarly, this thread has been around for a while, and nobody
else has suggested a better name, so, I'm inclined to keep it.

> > +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
> > +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
> > +enforced. Overrides the `commitGraph.maxNewFilters` configuration.
>
> I think this description should also detail what happens with those
> commits for which no modified path Bloom filters are calculated, and
> which commands will calculate them (even implicitly).

OK, sure.

> >  static int get_bloom_filter_large_in_graph(struct commit_graph *g,
> > -					   const struct commit *c)
> > +					   const struct commit *c,
> > +					   uint32_t max_changed_paths)
> >  {
> >  	uint32_t graph_pos = commit_graph_position(c);
> >  	if (graph_pos == COMMIT_NOT_FROM_GRAPH)
> > @@ -965,6 +966,17 @@ static int get_bloom_filter_large_in_graph(struct commit_graph *g,
> >
> >  	if (!(g && g->bloom_large))
> >  		return 0;
> > +	if (g->bloom_filter_settings->max_changed_paths != max_changed_paths) {
> > +		/*
> > +		 * Force all commits which are subject to a different
> > +		 * 'max_changed_paths' limit to be recomputed from scratch.
> > +		 *
> > +		 * Note that this could likely be improved, but is ignored since
> > +		 * all real-world graphs set the maximum number of changed paths
> > +		 * at 512.
>
> I don't understand what the second part of this comment is trying to
> say; and real-world graphs might very well contain Bloom filters with
> more than 512 modified paths, because the applying that limit was
> buggy.

I'm trying to say that there is room for us to improve when we do and
don't recompute filters when the limit on the number of changed paths
deviates between layers, but that such deviations don't currently exist
in the wild.

Bugs are another thing, but we can't tell that they exist without
recomputing every filter.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux