Re: [PATCH v3 01/14] commit-graph: introduce 'get_bloom_filter_settings()'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 12, 2020 at 07:48:55AM -0400, Derrick Stolee wrote:
> On 8/11/2020 7:55 PM, SZEDER Gábor wrote:
> > On Tue, Aug 11, 2020 at 05:34:11PM -0400, Taylor Blau wrote:
> >> On Tue, Aug 11, 2020 at 11:27:16PM +0200, SZEDER Gábor wrote:
> >>> On Tue, Aug 11, 2020 at 05:21:18PM -0400, Taylor Blau wrote:
> >>>> On Tue, Aug 11, 2020 at 11:18:30PM +0200, SZEDER Gábor wrote:
> >>>>> On Tue, Aug 11, 2020 at 04:51:19PM -0400, Taylor Blau wrote:
> >>>>>> Many places in the code often need a pointer to the commit-graph's
> >>>>>> 'struct bloom_filter_settings', in which case they often take the value
> >>>>>> from the top-most commit-graph.
> >>>>>>
> >>>>>> In the non-split case, this works as expected. In the split case,
> >>>>>> however, things get a little tricky. Not all layers in a chain of
> >>>>>> incremental commit-graphs are required to themselves have Bloom data,
> >>>>>> and so whether or not some part of the code uses Bloom filters depends
> >>>>>> entirely on whether or not the top-most level of the commit-graph chain
> >>>>>> has Bloom filters.
> >>>>>>
> >>>>>> This has been the behavior since Bloom filters were introduced, and has
> >>>>>> been codified into the tests since a759bfa9ee (t4216: add end to end
> >>>>>> tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130
> >>>>>> requires that Bloom filters are not used in exactly the case described
> >>>>>> earlier.
> >>>>>>
> >>>>>> There is no reason that this needs to be the case, since it is perfectly
> >>>>>> valid for commits in an earlier layer to have Bloom filters when commits
> >>>>>> in a newer layer do not.
> >>>>>>
> >>>>>> Since Bloom settings are guaranteed to be the same for any layer in a
> >>>>>> chain that has Bloom data,
> >>>>>
> >>>>> Is it?  Where is that guaranteed?
>
> Perhaps instead of "guaranteed" we could say "Git never writes
> a commit-graph chain with different settings between layers."
>
> >>>> There is no mechanism whatsoever to customize these settings that is
> >>>> exposed to the user (except for the undocumented 'GIT_TEST' environment
> >>>> variables).
> >>>
> >>> Let me rephrase it, then: where is it written in the commit-graph
> >>> format specification that these must be the same in all layers?
> >>>
> >>> Nowhere.
> >>
> >> OK. We can certainly document that this is the case.
> >
> > IMO we absolutely must document this first; ideally it should have
> > been carefully considered and documented right from the start.
>
> You're right. One of the major difficulties in bringing a Bloom
> filter implementation to the commit-graph feature when we did was
> that the split commit-graph was introduced between our initial
> prototypes and when it was finally prepared for a full submission.
>
> There certainly are gaps in the implementation and documentation.
> I think Taylor is doing a great job by addressing one of those gaps
> in a focused, thoughtful way.
>
> > Some thougths about this:
> >
> >   https://public-inbox.org/git/20200619140230.GB22200@xxxxxxxxxx/
>
> I appreciate your attention to detail. Your comments on the existing
> implementation do point out some of its shortcomings, and that is a
> valuable contribution.
>
> Actually converting those thoughts into patches is a lot of work.
>
> >> For this purpose,
> >> all we really care about is that the graph _has_ Bloom filters anywhere.
> >> If you wanted to return the exact matching settings, you could also
> >> provide a commit and return the settings belonging to the graph that
> >> contains that commit.
> >>
> >> In the case where we don't have a commit, we could use the default
> >> settings instead.
> >>
> >> I think that we are a little bit dealing with a problem that doesn't
> >> exist, since we do not document the sole method by which you would
> >> change these settings. So, maybe we can think more about this, but my
> >> preference would be to leave this patch alone.
> >
> > Other implementations can write split commit-graphs with modified path
> > Bloom filters as well, and at the moment there is nothing in the specs
> > that tells them not to use different Bloom filter settings in
> > different layers.
>
> You are 100% correct that there is a gap in documentation. That should
> be corrected at some point. (I don't consider it a blocker for this
> series.)
>
> But also: Git itself is the true test of a "correct" third-party
> implementation. libgit2 and JGit try to match Git, "warts and all".
> If another implementation wrote data that results in incorrect
> behavior by Git, then that implementation is wrong.
>
> Improving documentation can make those errors less likely.

I agree with this reasoning. Would anybody object to moving forward with
this series without a change in documentation today (but rather down the
road)?

> We also must design with "future Git" in mind, presenting it with
> enough flexibility to improve formats. The custom Bloom filter
> settings do allow that flexibility, but the requirement that all
> layers have identical settings exists for a reason (despite not
> being documented). It is important that any commit walk that intends
> to use the changed-path Bloom filters can compute the bloom keys
> for the test paths only once.
>
> Thanks,
> -Stolee

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux