On Wed, Aug 12, 2020 at 07:48:55AM -0400, Derrick Stolee wrote: > On 8/11/2020 7:55 PM, SZEDER Gábor wrote: > > On Tue, Aug 11, 2020 at 05:34:11PM -0400, Taylor Blau wrote: > >> On Tue, Aug 11, 2020 at 11:27:16PM +0200, SZEDER Gábor wrote: > >>> On Tue, Aug 11, 2020 at 05:21:18PM -0400, Taylor Blau wrote: > >>>> On Tue, Aug 11, 2020 at 11:18:30PM +0200, SZEDER Gábor wrote: > >>>>> On Tue, Aug 11, 2020 at 04:51:19PM -0400, Taylor Blau wrote: > >>>>>> Many places in the code often need a pointer to the commit-graph's > >>>>>> 'struct bloom_filter_settings', in which case they often take the value > >>>>>> from the top-most commit-graph. > >>>>>> > >>>>>> In the non-split case, this works as expected. In the split case, > >>>>>> however, things get a little tricky. Not all layers in a chain of > >>>>>> incremental commit-graphs are required to themselves have Bloom data, > >>>>>> and so whether or not some part of the code uses Bloom filters depends > >>>>>> entirely on whether or not the top-most level of the commit-graph chain > >>>>>> has Bloom filters. > >>>>>> > >>>>>> This has been the behavior since Bloom filters were introduced, and has > >>>>>> been codified into the tests since a759bfa9ee (t4216: add end to end > >>>>>> tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130 > >>>>>> requires that Bloom filters are not used in exactly the case described > >>>>>> earlier. > >>>>>> > >>>>>> There is no reason that this needs to be the case, since it is perfectly > >>>>>> valid for commits in an earlier layer to have Bloom filters when commits > >>>>>> in a newer layer do not. > >>>>>> > >>>>>> Since Bloom settings are guaranteed to be the same for any layer in a > >>>>>> chain that has Bloom data, > >>>>> > >>>>> Is it? Where is that guaranteed? > > Perhaps instead of "guaranteed" we could say "Git never writes > a commit-graph chain with different settings between layers." > > >>>> There is no mechanism whatsoever to customize these settings that is > >>>> exposed to the user (except for the undocumented 'GIT_TEST' environment > >>>> variables). > >>> > >>> Let me rephrase it, then: where is it written in the commit-graph > >>> format specification that these must be the same in all layers? > >>> > >>> Nowhere. > >> > >> OK. We can certainly document that this is the case. > > > > IMO we absolutely must document this first; ideally it should have > > been carefully considered and documented right from the start. > > You're right. One of the major difficulties in bringing a Bloom > filter implementation to the commit-graph feature when we did was > that the split commit-graph was introduced between our initial > prototypes and when it was finally prepared for a full submission. > > There certainly are gaps in the implementation and documentation. > I think Taylor is doing a great job by addressing one of those gaps > in a focused, thoughtful way. > > > Some thougths about this: > > > > https://public-inbox.org/git/20200619140230.GB22200@xxxxxxxxxx/ > > I appreciate your attention to detail. Your comments on the existing > implementation do point out some of its shortcomings, and that is a > valuable contribution. > > Actually converting those thoughts into patches is a lot of work. > > >> For this purpose, > >> all we really care about is that the graph _has_ Bloom filters anywhere. > >> If you wanted to return the exact matching settings, you could also > >> provide a commit and return the settings belonging to the graph that > >> contains that commit. > >> > >> In the case where we don't have a commit, we could use the default > >> settings instead. > >> > >> I think that we are a little bit dealing with a problem that doesn't > >> exist, since we do not document the sole method by which you would > >> change these settings. So, maybe we can think more about this, but my > >> preference would be to leave this patch alone. > > > > Other implementations can write split commit-graphs with modified path > > Bloom filters as well, and at the moment there is nothing in the specs > > that tells them not to use different Bloom filter settings in > > different layers. > > You are 100% correct that there is a gap in documentation. That should > be corrected at some point. (I don't consider it a blocker for this > series.) > > But also: Git itself is the true test of a "correct" third-party > implementation. libgit2 and JGit try to match Git, "warts and all". > If another implementation wrote data that results in incorrect > behavior by Git, then that implementation is wrong. > > Improving documentation can make those errors less likely. I agree with this reasoning. Would anybody object to moving forward with this series without a change in documentation today (but rather down the road)? > We also must design with "future Git" in mind, presenting it with > enough flexibility to improve formats. The custom Bloom filter > settings do allow that flexibility, but the requirement that all > layers have identical settings exists for a reason (despite not > being documented). It is important that any commit walk that intends > to use the changed-path Bloom filters can compute the bloom keys > for the test paths only once. > > Thanks, > -Stolee Thanks, Taylor