Re: [PATCH v5 4/4] commit-graph: new filter ver. that fixes murmur3

Taylor Blau <me@xxxxxxxxxxxx> · Wed, 26 Jul 2023 19:32:33 -0400

On Thu, Jul 20, 2023 at 02:27:53PM -0700, Jonathan Tan wrote:
> > I think the early checks would be more expensive, since in the worst
> > case you have to walk the entire tree, only to realize that you actually
> > wanted to compute a first-parent tree diff, meaning you have to
> > essentially repeat the whole walk over again. But for repositories that
> > have few or no commits whose Bloom filters need computing, I think it
> > would be significantly faster, since many of the sub-trees wouldn't need
> > to be visited again.
>
> So for repositories that need little-to-no recomputation of Bloom
> filters, your idea likely means that each tree needs to be read once,
> as compared to recomputing everything in which, I think, each tree needs
> to be read roughly twice (once when computing the Bloom filter for the
> commit that introduces it, and once for the commit that substitutes a
> different tree in place).
>
> I could change the text of the commit message to discuss this (instead
> of the blanket statement that it would be too hard), although I think
> that an implementation of this can be done after this patchset. What do
> you think?

Right, I think that a sizeable portion of repositories will need to
compute relatively few Bloom filters overall. If you feel strongly that
it shouldn't be included in this series, I could live with that since
this is all behind a configuration variable anyway.

I think at minimum we should call it out in the documentation, at least
until such functionality is implemented, since unsuspecting users/forge
operators may bump the filter version forward and be surprised when they
suddenly have to recompute every existing Bloom filter.

Thanks,
Taylor