On Thu, Jul 20, 2023 at 02:27:53PM -0700, Jonathan Tan wrote: > > I think the early checks would be more expensive, since in the worst > > case you have to walk the entire tree, only to realize that you actually > > wanted to compute a first-parent tree diff, meaning you have to > > essentially repeat the whole walk over again. But for repositories that > > have few or no commits whose Bloom filters need computing, I think it > > would be significantly faster, since many of the sub-trees wouldn't need > > to be visited again. > > So for repositories that need little-to-no recomputation of Bloom > filters, your idea likely means that each tree needs to be read once, > as compared to recomputing everything in which, I think, each tree needs > to be read roughly twice (once when computing the Bloom filter for the > commit that introduces it, and once for the commit that substitutes a > different tree in place). > > I could change the text of the commit message to discuss this (instead > of the blanket statement that it would be too hard), although I think > that an implementation of this can be done after this patchset. What do > you think? Right, I think that a sizeable portion of repositories will need to compute relatively few Bloom filters overall. If you feel strongly that it shouldn't be included in this series, I could live with that since this is all behind a configuration variable anyway. I think at minimum we should call it out in the documentation, at least until such functionality is implemented, since unsuspecting users/forge operators may bump the filter version forward and be surprised when they suddenly have to recompute every existing Bloom filter. Thanks, Taylor