On Sat, Sep 05, 2020 at 02:55:34PM -0400, Taylor Blau wrote:
> On Sat, Sep 05, 2020 at 02:38:54PM -0400, Taylor Blau wrote:
> > I don't know. I think my biggest objection is the size: we use the BIDX
> > chunk today to avoid having to write the length-zero Bloom filters; your
> > scheme would force us to write every filter. On the other hand, we could
> > continue to avoid writing length-zero filters, so long as the
> > commit-graph indicates that it knows this optimization.
>
> Thinking about it a little bit more, I'm pretty sure that this isn't as
> easy as it sounds. Say that we:
>
>   - continued to encode length-zero Bloom filters as equal adjacent
>     entries in the BIDX, but reserve the length-zero filter for commits
>     with no changed-paths, _or_ commits whose Bloom filters have not yet
>     been computed

No, use zero-length filters for commits whose Bloom filters have not yet
been computed, and use a one-byte all zero bits Bloom filter for commits
with no modified paths.

And this is exactly what I proposed earlier.

>   - write "too large" Bloom filters (i.e., commits with >= 512 changed
>     paths in a diff to their first parent) as a non-empty Bloom filter
>     with all bits set high.
>
> I think we're still no better off today than before, because of the
> overloading in the length-zero Bloom filter. Because we would treat
> empty filters the same as ones that haven't been computed, we would
> recompute empty filters, and that would count against our
> '--max-new-filters' budget.
>
> I don't see a non-convoluted way to split the overloaded length-zero
> case into something that is distinguishable without a format extension.

See above, no format extension needed.

> By the way, I think that your idea is good, and that it would be
> workable without the existing structure of the BIDX chunk (which itself
> made sense at the time that it was written).
>
> So, I really want your idea to work. But, I think that ultimately the
> BFXL chunk is a more straightforward path forward.
>
>
> Thanks,
> Taylor
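
To make the encoding discussed above concrete, here is a rough sketch in
Python (not Git's actual C code) of how a reader could tell the cases
apart from the filter bytes alone. The names FilterState, classify,
build_index and split_filters are made up for illustration, and the index
is only modelled on the BIDX idea of cumulative end offsets, where equal
adjacent entries mean a zero-length filter.

```python
from enum import Enum


class FilterState(Enum):
    NOT_COMPUTED = "not computed yet"
    NO_CHANGED_PATHS = "computed, no modified paths"
    TOO_LARGE = "computed, too many changed paths"
    NORMAL = "computed, regular filter"


def classify(data: bytes) -> FilterState:
    """Map raw filter bytes to a state (hypothetical helper)."""
    if len(data) == 0:
        # Zero-length: reserved for filters that were never computed.
        return FilterState.NOT_COMPUTED
    if data == b"\x00":
        # One byte, all bits zero: commit with no modified paths.
        return FilterState.NO_CHANGED_PATHS
    if all(byte == 0xFF for byte in data):
        # Non-empty, all bits set: the "too large" marker.
        return FilterState.TOO_LARGE
    return FilterState.NORMAL


def build_index(filters):
    """Cumulative end offsets; equal neighbours mean a zero-length filter."""
    offsets, end = [], 0
    for data in filters:
        end += len(data)
        offsets.append(end)
    return offsets


def split_filters(offsets, blob):
    """Recover the per-commit filter bytes from the offsets and the blob."""
    out, start = [], 0
    for end in offsets:
        out.append(blob[start:end])
        start = end
    return out
```

With this layout only the zero-length entries would need to be computed
later, so already-computed empty filters would not count against the
'--max-new-filters' budget.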