On Sat, Sep 05, 2020 at 02:38:54PM -0400, Taylor Blau wrote:
> I don't know. I think my biggest objection is the size: we use the BIDX
> chunk today to avoid having to write the length-zero Bloom filters; your
> scheme would force us to write every filter. On the other hand, we could
> continue to avoid writing length-zero filters, so long as the
> commit-graph indicates that it knows this optimization.

Thinking about it a little bit more, I'm pretty sure that this isn't as
easy as it sounds. Say that we:

  - continued to encode length-zero Bloom filters as equal adjacent
    entries in the BIDX, but reserve the length-zero filter for commits
    with no changed-paths, _or_ commits whose Bloom filters have not yet
    been computed

  - write "too large" Bloom filters (i.e., commits with >= 512 changed
    paths in a diff to their first parent) as a non-empty Bloom filter
    with all bits set high.

I think we're still no better off today than before, because of the
overloading in the length-zero Bloom filter. Because we would treat
empty filters the same as ones that haven't been computed, we would
recompute empty filters, and that would count against our
'--max-new-filters' budget.

I don't see a non-convoluted way to split the overloaded length-zero
case into something that is distinguishable without a format extension.

By the way, I think that your idea is good, and that it would be
workable without the existing structure of the BIDX chunk (which itself
made sense at the time that it was written). So, I really want your idea
to work. But, I think that ultimately the BFXL chunk is a more
straightforward path forward.

Thanks,
Taylor
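
To make the overloading concrete, here is a minimal Python sketch of the
two-bullet scheme described above. It is a toy model, not Git's code: the
function names, the in-memory `bidx`/`bdat` layout, and the string labels are
all assumptions made for illustration. It only assumes that BIDX-style entries
are cumulative end offsets, so that equal adjacent entries mean a length-zero
filter.

```python
# Toy model only: names and layout are hypothetical, not Git's implementation.

def filter_lengths(bidx):
    """Turn cumulative end offsets (BIDX-style) into per-commit filter lengths."""
    lengths = []
    prev = 0
    for end in bidx:
        lengths.append(end - prev)
        prev = end
    return lengths


def classify(filter_bytes):
    """Classify one filter under the scheme sketched in the two bullets."""
    if len(filter_bytes) == 0:
        # Overloaded case: "no changed paths" OR "not yet computed".
        return "empty-or-uncomputed"
    if all(b == 0xFF for b in filter_bytes):
        # Non-empty filter with every bit set marks a "too large" commit.
        return "too-large"
    return "computed"


def classify_all(bidx, bdat):
    """Classify every commit's filter by slicing bdat at the bidx offsets."""
    kinds = []
    start = 0
    for end in bidx:
        kinds.append(classify(bdat[start:end]))
        start = end
    return kinds


def filters_to_recompute(bidx, bdat, max_new_filters):
    """Positions a writer would recompute, capped by the new-filter budget."""
    todo = [pos for pos, kind in enumerate(classify_all(bidx, bdat))
            if kind == "empty-or-uncomputed"]
    return todo[:max_new_filters]


# Commit 0: already computed, genuinely no changed paths (length zero).
# Commit 1: ordinary filter. Commit 2: too large. Commit 3: never computed.
bdat = bytes([0x12, 0x40]) + bytes([0xFF, 0xFF])
bidx = [0, 2, 4, 4]

print(filter_lengths(bidx))
print(classify_all(bidx, bdat))
print(filters_to_recompute(bidx, bdat, max_new_filters=1))
```

In this model commits 0 and 3 get the same label, so with a budget of one the
writer spends it re-deriving commit 0's already-known empty filter and never
reaches commit 3. Telling the two apart needs extra information that the
offsets alone do not carry.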