On Tue, Jun 13, 2023 at 02:58:24PM -0700, Junio C Hamano wrote:
> "bloom" -> "Bloom", probably, as the name comes from the name of its
> inventor (just like we spell "Boolean", not "boolean").

Indeed.

> > + when char is signed and the repository has path names that have characters >=
> > + 0x80; Git supports reading and writing them, but this ability will be removed
> > + in a future version of Git.
>
> Makes sense.
>
> I wonder if we want to mention what the undesired misbehaviour the
> "bug" causes and what we do to avoid getting affected by the bug
> here. If we can say something like "When querying for a pathname
> with a byte with high-bit set, the buggy filter may produce false
> negative, making the filter unusable, but asking for a pathname
> without such a byte produces no false negatives (even though we may
> get false positives). When Git reads version 1 filter data, it
> refrains from using it for processing paths with high-bit set to
> avoid triggering the bug", then it would be ideal.

Your description of the bug matches my understanding of the issue: a
corrupt filter would produce false negatives and thus be unusable.

I skimmed through the rest of the series, and couldn't find a spot where
we do the latter, i.e. still use v1 filters as long as we don't have any
characters in the path with high-order bits set.

I think this would be as simple as modifying the Bloom filter query
function to return "maybe" before even trying to hash a path that
contains at least one byte with its high bit set.

Apologies if this functionality is implemented and I just missed it.

Thanks,
Taylor
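For illustration only, here is a rough sketch of the query-side short-circuit described above. It is written against a hypothetical toy filter, not Git's bloom.c: `struct toy_bloom_filter`, `toy_bloom_add()`, `toy_bloom_query()`, the filter size, the number of probes and the seeded FNV-1a hash are all made-up stand-ins. The only point it is meant to show is that a version 1 filter answers "maybe" for any path containing a byte >= 0x80, without hashing it, so a possibly-buggy hash can never turn into a false negative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified Bloom filter for illustration only; the
 * names, sizes and hash below are assumptions, not Git's bloom.c.
 */
#define TOY_FILTER_BITS 1024u
#define TOY_NUM_HASHES 7u

enum toy_bloom_answer { TOY_DEFINITELY_NOT = 0, TOY_MAYBE = 1 };

struct toy_bloom_filter {
	int version;                      /* 1: written with the buggy hash */
	uint8_t bits[TOY_FILTER_BITS / 8];
};

/* Does any byte of the path have its high bit set (i.e. >= 0x80)? */
static int toy_has_high_bit(const char *path)
{
	const unsigned char *p;

	for (p = (const unsigned char *)path; *p; p++)
		if (*p & 0x80)
			return 1;
	return 0;
}

/* Seeded FNV-1a; a stand-in for the real hash function (assumption). */
static uint32_t toy_hash(const char *path, uint32_t seed)
{
	const unsigned char *p;
	uint32_t h = 2166136261u ^ seed;

	for (p = (const unsigned char *)path; *p; p++) {
		h ^= *p;
		h *= 16777619u;
	}
	return h;
}

/* Bit position of the i-th probe, derived by double hashing. */
static uint32_t toy_bit_pos(const char *path, uint32_t i)
{
	uint32_t h1 = toy_hash(path, 1u);
	uint32_t h2 = toy_hash(path, 2u);

	return (h1 + i * h2) % TOY_FILTER_BITS;
}

void toy_bloom_add(struct toy_bloom_filter *filter, const char *path)
{
	uint32_t i;

	for (i = 0; i < TOY_NUM_HASHES; i++) {
		uint32_t pos = toy_bit_pos(path, i);
		filter->bits[pos / 8] |= (uint8_t)(1u << (pos % 8));
	}
}

enum toy_bloom_answer toy_bloom_query(const struct toy_bloom_filter *filter,
				      const char *path)
{
	uint32_t i;

	assert(filter && path);

	/*
	 * Proposed short-circuit: a v1 filter may give false negatives
	 * for paths with high-bit bytes, so say "maybe" without hashing.
	 */
	if (filter->version == 1 && toy_has_high_bit(path))
		return TOY_MAYBE;

	for (i = 0; i < TOY_NUM_HASHES; i++) {
		uint32_t pos = toy_bit_pos(path, i);
		if (!(filter->bits[pos / 8] & (1u << (pos % 8))))
			return TOY_DEFINITELY_NOT;
	}
	return TOY_MAYBE;
}
```

With an empty version 1 filter, a query for "README" still goes through the normal bit checks (and comes back "definitely not"), while a query for "caf\xc3\xa9" returns "maybe" immediately. The filter is still useful for ASCII-only paths and merely falls back to a full diff for the rest.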