Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> Yes - if the bloom filter contained junk data (in our example, created
> using a different hash function on filenames that have characters that
> exceed 0x7f), the bloom filter would report "no, this commit does not
> contain a change in such-and-such path" and then we would skip the
> commit, even if the commit did have a change in that path.

Just to help my understanding (read: I am not suggesting this as one of
the holes to exploit to help a smooth transition), does the above mean
that, as long as the path we are asking about does not have a byte with
the high-bit set, we would be OK, even if the Bloom filter were
constructed with a bad function and there were other paths that had
such a byte?

> I don't have statistics on this, but if the majority of repos have
> only <=0x7f filenames (which seems reasonable to me), this might save
> sufficient work that we can proceed with bumping the version number and
> ignoring old data.
>
>> Better yet, we should be able to reuse existing Bloom filter data for
>> paths that have all characters <=0xff, and only recompute them where

"ff" -> "7f" I presume?

>> necessary.

That makes much more sense than the previous paragraph.
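To make the question above concrete, here is a toy model in Python, purely illustrative and not the real implementation. FNV-1a stands in for whatever hash the actual filters use, and `signed_chars=True` is a hypothetical emulation of a "bad function" that sign-extends bytes >= 0x80, as C code reading through a signed `char` would. All names (`fnv1a_32`, `build_filter`, `maybe_changed`, `filter_reusable`) are made up for this sketch. For a path whose bytes are all <= 0x7f, both variants compute identical bit positions, so in this model a filter built with the bad function cannot give a false negative for that path, whatever other paths were inserted; those other paths only turn on extra bits, which can add false positives but never remove the bits our path needs. The last helper sketches the reuse idea under the assumption that a commit's old filter may be kept only when every path recorded in it is 7-bit clean.

```python
FNV_OFFSET = 0x811C9DC5
FNV_PRIME = 0x01000193
MASK32 = 0xFFFFFFFF


def fnv1a_32(data: bytes, seed: int = 0, signed_chars: bool = False) -> int:
    """Toy 32-bit FNV-1a hash. With signed_chars=True, each byte >= 0x80
    is treated as a negative C `char` and sign-extended to 32 bits before
    mixing (a hypothetical model of the "bad function")."""
    h = (FNV_OFFSET ^ seed) & MASK32
    for b in data:
        v = b - 256 if (signed_chars and b >= 0x80) else b
        h ^= v & MASK32  # sign extension sets bits 8..31 for high bytes
        h = (h * FNV_PRIME) & MASK32
    return h


def positions(path: bytes, num_bits: int, num_hashes: int, signed_chars: bool) -> set:
    """Bit positions a path maps to, one per seeded hash."""
    return {
        fnv1a_32(path, seed=i, signed_chars=signed_chars) % num_bits
        for i in range(num_hashes)
    }


def build_filter(paths, num_bits=1024, num_hashes=7, signed_chars=False) -> set:
    """A Bloom filter represented as the set of bit positions that are on."""
    bits = set()
    for p in paths:
        bits |= positions(p, num_bits, num_hashes, signed_chars)
    return bits


def maybe_changed(bits, path, num_bits=1024, num_hashes=7, signed_chars=False) -> bool:
    """False means "definitely not changed"; True means "maybe changed"."""
    return positions(path, num_bits, num_hashes, signed_chars) <= bits


def filter_reusable(changed_paths) -> bool:
    """Assumed reuse rule: an old filter is safe to keep only when none of
    the paths recorded in it contains a byte with the high bit set."""
    return all(b < 0x80 for p in changed_paths for b in p)


if __name__ == "__main__":
    changed = [b"Makefile", b"docs/README", b"src/caf\xc3\xa9.c"]
    # Filter written by the buggy (sign-extending) function...
    old = build_filter(changed, signed_chars=True)
    # ...and queried by a fixed reader.
    for p in changed:
        print(p, maybe_changed(old, p, signed_chars=False))
    print("reusable:", filter_reusable(changed))
```

The first two queries print True by construction. The query for the non-ASCII path will almost certainly print False, which is the false negative described above, although with a toy filter that outcome is probabilistic rather than guaranteed. The final line prints `reusable: False`, since one recorded path contains high-bit bytes.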