On Tue, Oct 09 2018, Derrick Stolee wrote: > The filter needs to store every path that would be considered "not > TREESAME". It can't store wildcards, so you would need to evaluate the > wildcard and test all of those paths individually (not a good idea). If full paths are stored, yes, But have you considered instead of storing paths, storing all trigrams that can be extracted from changed paths at that commit? I.e. instead of a change to "t/t0000-basic.sh" storing "t/t0000-basic.sh" we'd store ["t/t", "/t0", "t00", "000", "00-" ...] etc. That sort of approach would mean that e.g. "t*000*", "*asi*.sh" etc. could all be indexed, and as long as we could find three consecutive bytes of fixed string we'd have a chance to short-circuit, but would need to degrade to a full tree unpack for e.g. "t*". We could also special-case certain sub-three-char indexes, or to "bi-grams". E.g. to be able to index '*.c' or 't*' (first char at the beginning of a string only). It would mean having to check more things in the bloom filter for each commit, but that's going to be hot in cache at that point so it'll probably beat unpacking trees by far, and we could short-circuit exit at the first one that returned false.