Derrick Stolee <stolee@xxxxxxxxx> writes:

>> But if users may use icase pathspec very often, it may be worth
>> considering to build the bloom filter after downcasing the paths,
>> perhaps?  Given that many projects extract their source code to a
>> case insensitive filesystem, I would imagine that downcasing paths
>> would map two originally different paths into the same thing only
>> rarely, if ever, so there may not be much downside to do so.
>
> This behavior could be extended later, and carefully. My initial
> thought was that the case check would happen on every commit. If
> the :(icase) check only happens at the walk tip(s), then we could
> compute a single Bloom key at the start.

Sorry, I am not sure what you mean.

Do you mean that we notice that the user wants to match 'foo' case
insensitively, and tell the logic that uses changed-path records in
the graph file that commits that cannot possibly have touched any of
the paths 'foo', 'foO', 'fOo', ... (all 8 case permutations) are not
interesting?

I guess that would work, but I was wondering if it is simpler without
much downside if the changed-path records in the graph file are
prepared on paths after they are normalized to a single case.  That
would lose information (e.g. you can no longer say "commits that touch
the path 'foo' are interesting, but those that touch the path 'Foo'
are not"), but makes the side that queries much simpler (i.e. you do
not have to prepare all 8 case permutations; you only ask about
'foo').

And because the Bloom filter is used only for performance, to cull
commits that can never possibly match, allowing false positives that
would be discarded by actually running tree-diff anyway, the only
potential downside arises when the project has too many paths that
differ only in case: the increased collisions reduce our chances to
skip running tree-diff (and never affect correctness).

But this is not the "could be extended later" kind of behaviour, I am
afraid.
It is baked in the data stored in the graph file.

It all depends on how often people want :(icase) pathspec matches in
the history, I suspect.  My point was that we need to declare that
:(icase) won't matter in real life (hence we won't optimize our data
to support that use case), before the way in which the data stored in
the graph file is computed is cast in stone.

Thanks.
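
To make the two query strategies discussed above concrete, here is a
rough sketch in Python.  It is only an illustration under loud
assumptions: ToyBloom, case_permutations(), maybe_touched_exact() and
maybe_touched_folded() are hypothetical names, the hashing is plain
SHA-256 from hashlib, and none of this reflects the actual
changed-path format or hash function used in the graph file.

```python
import hashlib
from itertools import product


class ToyBloom:
    """Toy Bloom filter (hypothetical; not Git's on-disk format)."""

    def __init__(self, nbits=256, nhashes=7):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = 0  # bit array stored in one Python int

    def _positions(self, key):
        # Derive nhashes bit positions from seeded SHA-256 digests.
        for seed in range(self.nhashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.nbits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def maybe_contains(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def case_permutations(path):
    """All spellings of path that differ only in letter case."""
    choices = [sorted({c.lower(), c.upper()}) for c in path]
    return ["".join(p) for p in product(*choices)]


def maybe_touched_exact(bloom, path):
    """Filter keyed on original-case paths: probe every case variant."""
    return any(bloom.maybe_contains(v) for v in case_permutations(path))


def maybe_touched_folded(bloom, path):
    """Filter keyed on lowercased paths: probe a single key."""
    return bloom.maybe_contains(path.lower())


# One commit that changed 'Foo' and 'Makefile', recorded both ways.
exact, folded = ToyBloom(), ToyBloom()
for changed in ["Foo", "Makefile"]:
    exact.add(changed)           # case-preserving records
    folded.add(changed.lower())  # case-normalized records
```

With the case-preserving filter, an :(icase) query for 'foo' has to
probe all 8 spellings; with the normalized filter it probes one key,
at the price of no longer being able to tell 'foo' from 'Foo' at the
filter level.  Either way a positive answer still has to be confirmed
by the real tree-diff.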