On Wed, Apr 15, 2020 at 02:25:48PM -0700, Junio C Hamano wrote: > Derrick Stolee <stolee@xxxxxxxxx> writes: > > > THIS IS A BREAKING CHANGE. Commit-graph files with changed-path > > Bloom filters computed by a previous commit will not be compatible > > with the filters computed in this commit, nor will we get correct > > results when testing across these incompatible versions. Normally, > > this would be a completely unacceptable change, but the filters > > have not been released and hence are still possible to update > > before release. > > Sure, it hasn't even hit 'next' yet. > > But I think we are both sort-of leaning towards concluding that it > does not help all that much. So I think it is OK. Yeah, I think that the only thing that it is potentially helping is the :(icase) magic. In a repository that doesn't have colliding paths in case-insensitive filesystems (i.e., having both 'foo' and 'FOO' in the same tree), this doesn't seem to be obviously hurting anything. But, it is at least semi-harmful to repositories that do have such trees, since we now can't answer "probably yes" or "definitely not" for colliding paths unless both produce the same fingerprint. That seems like a clear downside. Now, how common is that against people who would benefit from changed-path Bloom filters in their commit-graphs? I don't know one way or the other. But, the upside seems to be minimal compared to the potential cost, so I think that it may be better to just leave this one alone. > > TODO: If we decide to move in this direction, then the following > > steps should be done (and some of them should be done anyway): > > Even if we decide not to do this "downcase before hashing" thing, we > should document how exactly we compute, I think. > > And if we decide do change our mind later, it is not the end of the > world. We should be able to use a different chunk type to store the > filters computed over downcased paths. > > > * We need to document the Bloom filter format to specify exactly > > how we compute the filter data. The details should be careful > > enough that someone can reproduce the exact file format without > > looking at the C code. > > > > * That document would include the tolower() transformation that is > > being done here. > > As the tree-diff comparison done internally while running "git > blame" does not take an end-user specified pathspec in any > meaningful way, this does not matter in practice, but there is > another possible complication we would want to consider when we > extend the support to "git log" and friends---negative pathspecs > (e.g. "git log ':(exclude)COPYING'"). A commit that cannot possibly > have touched the COPYING file would be eligible for output without > actually running tree-diff between it and its parent (as long as the > trees of the two commits are different, we know it must be shown). > > Thanks. Thanks, Taylor