Elijah Newren <newren@xxxxxxxxx> writes:

> I'm sorry, but I'm not following you. As best I can tell, you seem to
> be suggesting that if we were to use a higher similarity bar for
> checking same-basename files, that such a difference would end up not
> accelerating the diffcore-rename algorithm at all?

No.

If we assume we use the minimum similarity threshold in the new
middle step that considers only the files that were moved across
directories without changing their names, and the last "full matrix"
step sees a src that did *not* pair with a dst of the same name in a
different directory surviving, we know, without comparing them again,
that the pair would not be similar enough (because we are using the
same "minimum similarity" in the middle step and the full-matrix
step).

But if we used a higher similarity in the middle step, the fact that
such a src/dst pair survived the middle step without producing a
match only means that the pair was not similar enough under the
raised bar used in the middle, and the full-matrix step needs to
consider the possibility that they may still be similar enough when
using the "minimum similarity" used for all the other pairs.

And because I was assuming that requiring higher similarity in the
middle step would be a prudent thing to do to avoid false matches
that discard better matches elsewhere, my conclusion was that it
would not be a useful optimization for the final full-matrix step to
check whether a pair was a candidate in the middle step but did not
match well enough (because the fact that the pair did not compare
well enough against the higher bar does not mean it would fail to
pass the lower "minimum" bar).
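
To make the argument above concrete, here is a minimal sketch in Python. It is not Git's diffcore-rename code; the function names and the similarity numbers (0.50, 0.80, 0.60, 0.40) are hypothetical and chosen only for illustration. It shows that a rejection in the middle step lets the full-matrix step skip a pair only when the middle bar is no higher than the minimum bar.

```python
# Hypothetical sketch of the reasoning above; none of these names
# come from Git's source, and the numbers are made up.

def rejected_in_middle(score, middle_bar):
    """True if a same-basename src/dst pair fails the middle step."""
    return score < middle_bar


def full_matrix_accepts(score, minimum_bar):
    """True if the pair is similar enough for the full-matrix step."""
    return score >= minimum_bar


def skip_is_safe(middle_bar, minimum_bar):
    """A middle-step rejection proves the pair would also fail the
    full-matrix step only if the middle bar is no higher than the
    minimum bar."""
    return middle_bar <= minimum_bar


MINIMUM = 0.50   # assumed "minimum similarity" for the full matrix
RAISED = 0.80    # assumed higher bar for the middle step

# Same bar in both steps: anything rejected in the middle step would
# be rejected again, so re-comparing the pair is wasted work.
low = 0.40
assert rejected_in_middle(low, MINIMUM)
assert not full_matrix_accepts(low, MINIMUM)
assert skip_is_safe(MINIMUM, MINIMUM)

# Raised bar in the middle step: a pair scoring 0.60 is rejected
# there but is still a valid match in the full-matrix step, so the
# rejection cannot be used to skip the comparison.
mid = 0.60
assert rejected_in_middle(mid, RAISED)
assert full_matrix_accepts(mid, MINIMUM)
assert not skip_is_safe(RAISED, MINIMUM)
```

The sketch treats the similarity score as already known, which a real implementation would not have for free; the point is only that "failed the raised bar" carries no information about "fails the minimum bar".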