Thomas Guyot <tguyot@xxxxxxxxx> writes: > My next patch simplifies the test down to: > > same_contents = one->oid_valid && two->oid_valid && > oideq(&one->oid, &two->oid); > > My understanding is that oid_valid will never be true for a modified > (even mode change) or out of tree file so it's a valid assumption. > > I'll also rename that variable to "same_oid" - the current name is > misleading both ways (true doesn't means there will be diffs, false > doesn't mean contents differs). It is not "both ways", I think. The idea is that when this variable is true, we know with certainty that these two are the same, but even when the variable is false, they still can be the same. So true does mean there will not be diff. False indeed is fuzzy. And as long as one side gives a 100% correct answer cheaply, we can use it as an optimization (and 'true' being that side in this case). I have a mild suspicion that the name same_anything conveys a wrong impression, no matter what word you use for <anything>. It does not capture that we are saying the "true" side has no false positive. And that is why I alluded to "may_differ" earlier (with opposite polarity). The flow would become: may_differ = !one->oid_valid || !two->oid_valid || !oideq(...); if (binary) { if (!may_differ) { added = deleted = 0; ... } else { ... count added and deleted ... } } else if (rewrite) { ... } else if (may_differ) { ... use xdl ... } and it would become quite straight-forward to follow. When there is no chance that they may be different, we short-cut and otherwise we compute without cheating. Only when they can be different, we do the expensive xdl thing. Thanks.