Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 23 Sep 2020 23:40:58 -0700

Thomas Guyot <tguyot@xxxxxxxxx> writes:

> My next patch simplifies the test down to:
>
>     same_contents = one->oid_valid && two->oid_valid &&
>         oideq(&one->oid, &two->oid);
>
> My understanding is that oid_valid will never be true for a modified
> (even mode change) or out of tree file so it's a valid assumption.
>
> I'll also rename that variable to "same_oid" - the current name is
> misleading both ways (true doesn't means there will be diffs, false
> doesn't mean contents differs).

It is not "both ways", I think.  The idea is that when this variable
is true, we know with certainty that these two are the same, but
even when the variable is false, they still can be the same.  So
true does mean there will not be diff.  False indeed is fuzzy.

And as long as one side gives a 100% correct answer cheaply, we can
use it as an optimization (and 'true' being that side in this case).

I have a mild suspicion that the name same_anything conveys a wrong
impression, no matter what word you use for <anything>.  It does not
capture that we are saying the "true" side has no false positive.

And that is why I alluded to "may_differ" earlier (with opposite
polarity).  The flow would become:

    may_differ = !one->oid_valid || !two->oid_valid || !oideq(...);

    if (binary) {
        if (!may_differ) {
            added = deleted = 0;
            ...
        } else {
            ... count added and deleted ...
        }
    } else if (rewrite) {
	...
    } else if (may_differ) {
	... use xdl ...
    }

and it would become quite straight-forward to follow.  When there is
no chance that they may be different, we short-cut and otherwise we
compute without cheating.  Only when they can be different, we do
the expensive xdl thing.

Thanks.