Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat

Thomas Guyot <tguyot@xxxxxxxxx> · Thu, 24 Sep 2020 03:13:53 -0400

Hi Junio,

On 2020-09-24 02:40, Junio C Hamano wrote:
> Thomas Guyot <tguyot@xxxxxxxxx> writes:
> 
> It is not "both ways", I think.  The idea is that when this variable
> is true, we know with certainty that these two are the same, but
> even when the variable is false, they still can be the same.  So
> true does mean there will not be diff.  False indeed is fuzzy.

I meant to say the old behavior "lied" in both directions.

> And as long as one side gives a 100% correct answer cheaply, we can
> use it as an optimization (and 'true' being that side in this case).
> 
> I have a mild suspicion that the name same_anything conveys a wrong
> impression, no matter what word you use for <anything>.  It does not
> capture that we are saying the "true" side has no false positive.
> 
> And that is why I alluded to "may_differ" earlier (with opposite
> polarity).  The flow would become:
> 
>     may_differ = !one->oid_valid || !two->oid_valid || !oideq(...);
> 
>     if (binary) {
>         if (!may_differ) {
>             added = deleted = 0;
>             ...
>         } else {
>             ... count added and deleted ...
>         }
>     } else if (rewrite) {
> 	...
>     } else if (may_differ) {
> 	... use xdl ...
>     }
> 
> and it would become quite straight-forward to follow.  When there is
> no chance that they may be different, we short-cut and otherwise we
> compute without cheating.  Only when they can be different, we do
> the expensive xdl thing.

I toyed a bit on the binary side... I never sent my 2nd reply as I still
needed to dig up; testing with diff_filespec_is_binary() { return 1; } I
would get (as expected) the same false-positive modified binary files I
used to get in range-diff test.

What I didn't get is applying the same logic
(free_diffstat_file(file);diffstat->nr--;) didn't have any effect. I'll
have to find what differs here to make binary files how up regardless.

Regards,

Thomas