Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat

On Mon, Sep 21, 2020 at 02:51:21PM -0700, Junio C Hamano wrote:

> > This is the direction I was getting at in my earlier emails, except that
> > I imagined that first conditional could be checking:
> >
> >   if (!one->oid_valid || !two->oid_valid)
> >
> > but I was surprised to see that diff_fill_oid_info() does not set
> > oid_valid. Is that a bug?
> 
> I do not think so.  oid_valid refers to the state during the
> collection phase (those who called diff_addremove() etc.) and
> updating it in diff_fill_oid_info() would lose information.  Maybe
> nobody looks at the bit at this late in the processing chain these
> days, in which case we can start flipping the bit there, but I
> offhand do not know what consequences such a change would trigger.

We use the flag to determine whether we need to compute the oid from
scratch. So I would think the current code causes us to compute the oid
multiple times in many cases. For example, with this patch:

diff --git a/diff.c b/diff.c
index ee8e8189e9..8363abab5b 100644
--- a/diff.c
+++ b/diff.c
@@ -4424,6 +4424,8 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is
 				die_errno("stat '%s'", one->path);
 			if (index_path(istate, &one->oid, one->path, &st, 0))
 				die("cannot hash %s", one->path);
+			warning("computed oid of %s as %s",
+				one->path, oid_to_hex(&one->oid));
 		}
 	}
 	else

I get (because diff.c is dirty in my working tree due to the patch):

  $ ./git diff --stat -p
  warning: computed oid of diff.c as 8363abab5b51479ac8cc9fb1c96b39fb90041f88
   diff.c | 2 ++
   1 file changed, 2 insertions(+)
  
  warning: computed oid of diff.c as 8363abab5b51479ac8cc9fb1c96b39fb90041f88
  diff --git a/diff.c b/diff.c
  index ee8e8189e9..8363abab5b 100644
  --- a/diff.c
  +++ b/diff.c
  @@ -4424,6 +4424,8 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is
   				die_errno("stat '%s'", one->path);
   			if (index_path(istate, &one->oid, one->path, &st, 0))
   				die("cannot hash %s", one->path);
  +			warning("computed oid of %s as %s",
  +				one->path, oid_to_hex(&one->oid));
   		}
   	}
   	else

even though we already know the oid in the second call, so it's wasted
work. I agree that other code could be depending on oid_valid in a weird
way, but IMHO that code is probably wrong to do so. But it may not be
worth digging into, if nobody has complained about the waste.
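
To show the waste in isolation, here is a toy sketch. It is not git code:
toy_filespec, toy_hash() and both toy_fill_oid_*() helpers are names I made
up, and the hash is a trivial stand-in for index_path(). The only point is
that the "valid" bit is what lets a second call skip the expensive step.

```c
#include <assert.h>

/* Toy stand-in for a filespec; NOT git's struct diff_filespec. */
struct toy_filespec {
	const char *content;	/* pretend working-tree contents */
	unsigned long oid;	/* toy "object id" */
	int oid_valid;		/* nonzero once oid is known */
};

/* Counts how often the expensive hashing step actually runs. */
static int toy_hash_calls;

/* Trivial djb2-style hash standing in for hashing a file. */
static unsigned long toy_hash(const char *s)
{
	unsigned long h = 5381;

	toy_hash_calls++;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Models the behavior described above: the flag is never set. */
static void toy_fill_oid_nocache(struct toy_filespec *one)
{
	assert(one && one->content);
	if (!one->oid_valid)
		one->oid = toy_hash(one->content);
	/* oid_valid stays 0, so the next call hashes again */
}

/* Hypothetical variant: remember that the oid is now known. */
static void toy_fill_oid_cached(struct toy_filespec *one)
{
	assert(one && one->content);
	if (!one->oid_valid) {
		one->oid = toy_hash(one->content);
		one->oid_valid = 1;
	}
}
```

Calling toy_fill_oid_nocache() twice on the same struct bumps
toy_hash_calls to 2, while two calls to toy_fill_oid_cached() leave it at 1.
Whether flipping the real bit is safe is exactly the open question above.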

> > I also imagined that we'd have to determine right then whether the
> > contents are actually different or not with a memcmp(), to avoid
> > emitting a "0 changes" line, but we do handle that case within the
> > "!same_contents" conditional. See the comment starting with "Omit
> > diffstats..." added recently by 1cf3d5db9b (diff: teach --stat to ignore
> > uninteresting modifications, 2020-08-20).
> 
> Yes, we are essentially on the same page---same_contents bit is
> merely an optimization to decide cheaply when we do not have to do
> xdl, but the codepath that does the xdl must be prepared to deal
> with the "we thought they are different, but after all they turn out
> to be equivalent" case.  Therefore false positive to declare two
> different things as same cannot be tolerated, but false negative to
> declare two things that are the same as !same_contents is fine.
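
That asymmetry can be sketched with a toy model. Again, none of this is git
code: toy_blob and the three helpers are invented for illustration, file
modes are ignored, and a plain memcmp() stands in for the xdl run. It also
assumes that two equal nonzero ids imply equal contents, as a real hash
would.

```c
#include <assert.h>
#include <string.h>

/* Toy blob; NOT git's struct diff_filespec. */
struct toy_blob {
	const char *data;
	size_t len;
	unsigned long id;	/* 0 means "id not known" */
};

/*
 * Cheap hint. It may report "different" (0) for equal contents when an
 * id is unknown, but it never reports "same" (1) unless both ids are
 * known and equal.
 */
static int toy_same_contents(const struct toy_blob *a,
			     const struct toy_blob *b)
{
	return a->id && b->id && a->id == b->id;
}

/* Expensive path: a full byte comparison standing in for xdl. */
static int toy_contents_differ(const struct toy_blob *a,
			       const struct toy_blob *b)
{
	return a->len != b->len || memcmp(a->data, b->data, a->len) != 0;
}

/* Returns 1 if a stat line should be shown, 0 if the pair is omitted. */
static int toy_wants_stat_line(const struct toy_blob *a,
			       const struct toy_blob *b)
{
	if (toy_same_contents(a, b))
		return 0;	/* cheap exit, expensive path skipped */
	/* The hint may be a false negative, so check for real. */
	return toy_contents_differ(a, b);
}
```

With one id unknown and identical contents, toy_same_contents() returns 0
but toy_wants_stat_line() still returns 0, because the expensive path
catches the "equivalent after all" case.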

I thought it might matter on "maint", where we do not have 1cf3d5db9b.
I.e., I expected:

  echo foo >a
  echo foo >b
  git diff --no-index --stat a b

might switch from no output to having a line like:

  a => b | 0

But we don't even get to builtin_diffstat() there. We throw out the pair
in diffcore_skip_stat_unmatch(). If you do get past that with something
like a mode change:

  chmod +x b
  git diff --no-index --stat a b

then that does generate the "0" stat line. But it does so both before
and after the proposed change. The same thing happens outside of
no-index mode:

  git init
  echo foo >file
  git add .
  git commit -am no-bit
  chmod +x file
  git commit -am exec-bit
  git show --stat

will give you:

   file | 0

I'm not sure if that's the desired behavior or not, but at any rate
fixing this builtin_diffstat() conditional won't change it either way. :)

-Peff


