Re: [PATCH] diff-options: add --stat

Junio C Hamano <junkio@xxxxxxx> · Thu, 13 Apr 2006 11:26:00 -0700

Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> Now you can say "git diff --stat" (to get an idea how many changes are
> uncommitted), or "git log --stat" (to get an idea how many changes were
> committed).

I like what this tries to do (as I already said), but there are
issues with the way it does it; here are some comments.

> +static int fn_diffstat(void *priv, mmbuffer_t *mb, int nbuf)
> +{
> +...
> +	for (i = 0; i < nbuf; i++)
> +		if (mb[i].ptr[0] == '+')
> +			x->added++;
> +		else if (mb[i].ptr[0] == '-')
> +			x->deleted++;
> +	return 0;
> +}

This is broken if you have a hunk that adds, deletes or
leaves a line that happens to begin with a plus or a minus.

A demonstration.

        : siamese; ./git-diff-files -p
        diff --git a/Makefile b/Makefile
        index 1130af4..389143e 100644
        --- a/Makefile
        +++ b/Makefile
        @@ -1,3 +1,4 @@
        +-this new line begins with a minus
         # The default target of this Makefile is...
         all:

        : siamese; ./git-diff-files --stat
        ---
         Makefile |    2 +-
         1 files changed, 1 insertions(+), 1 deletions(-)

For an added line, xdl_emit_diffrec(rec, size, " ", 1, ecb) is
called, which gives mb[0].ptr = " " and mb[1].ptr = <the
contents of the line>; fn_diffstat() is called with (nbuf == 2).
You are counting the addition mark '+', but you are counting the
minus at the beginning of '-this new line...'.

A fragile workaround would be to make fn_diffstat() aware of
what the caller does and look only at mb[0], with a note to the
poor soul who needs to debug future breakage if xdiff side ever
changes.

A less efficient but more futureproof alternative is to use
xdiff-interface.c::xdiff_outf() to make sure your callback is
fed one line at a time (an example of how this can be done is
found in combine-diff.c::combine_diff()).

For the original "emit everything to stdout" code Linus did,
this was not a problem, because that usage does not care about
the record boundary.  For this "counting '+' and '-' at the
beginning of each hunk data", you do.

> @@ -221,43 +367,49 @@ static void builtin_diff(const char *nam
>  	b_two = quote_two("b/", name_b);
>  	lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null";
>  	lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null";
>...
> +	if (mmfile_is_binary(&mf1) || mmfile_is_binary(&mf2)) {
> +		if (diffstat)
> +			diffstat_binary(diffstat, lbl[0]);
> +		else
> +			printf("Binary files %s and %s differ\n",
> +					lbl[0], lbl[1]);
> +	} else {
>...
> @@ -690,16 +843,19 @@ static void run_diff_cmd(const char *pgm
>  			 struct diff_filespec *one,
>  			 struct diff_filespec *two,
>  			 const char *xfrm_msg,
> -			 int complete_rewrite)
> +			 int complete_rewrite,
> +			 struct diffstat_t *diffstat)
>  {
>  	if (pgm) {
> +		if (diffstat)
> +			die ("Cannot use diffstat with external diff");
>  		run_external_diff(pgm, name, other, one, two, xfrm_msg,
>  				  complete_rewrite);
>  		return;
>  	}

Instead of driving diffstat code from run_diff(),
run_diff_cmd(), and builtin_diff(), I think it would be much
cleaner to define diff_flush_stat() as a sibling to
diff_flush_raw() and diff_flush_patch(), and bypass the
run_diff() chain.  There will be some stuff you can reuse from
builtin_diff() (mostly how it interfaces with xdiff library), so
split that out as a separate function, and call it from both
builtin_diff() and diff_flush_stat().  Then you do not have to
say "Cannot use diffstat with external diff"; even when you
usually use your wdiff based external diff, you would want an
option to get diffstat, no?  This is even more true if we are to
have three bools to toggle which format to enable like I said in
my previous message with --with-raw --with-stat --with-patch
options.

-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html