Re: [PATCH v4] diff: handle --no-abbrev in no-index case

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 08 Dec 2016 14:53:14 -0800

Jack Bates <bk874k@xxxxxxxxxxxxxxxx> writes:

> There are two different places where the --no-abbrev option is parsed,
> and two different places where SHA-1s are abbreviated. We normally parse
> --no-abbrev with setup_revisions(), but in the no-index case, "git diff"
> calls diff_opt_parse() directly, and diff_opt_parse() didn't handle
> --no-abbrev until now. (It did handle --abbrev, however.) We normally
> abbreviate SHA-1s with find_unique_abbrev(), but commit 4f03666 ("diff:
> handle sha1 abbreviations outside of repository, 2016-10-20) recently
> introduced a special case when you run "git diff" outside of a
> repository.
>
> setup_revisions() does also call diff_opt_parse(), but not for --abbrev
> or --no-abbrev, which it handles itself. setup_revisions() sets
> rev_info->abbrev, and later copies that to diff_options->abbrev. It
> handles --no-abbrev by setting abbrev to zero. (This change doesn't
> touch that.)
>
> Setting abbrev to zero was broken in the outside-of-a-repository special
> case, which until now resulted in a truly zero-length SHA-1, rather than
> taking zero to mean do not abbreviate. The only way to trigger this bug,
> however, was by running "git diff --raw" without either the --abbrev or
> --no-abbrev options, because 1) without --raw it doesn't respect abbrev
> (which is bizarre, but has been that way forever), 2) we silently clamp
> --abbrev=0 to MINIMUM_ABBREV, and 3) --no-abbrev wasn't handled until
> now.
>
> The outside-of-a-repository case is one of three no-index cases. The
> other two are when one of the files you're comparing is outside of the
> repository you're in, and the --no-index option.

Nicely described.  

> diff --git a/diff.c b/diff.c
> index ec87283..84dba60 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -3106,7 +3106,8 @@ static const char *diff_abbrev_oid(const struct object_id *oid, int abbrev)
>  			abbrev = FALLBACK_DEFAULT_ABBREV;
>  		if (abbrev > GIT_SHA1_HEXSZ)
>  			die("BUG: oid abbreviation out of range: %d", abbrev);
> -		hex[abbrev] = '\0';
> +		if (abbrev)
> +			hex[abbrev] = '\0';
>  		return hex;
>  	}
>  }

This is the same since your earlier round and it is correct.  The
code before this part clamps abbrev to be between 0 and 40.

> @@ -3364,6 +3365,7 @@ void diff_setup(struct diff_options *options)
>  
>  	options->file = stdout;
>  
> +	options->abbrev = DEFAULT_ABBREV;

This is a new change relative to your earlier one.  

I looked at all the callers of diff_setup() and noticed that many of
them were initializing "struct diff_options" that is on-stack that
is totally uninitialized, which means they were using a completely
random value that happened to be on the stack.  

Which was surprising and made me wonder how the entire "diff" code
could have ever worked correctly for the past 10 years, as it's not
like all the users always passed --[no-]abbrev[=<value>] from the
command line.

In any case, this cannot possibly be introducing a regression; these
callsites of diff_setup() were starting from a random garbage---now
they start with -1 in this field.  If they were doing the right
thing by assigning their own abbrev to the field after diff_setup()
returned, they will continue to do the same, and otherwise they will
keep doing whatever random things they have been doing when the
uninitialized field happened to contain -1 the same way.

I didn't look carefully at the additional tests, but the code change
looks good.

Thanks.