Re: [PATCH v2 3/6] diff: diff_aligned_abbrev: remove ellipsis after abbreviated SHA-1 value

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 20 Nov 2017 10:48:15 +0900

Ann T Ropea <bedhanger@xxxxxx> writes:

>> Notice the name of the function.  We no longer even attempt to align
>> the output, and in general the output column length of each line
>> would be shorter than the original.  I am wondering if the change
>> would be of less impact if we try to abbreviate to len+3 and then
>> chomp the result at the right hand side to len+3 (only if the result
>> is unique) when print_sha1_ellipsis is false.  Of course, once we go
>> that path, the code structure this patch introduces (not the one I
>> mentioned in the previous paragraph) would be necessary.  Essentially
>> you would be enhancing the "else" clause.
>
> Sorry, but you've lost me there.

After diff_aligned_abbrev(const struct object_id *oid, int len)
returns the full-size result when asked, it does this:

	abbrev = diff_abbrev_oid(oid, len);
	abblen = strlen(abbrev);

	if (abblen < GIT_SHA1_HEXSZ - 3) {
		static char hex[GIT_MAX_HEXSZ + 1];
		if (len < abblen && abblen <= len + 2)
			xsnprintf(hex, sizeof(hex), "%s%.*s", abbrev, len+3-abblen, "..");
		else
			xsnprintf(hex, sizeof(hex), "%s...", abbrev);
		return hex;
	}

When asked to abbreviate to 4, which is shorter than 40-3=37, and
the unique abbreviation results in "dead" (iow, no other object that
shares that prefix), then len=4, abblen=4, so we get "dead..." as
the output.  If the unique abbreviation is "deadb" (iow, some other
object shares "dead" prefix and we require one more hexdigit for
uniqueness), len=4, abblen=5, and abblen <= len + 2, so we get
"deadb.." as the output.  Either codepath yields 7 hexdigits, and
this is because the function wants to help the user of its output
to produce lines that are as aligned as possible.

Of course, the leeway we have ... that can be reduced is small and
limited.  For a len that is too small in a repository with many
objects that share the same prefix, we cannot fit the result in 7
(i.e. len+3) columns and may have to return a longer result.

But the point is that the existing callers that ask for len=7 are
expecting output that is "7+3=10 output columns most of the time,
even though it might yield a string longer than 10 output columns
when the object names cannot be uniquely shown with that output
width".  

And that was why I was wondering if it gives less negative impact to
callers if we abbreviate to (len+3) hexdigits and return the result.

Note that this was not (and is not) "I think yours is wrong
because..." but was (and is) "I am wondering if doing it differently
is better because...".  Asking for 7 and getting 10 feels strange
and that is one valid reason to "correct" the existing behaviour by
abbreviating to len and just dropping the padding with dots, and it
may be worth a bit of surprise to existing users.  I dunno, and that
is why it was "I wonder...".