Re: [PATCH v5] diff.c: keep arrow(=>) on show_stats()'s shortened filename part to make rename visible.

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 15 Oct 2013 15:54:35 -0700

Yoshioka Tsuneo <yoshiokatsuneo@xxxxxxxxx> writes:

> "git diff -M --stat" can detect rename and show renamed file name like
> "foofoofoo => barbarbar". But if destination filename is long, the line
> is shortened like "...barbarbar" so there is no way to know whether the
> file is renamed or existed in the source commit.

Is "destination" filename more special than the source filename?
Perhaps "s/if destination filename is/if filenames are/"?

	Note: I do not want you to reroll using the suggested
	wording without explanation; it may be possible that I am
	missing something obvious and do not understand why you
	singled out destination, in which case I'd rather see it
	explained better in the log message than the potentially
	suboptimal suggestion I made in the review without
	understanding the issue. Of course, it is possible that you
	want to do the same when source is overlong, in which case
	you can just say "Yeah, you're right; will reroll".

        The above applies to all the other comments in this message.

Also "s/source commit/original/".  You may not be comparing two
commits after all.

> Make sure there is always an arrow, like "...foo => ...bar".
> The output can contains curly braces('{','}') for grouping.

s/contains/contain/;

> So, in general, the outpu format is "<pfx>{<mid_a> => <mid_b>}<sfx>"

s/outpu/&t/;

> To keep arrow("=>"), try to omit <pfx> as long as possible at first
> because later part or changing part will be the more important part.
> If it is not enough, shorten <mid_a>, <mid_b>, and <sfx> trying to
> have the maximum length the same because those will be equaly important.

A sound reasoning.

> Signed-off-by: Tsuneo Yoshioka <yoshiokatsuneo@xxxxxxxxx>
> Test-added-by: Thomas Rast <trast@xxxxxxxxxxx>
> ---
>  diff.c                 | 187 +++++++++++++++++++++++++++++++++++++++++++------
>  t/t4001-diff-rename.sh |  12 ++++
>  2 files changed, 177 insertions(+), 22 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index a04a34d..cf50807 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1258,11 +1258,12 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
>  	}
>  }
>  
> -static char *pprint_rename(const char *a, const char *b)
> +static void pprint_rename_find_common_prefix_suffix(const char *a, const char *b
> +													, struct strbuf *pfx, struct strbuf *a_mid
> +													, struct strbuf *b_mid, struct strbuf *sfx)

What kind of line splitting is this?

I think the real issue is that the function name is overly long, but
aside from that,

 - comma comes at the end of the line, not at the beginning of the
   next line;

 - the second and subsequent lines are indented, but not more than
   the usual line width (align with the first letter inside the
   opening parenthesis of the first line);

 - a_mid and b_mid are more "alike" than pfx and a_mid.

so I would expect to see it more like:

static void abbrev_rename(const char *a, const char *b,
			  struct strbuf *pfx,
			  struct strbuf *a_mid, struct strbuf *b_mid,
			  struct strbuf *sfx)

Note that the suggested name does not say "pprint", because in your
version of this file, the code around here is no longer doing any
printing.  The caller does so after using this function to decide
how to abbreviate renames, so naming the helper function after what
it does (e.g. abbreviate renames) is more appropriate.

>  {
>  	const char *old = a;
>  	const char *new = b;
> -	struct strbuf name = STRBUF_INIT;
>  	int pfx_length, sfx_length;
>  	int pfx_adjust_for_slash;
>  	int len_a = strlen(a);
> @@ -1272,10 +1273,9 @@ static char *pprint_rename(const char *a, const char *b)
>  	int qlen_b = quote_c_style(b, NULL, NULL, 0);
>  
>  	if (qlen_a || qlen_b) {
> -		quote_c_style(a, &name, NULL, 0);
> -		strbuf_addstr(&name, " => ");
> -		quote_c_style(b, &name, NULL, 0);
> -		return strbuf_detach(&name, NULL);
> +		quote_c_style(a, a_mid, NULL, 0);
> +		quote_c_style(b, b_mid, NULL, 0);
> +		return;
>  	}
>  
>  	/* Find common prefix */
> @@ -1321,18 +1321,151 @@ static char *pprint_rename(const char *a, const char *b)
>  		a_midlen = 0;
>  	if (b_midlen < 0)
>  		b_midlen = 0;
> +	

Trailing whitespace (there are many others you added to this file; I
won't bother to point out all of them).

> +	strbuf_add(pfx, a, pfx_length);
> +	strbuf_add(a_mid, a + pfx_length, a_midlen);
> +	strbuf_add(b_mid, b + pfx_length, b_midlen);
> +	strbuf_add(sfx, a + len_a - sfx_length, sfx_length);
> +}
> +
> +/*
> + * Omit each parts to fix in name_width.
> + * Formatted string is "<pfx>{<a_mid> => <b_mid>}<sfx>".
> + * At first, omit <pfx> as long as possible.
> + * If it is not enough, omit <a_mid>, <b_mid>, <sfx> by tring to set the length of
> + * those 3 parts(including "...") to the same.
> + * Ex:
> + * "foofoofoo => barbarbar"
> + *   will be like
> + * "...foo => ...bar".
> + * "long_parent{foofoofoo => barbarbar}longfilename"
> + *   will be like
> + * "...parent{...foofoo => ...barbar}...lename"
> + */
> +static void pprint_rename_omit(struct strbuf *pfx, struct strbuf *a_mid, struct strbuf *b_mid
> +							   , struct strbuf *sfx, int name_width)

Bad line splitting.

> +{
> +
> +#define ARROW " => "
> +#define ELLIPSIS "..."

Ugly and leaks these symbols after the function is done using them
to the remainder of this file.  Write them like this instead, perhaps?

	static const char arrow[] = " => ";
        static const char dots[] = "...";

> +#define swap(a, b) myswap((a), (b), sizeof(a))
> +	
> +#define myswap(a, b, size) do {		\
> +unsigned char mytmp[size];	\
> +memcpy(mytmp, &a, size);		\
> +memcpy(&a, &b, size);		\
> +memcpy(&b, mytmp, size);		\
> +} while (0)

These are totally unneeded, I suspect (see below).

> +
> +	int use_curly_braces = (pfx->len > 0) || (sfx->len > 0);
> +	size_t name_len;
> +	size_t len;
> +	size_t part_lengths[4];

Do not name an array in plural, i.e. elements[], unless there is a
compelling reason to do so.  By using singular, e.g. element[], the
third element can be spelled as element[3], which is more logical
than having to call it elements[3].

	Side note. a notable exception is an array that is used as a
	hash-table and frequently passed around as an argument; you
	are usually not interested in iterating over it in ascending
	order, and being able to call such a collection of things
	"things" in plural, e.g. struct object objects[], is more
	important.

> +	size_t max_part_len = 0;
> +	size_t remainder_part_len = 0;
> +	int i, j;
> +
> +	name_len = pfx->len + a_mid->len + b_mid->len + sfx->len + strlen(ARROW)
> +		+ (use_curly_braces ? 2 : 0);
> +	
> +	if (name_len <= name_width) {
> +		/* Everthing fits in name_width */
> +		return;
> +	}
> +	
> +	if (use_curly_braces) {
> +		if (strlen(ELLIPSIS) + (name_len - pfx->len) <= name_width) {
> +			/*
> +			 Just omitting left of '{' is enough
> +			 Ex: ...aaa{foofoofoo => bar}file
> +			 */

	/*
         * We format our multi-line
         * comments like
         * this.
         */

> +			strbuf_splice(pfx, name_len - pfx->len, name_width - (name_len - pfx->len), ELLIPSIS, strlen(ELLIPSIS));

Overlong line.

Is the math for the second and third arguments correct?  If you are
making "abcdefghij" into "...hij", you would splice at position 0
for length up to 'g', so it felt strange to see any arithmetic as
the second argument, but I didn't look at this code very closely.

> +			return;
> +		} else {
> +			if (pfx->len > strlen(ELLIPSIS)) {
> +				/*
> +				 Just omitting left of '{' is not enough
> +				 name will be "...{SOMETHING}SOMETHING"
> +				 */
> +				strbuf_reset(pfx);
> +				strbuf_addstr(pfx, ELLIPSIS);
> +			}
> +		}
> +	}
>  
> -	strbuf_grow(&name, pfx_length + a_midlen + b_midlen + sfx_length + 7);
> -	if (pfx_length + sfx_length) {
> -		strbuf_add(&name, a, pfx_length);
> +	/* available length for a_mid, b_mid and sfx */
> +	len = name_width - strlen(ARROW) - (use_curly_braces ? 2 : 0);
> +	
> +	/* a_mid, b_mid, sfx will be have the same max, including ellipsis("..."). */
> +	part_lengths[0] = (int)a_mid->len;
> +	part_lengths[1] = (int)b_mid->len;
> +	part_lengths[2] = (int)sfx->len;

What are these casts about?  strbuf.len is of size_t which is
already the correct type for part_length[].

> +	
> +	/* bubble sort of part_lengths, descending order */

Do not bubble sort.  Unless there is a compelling reason not to
(liek you are in a performance critical section and want to use a
custom sort algorithm), just let the platform-supplied qsort(3) do
the job by writing a small comparison function.

> +	for (i=0; i<3; i++) {
> +		for (j=i+1; j<3; j++) {
> +			if (part_lengths[j] > part_lengths[i]) {
> +				swap(part_lengths[i], part_lengths[j]);
> +			}
> +		}
> +	}
> +	
> +	if (part_lengths[1] + part_lengths[1] + part_lengths[2] <= len) {
> +		/*
> +		 * "{...foofoo => barbar}file"
> +		 * There is only one omitting part.

s/omitting/omitted/;

> +		 */
> +		max_part_len = len - part_lengths[1] - part_lengths[2];
> +	} else if (part_lengths[2] + part_lengths[2] + part_lengths[2] <= len) {
> +		/*
> +		 * "{...foofoo => ...barbar}file"
> +		 * There are 2 omitting part.

s/omitting part/omitted parts/;

> +		 */
> +		max_part_len = (len - part_lengths[2]) / 2;
> +		remainder_part_len = (len - part_lengths[2]) - max_part_len * 2;
> +	} else {
> +		/*
> +		 * "{...ofoo => ...rbar}...file"
> +		 * There are 3 omitting part.

Likewise.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html