Re: [PATCH 3/3] diffcore-rename: fall back to -C when -C -C busts the rename limit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 22, 2011 at 02:50:49PM -0700, Junio C Hamano wrote:

> When there are too many paths in the project, the number of rename source
> candidates "git diff -C -C" finds will exceed the rename detection limit,
> and no inexact rename detection is performed.  We however could fall back
> to "git diff -C" if the number of modified paths is sufficiently small.

Hmm. Compared to the previous iteration, this also seems to turn on
warnings for the diff and log families. It would have been a little
easier to review as a separate patch, I think. In particular, I'm not
sure I like the "just warn once at the end" strategy when we show
multiple diffs.

For example, in the trash repo created by your new t4001, I did:

  git commit -a -m foo &&
  git config diff.renamelimit 4 &&
  git log --raw -C -C

And like most users, I read through the commits in the order they're
presented. So I get to the second one and see that it has a bunch of
additions, but no copies. Then I keep reading and finally, after the
third commit, I see a warning that we didn't do copy detection.

Here's what I see wrong with that from the user's perspective:

  1. You give me the warning at some unspecified time after I've already
     read and digested the results of the commit it affects. So at best
     I've wasted my time looking at results that you later tell me might
     be bogus.

  2. The warning is at the very end of the potentially long output. With
     three commits, I'm likely to actually scroll all the way to the end.
     But how often do you run "git log" in git.git and then kill the
     pager after finding the answer you want, never seeing the bottom of
     git's output (and potentially before git even generates it!). So I
     may miss the warning entirely.

  3. Even if I do see the warning, I have no idea which commits it
     applies to. I have to bump the rename limit and re-run just to find
     out whether the commits I cared about were affected.

So I think it would be much more sensible to show something like:

  commit f3ba51f2e2025d15c84873cb24dcf51b77f6b8e1
  Author: A U Thor <author@xxxxxxxxxxx>
  Date:   Thu Apr 7 15:13:13 2005 -0700

      hundred

  warning: only found copies from modified paths due to too many files.
  warning: you may want to set your diff.renamelimit variable to at
           least 102 and retry the command.
  :000000 100644 0000000... 4daddb7... A  path00
  :000000 100644 0000000... 8a0f05e... A  path01
  [etc]

One argument against that is that we may end up printing many such
warnings. For the pager case, where stderr and stdout are combined, that
is probably fine. Any commit which blows the rename limit is going to
have a ton of diff output anyway, so a few extra lines doesn't hurt
(with some careful flushing, those lines are placed in the right spot).

When stderr is separated, though, it's going to be annoyingly verbose,
and multiple warnings wouldn't match up with their respective commits,
anyway.

So maybe it is worth detecting the pager case and behaving differently
based on whether stderr and stdout are combined.

Another alternative would be to embed the warning in the stdout stream,
but I don't see a clean way of doing that. The best I could come up with
is something like:

    commit f3ba51f2e2025d15c84873cb24dcf51b77f6b8e1
    Author: A U Thor <author@xxxxxxxxxxx>
    Date:   Thu Apr 7 15:13:13 2005 -0700
    X-Diff-Warning: only found copies...

which just seems a little too hack-ish.

Also, on a somewhat related note: which commands should have rename
progress reporting turned on? It makes sense for "git diff" to do it to
me. And probably even "git show". Probably not for "git log", though.
I think we'd also have to look at how that interacts with the
stderr-to-stdout pager thing. We obviously don't want progress going to
the pager.

> +static const char rename_limit_warning[] =
> +"inexact rename detection was skipped due to too many files.";
> +
> +static const char degrade_cc_to_c_warning[] =
> +"only found copies from modified paths due to too many files.";

Somehow the phrase "due to too many files" seems awkward. The
merge-recursive warning this replaces used "because there were too many
files". Which is longer, of course, but seems more natural.

> +static const char rename_limit_advice[] =
> +"you may want to set your %s variable to at least "
> +"%d and retry the command.";

And this one ends up overflowing reasonably-sized terminals, because it
gets prefixed with "warning: ", and because the variable name is
something like "diff.renamelimit".

> @@ -445,6 +452,20 @@ static int too_many_rename_candidates(int num_create,
>  
>  	options->needed_rename_limit =
>  		num_src > num_create ? num_src : num_create;

This is obviously not introduced by your patch, but I noticed once again
during testing that this number is conservatively high, isn't it? I
think the number we want is actually:

  ceil(sqrt(num_create * num_src))

right? I don't know if it is worth being more accurate.

> @@ -1656,8 +1651,9 @@ int merge_recursive(struct merge_options *o,
>  		commit_list_insert(h2, &(*result)->parents->next);
>  	}
>  	flush_output(o);
> -	if (o->needed_rename_limit)
> -		warning(rename_limit_advice, o->needed_rename_limit);
> +	if (show(o, 2))
> +		diff_warn_rename_limit("merge.renamelimit",
> +				       o->needed_rename_limit, 0);

With the call to show(), we are showing the warning only for the
outermost diff. Which is probably way better than generating a bunch of
warnings, one for each rename-detect we do. But aren't we now failing to
mention limits we hit in recursive invocations, even though they can
affect the results? In other words, isn't this a candidate for the
"find the highest limit needed and report once" strategy you used above?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]