On 5/12/2018 4:04 AM, Eckhard Maaß wrote:
On Fri, May 11, 2018 at 12:56:39PM +0000, Ben Peart wrote:
After performing a merge that has conflicts, git status will, by default,
attempt to detect renames, which causes many objects to be examined. In a
virtualized repo, those objects do not exist locally, so the rename logic
triggers them to be fetched from the server. This results in the status call
taking hours to complete on very large repos, versus seconds with this patch.
I see where your need comes from, but as you based this on my little
patch, one can already achieve this by tweaking diff.renames itself. I
do wonder why there is a special need for the status command here,
The rename detection feature is nice and we'd like to leave it on
whenever possible. The performance issues only occur in the middle
of a merge; normal status commands behave reasonably. As a result, we
don't want to just turn it off completely by setting diff.renames.
Until we come up with a more elegant solution, we currently turn it off
completely for merge via the new merge settings, and then intercept calls
to status: if there is a MERGE_HEAD, we turn it off for status just for
that specific call. I view this as a temporary solution, so I would
not want to put that logic into git proper, as it is quite specific to
running git on a virtualized repo.
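The interception described above could look roughly like the sketch below. It is not the actual wrapper used with the virtualized repo (the thread does not show it); it assumes the path of the git directory is already known, and the helper names `merge_in_progress` and `should_pass_no_renames` are hypothetical. All it does is check whether `MERGE_HEAD` exists and, if so, decide that this one status call should get `--no-renames`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if <gitdir>/MERGE_HEAD exists, 0 otherwise. */
static int merge_in_progress(const char *gitdir)
{
	char path[4096];
	struct stat st;
	int n = snprintf(path, sizeof(path), "%s/MERGE_HEAD", gitdir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return 0; /* path did not fit: treat as "no merge in progress" */
	return stat(path, &st) == 0;
}

/*
 * Hypothetical wrapper decision: only for "status", and only while a merge
 * is in progress, should --no-renames be appended to the command line.
 */
static int should_pass_no_renames(const char *subcommand, const char *gitdir)
{
	return strcmp(subcommand, "status") == 0 && merge_in_progress(gitdir);
}
```

Keeping the check outside git means normal status calls keep rename detection; only the expensive mid-merge case is affected.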
and if there is, I personally would like it more in a style where you
could take all the options provided by the diff.* configuration and prefix
them with status, e.g. status.diff.renames = true. What do you think? If you
really only need this for merges, maybe a more specialised option is
called for that only kicks in when there is a merge going on?
I would like status to behave as similarly as possible to
diff/show/log. Special options will pull away from that again; passing
-m to show or log will lead to the same performance issues, correct?
Could it be feasible to impose an overall time limit on the detection?
I agree that they should behave as similarly as possible, which is why all
the new settings default to the diff setting when not explicitly set. I
believe this is a good model: if you don't do anything special, you get
the default/same behavior, but if you know you need special behavior, you
now have that option.
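Together with the command-line override from the commit message, this defaulting amounts to a three-level precedence. A minimal sketch, assuming -1 means "not set at this level" (the patch's `>= 0` checks suggest that convention); the helper `resolve_rename_setting` is hypothetical:

```c
#include <assert.h>

/* Assumed convention, mirroring the patch's ">= 0" checks: -1 means "not set". */
#define UNSET (-1)

/*
 * Hypothetical resolver: the command line wins, then the status-specific
 * config, then the diff.* value that diff/show/log would use anyway.
 */
static int resolve_rename_setting(int cmdline, int status_config, int diff_config)
{
	if (cmdline != UNSET)
		return cmdline;
	if (status_config != UNSET)
		return status_config;
	return diff_config;
}
```

With nothing set at the status or command-line level, status behaves exactly like diff; each more specific level only takes effect when it is explicitly set.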
And after writing this, I wonder what your experience was with just
tweaking renameLimit. Setting it very low should already have helped with
the fetching-from-server part, shouldn't it?
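The reasoning behind the renameLimit question: inexact rename detection compares candidate sources against candidate destinations, so the work (and, in a virtualized repo, the blobs that must be read) grows with the product of the two counts, and git skips inexact detection when that matrix is too large for the limit. The sketch below is a simplified model of such a cutoff, not git's actual check in diffcore-rename.c; the function name and the exact comparison are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model (assumption): skip inexact rename detection when
 * sources * destinations exceeds rename_limit squared. This model does not
 * reproduce any special-casing git may apply to a zero or negative limit.
 */
static int too_many_candidates(uint64_t num_sources, uint64_t num_destinations,
			       uint64_t rename_limit)
{
	return num_sources * num_destinations > rename_limit * rename_limit;
}
```

Under this model, a very low limit trips the cutoff almost immediately after a large merge, which is why it should avoid most of the fetching.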
Add --no-renames command line option to status that enables overriding the
config setting from the command line. Add --find-renames[=<n>] command line
option to status that enables detecting renames and optionally setting the
similarity index.
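The intended semantics of the two options can be modeled as below. Real git parses them with its parse-options machinery and stores the similarity as a scaled score; this sketch instead uses a hypothetical `struct rename_opts`, a hypothetical `parse_rename_arg`, and a plain percentage (it does not accept forms such as `50%`).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container; -1 means "not given on the command line". */
struct rename_opts {
	int detect_rename; /* 0 = off, 1 = on */
	int rename_score;  /* similarity percentage, 0..100 */
};

/* Returns 1 if arg is a rename option (updating *o), 0 if unrelated, -1 on a bad value. */
static int parse_rename_arg(const char *arg, struct rename_opts *o)
{
	static const char find[] = "--find-renames";
	size_t len = sizeof(find) - 1;

	if (!strcmp(arg, "--no-renames")) {
		o->detect_rename = 0;
		return 1;
	}
	if (strncmp(arg, find, len))
		return 0;
	if (arg[len] == '\0') {
		o->detect_rename = 1; /* keep any previously set score */
		return 1;
	}
	if (arg[len] == '=') {
		char *end;
		long n = strtol(arg + len + 1, &end, 10);

		if (end == arg + len + 1 || *end != '\0' || n < 0 || n > 100)
			return -1;
		o->detect_rename = 1;
		o->rename_score = (int)n;
		return 1;
	}
	return 0; /* e.g. "--find-renamesX" is some other option */
}
```

Because each argument simply overwrites the fields, the last rename option on the command line wins, which is the usual convention for paired on/off options.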
Would it be reasonable to extend this so that we just use the same
machinery for parsing command line options for the diffcore options and
pass this along? It seems to me that git status wants the same init as
diff/show/log has anyway. But I like the direction towards passing more
command line options to the git status command.
I agree that it is unfortunate that diff/merge/status all parse and deal
with config settings differently. I'd be happy to see someone tackle
that and move the code to a single, coherent model but that is beyond
the scope of this patch.
static void wt_longstatus_print_unmerged_header(struct wt_status *s)
@@ -592,6 +595,9 @@ static void wt_status_collect_changes_worktree(struct wt_status *s)
}
rev.diffopt.format_callback = wt_status_collect_changed_cb;
rev.diffopt.format_callback_data = s;
+ rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
+ rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
+ rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
copy_pathspec(&rev.prune_data, &s->pathspec);
run_diff_files(&rev, 0);
}
@@ -625,6 +631,9 @@ static void wt_status_collect_changes_index(struct wt_status *s)
rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
rev.diffopt.format_callback = wt_status_collect_updated_cb;
rev.diffopt.format_callback_data = s;
+ rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
+ rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
+ rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
copy_pathspec(&rev.prune_data, &s->pathspec);
run_diff_index(&rev, 1);
}
@@ -982,6 +991,9 @@ static void wt_longstatus_print_verbose(struct wt_status *s)
setup_revisions(0, NULL, &rev, &opt);
rev.diffopt.output_format |= DIFF_FORMAT_PATCH;
+ rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
+ rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
+ rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
rev.diffopt.file = s->fp;
rev.diffopt.close_file = 0;
/*
I am somewhat inclined to think those should be factored out into a common
helper if the rest of the patch stays as it is.
I debated that as well but, given the logic is so simple, opted to stick
with this. I also debated whether it would be clearer in the form:
	if (s->detect_rename >= 0)
		rev.diffopt.detect_rename = s->detect_rename;
But I decided git contributors are used to seeing dense code :) and this
style better matched what I saw in the merge settings.
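For comparison, the factoring suggested above could look like the sketch below, written in the `if` form just mentioned. The structs are cut-down stand-ins so the example compiles on its own (the real `struct diff_options` and `struct wt_status` have many more fields), and the helper name `apply_status_rename_opts` is hypothetical. Each of the three hunks would then shrink to a single call such as `apply_status_rename_opts(&rev.diffopt, s);`.

```c
#include <assert.h>

/* Cut-down stand-ins for the real git structs (assumption: only these fields matter here). */
struct diff_options {
	int detect_rename;
	int rename_limit;
	int rename_score;
};

struct wt_status {
	int detect_rename; /* -1 = not set: keep the diff default */
	int rename_limit;
	int rename_score;
};

/* Hypothetical helper: override a diff setting only when status set it explicitly. */
static void apply_status_rename_opts(struct diff_options *opt, const struct wt_status *s)
{
	if (s->detect_rename >= 0)
		opt->detect_rename = s->detect_rename;
	if (s->rename_limit >= 0)
		opt->rename_limit = s->rename_limit;
	if (s->rename_score >= 0)
		opt->rename_score = s->rename_score;
}
```

This behaves the same as the three ternaries in each hunk, but the rule lives in one place if a fourth setting is ever added.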
Greetings,
Eckhard