Re: [PATCH v1 2/2] merge: Add merge.aggressive config setting

Elijah Newren <newren@xxxxxxxxx> · Tue, 24 Apr 2018 10:36:23 -0700

Hi Ben,

On Tue, Apr 24, 2018 at 9:45 AM, Ben Peart <peartben@xxxxxxxxx> wrote:
> On 4/20/2018 1:22 PM, Elijah Newren wrote:
>> On Fri, Apr 20, 2018 at 6:36 AM, Ben Peart <Ben.Peart@xxxxxxxxxxxxx>
>> wrote:
>>>
>>> Add the ability to control the aggressive flag passed to read-tree via a
>>> config setting.
>>
>> This feels like a workaround to the performance problems with index
>> updates in merge-recursive.c.
>
> This change wasn't done to solve performance problems.  We turned it on
> because it reduced the number of unmerged entries (from 40K to 1) in the
> particular merge we were looking at.  The additional 3 scenarios that
> --aggressive resolves made that much difference.
>
>> That said, it makes sense to me to do

Um...color me perplexed here.  --aggressive exists just to do some
resolutions that higher-level strategies can, and totally ought to be
able to, handle easily (the rules are almost trivially
straightforward), but deferring them allows the higher-level strategies
(either merge-recursive or resolve's git-merge-one-file) to handle
them slightly differently (e.g. by detecting renames).  merge-recursive
should be able to resolve anything that the unpack_trees aggressive
setting handles.  If it can't, it sounds like there's a horrible bug
somewhere.
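For concreteness, here is a rough per-path model of the extra cases that 'git read-tree --aggressive' resolves, as listed in the git-read-tree documentation (both sides remove a path; one side removes it and the other leaves it unmodified; both sides add it identically). It is a simplified Python sketch, not the unpack-trees.c implementation: each tree's version of a path is modeled as a blob id or None when absent, and the names (aggressive_extra_case, REMOVED, UNRESOLVED) are hypothetical.

```python
from typing import Optional

# Hypothetical sentinels for the outcome of the aggressive-only rules.
REMOVED = object()     # path is dropped from the merge result
UNRESOLVED = object()  # not one of the aggressive-only cases

def aggressive_extra_case(base: Optional[str],
                          ours: Optional[str],
                          theirs: Optional[str]):
    """Model of the cases read-tree resolves only with --aggressive.

    Each argument is a blob id, or None if the path is absent in that tree.
    Returns REMOVED, the blob id to keep, or UNRESOLVED.
    """
    # Both sides removed a path that existed in the merge base.
    if base is not None and ours is None and theirs is None:
        return REMOVED
    # One side removed the path, the other left it unmodified.
    if base is not None and ours is None and theirs == base:
        return REMOVED
    if base is not None and theirs is None and ours == base:
        return REMOVED
    # Path absent in the base, added identically on both sides.
    if base is None and ours is not None and ours == theirs:
        return ours
    # Anything else is either handled by read-tree's default trivial
    # rules or deferred to merge-recursive / git-merge-one-file.
    return UNRESOLVED
```

A deletion seen here may really be one half of a rename, which is why these cases are normally left to strategies that can detect renames.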

Perhaps fixing that bug is the real solution?

Is there any chance you can dig out more details about any of these
conflicts, or come up with a simple test case where running 'git merge
-X no-renames' gives a merge conflict but running with this option
completes cleanly?

>> this when rename detection is turned off.  In fact, I think you'd
>> automatically want to set aggressive to true whenever rename detection
>> is turned off (whether by your merge.renames option or the
>> -Xno-renames flag).
>>
>> I can't think of any reason this setting would be useful separate from
>> turning rename detection off, and it'd actively harm rename detection
>> performance improvements I have in the pipeline.  I'd really prefer to
>> not add this option, and instead combine the setting of aggressive
>> with the other flag.  Do you have an independent reason for wanting
>> this?
>>
>
> While combining them would work for our specific use scenario (since we turn
> both on already along with turning off merge.stat), I really hesitate to tie
> these two different flags and code paths together with a single config
> setting.
>
> While I don't want to needlessly complicate your optimizations in this area
> (they are already complex enough!) I believe we need to keep the option to
> turn on --aggressive without turning off rename detection as a viable
> option.  Perhaps if that is the case, your optimizations have less impact or
> don't apply but the user should be able to make that choice for their
> specific situation.

I totally buy that you need at least one option to avoid waiting for
(current) rename detection in some fashion, and that you don't want
lots of spurious conflicts.  But I don't understand why you believe
that we need to keep the option to turn on the aggressive flag
independently.  What's the use case?  It wasn't possible before in the
code, no one else has asked for it, and even you say you don't need it
as a separate option.  Is it a concern that turning on aggressive
whenever rename detection is turned off will break something?  The
only reason I can see to keep the aggressive codepath in unpack_trees
behind a branch, instead of running it unconditionally for every
single caller throughout the codebase, is renames.  So the fact that
you're turning renames off suggests to me that the aggressive flag
should automatically be turned on.  I'd even call pre-existing code
that doesn't turn on the aggressive flag (e.g. the -X no-renames
option in merge-recursive) buggy, even if the only result is
suboptimal performance.
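Put another way, the flag could simply be derived from the rename setting rather than configured on its own. A minimal sketch of that idea in Python, with hypothetical names (MergeOptions, detect_renames, want_aggressive) standing in for whatever merge.renames and -X no-renames actually set internally; this is not git's code:

```python
from dataclasses import dataclass

@dataclass
class MergeOptions:
    # Hypothetical stand-in for the combined effect of merge.renames
    # and -X no-renames.
    detect_renames: bool = True

def want_aggressive(opts: MergeOptions) -> bool:
    """Aggressive resolution is only deferred so that rename detection
    can see the deletions and additions; with rename detection off,
    there is nothing to defer for, so turn it on."""
    return not opts.detect_renames
```

With this, merge.renames=false or -X no-renames would get the aggressive resolutions automatically, with no separate knob to set.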

I don't see how an option to turn on the aggressive flag independently
is possibly useful to anyone.  Further, we have strong reason to
believe it will soon be actively harmful.  So...why?  It's totally
possible I'm just missing something.  If there's a good reason for it,
providing some kind of benefit that the user could weigh in a
tradeoff, then I can get on board with providing it as an option, but
right now I just don't see it.