Re: [PATCH 0/2] Reinstate the helpful message when `git pull --rebase` fails

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 12 Oct 2015 13:28:53 -0700

Johannes Schindelin <johannes.schindelin@xxxxxx> writes:

>> I think the most sensible regression fix as the first step at this
>> point is to call it as a separate process, just like the code calls
>> "apply" as a separate process for each patch.  Optimization can come
>> later when it is shown that it matters---we need to regain
>> correctness first.
>
> I fear that you might underestimate the finality of this "first
> step". If you reintroduce that separate process, not only is it a
> performance regression, but could we really realistically expect any
> further steps to happen after that? I do not think so.
> ...
> For the above reasons, I respectfully remain convinced that
> reintroducing the separate process would be a mistake.

I am not saying we should forever do run_command() going forward.

But I do not want to keep the direct call to merge_recursive() in
'maint'.  The topic was supposed to be "rewrite in C".  I do not
recall (and do not feel the need to read "git log" output to find
out) exactly how the series progressed, but a logical progression
would have been to run merge-recursive via run_command() like I
showed in the quick-fix in an early part of the series, followed by
a patch to turn it into a direct call to merge_recursive() as an
optimization change.

And the latter step turned out to be a regression caused by a
premature optimization.  If something introduces a regression, it
gets reverted.  As the scripted version certainly did not make an
internal call, we should just run_command().  And that is what we
want to have in the stable version people use every day.

The only thing I am saying is that the change to make a direct call
should come on top of the run_command() version with its own
justification as an optimization patch.

Going that route may require you to redo your patch, measure
performance improvements, ensure there is no unintended fallout in
other callers and longer term maintainability of the codebaths
involved, write a good log message, etc.  And such a fix to
merge_recursive() needs to be cooked sufficiently in 'pu' and
'next'.  And I view all that as a good thing.  I really hate to see
that this premature optimization to come back and bite us again---we
didn't see it while reviewing because "builtin-am: implement --3way"
was done in a single step with premature optimization from the
beginning.

Now, there are many reasons why the "first step" might turn out to
be the permanent optimal solution.

I did an unscientific experiment to rebase each of the 25 topics
that are cooking in 'next' on top of 'master'.  Only 3 of them will
fall back to the three-way merge machinery.  One possible reason why
the "first step" could stay a good-enough solution is that people
would not care and/or notice, because it is not like you are paying
unnecessary cost to spawn merge-recursive for each and every change.
It only kicks in when the patch does not apply.

Then I randomly picked one (jc/merge-drop-old-syntax) of the three
topics that does fall back, made it into a patch and ran "am -3" on
top of 'master' with and without the "first step".  The numbers from
5 runs of each look like this:

    real    0m0.109s                      real    0m0.109s
    user    0m0.080s                      user    0m0.079s
    sys     0m0.034s                      sys     0m0.035s

    real    0m0.109s                      real    0m0.105s
    user    0m0.095s                      user    0m0.087s
    sys     0m0.018s                      sys     0m0.022s

    real    0m0.109s                      real    0m0.110s
    user    0m0.075s                      user    0m0.086s
    sys     0m0.038s                      sys     0m0.028s

    real    0m0.107s                      real    0m0.108s
    user    0m0.083s                      user    0m0.075s
    sys     0m0.029s                      sys     0m0.038s

    real    0m0.106s                      real    0m0.108s
    user    0m0.086s                      user    0m0.090s
    sys     0m0.025s                      sys     0m0.023s

I am curious to see a similar number on platforms with slower
run_command().  From the above numbers alone, I cannot even see
which ones are with run_command(), even though I know the numbers on
the right hand side column were taken with run_command() and the
numbers on the left hand side column were taken with internal call.

Another possible reason why the "first step" could stay a
good-enough solution is that merge_recursive() in itself is a
heavy-weight operation that the cost of spawning a process is not
even felt [*1*].  After all, it's not like we are talking about
spawning the cost of "update-ref HEAD" dominating the cost of the
actual operation.

By the way, in order to make sure that I am running the correct
binary, I did "strace -f -e execve" on "am -3".  "mailsplit" and
"mailinfo", both of which are a lot more likely to be affected by
the cost of spawning because they are mostly dumb and straight
text-to-text filters, are spawned via run_command() interface.  And
then we do "apply --index" (which is always done), and after seeing
that fail, we do "apply --build-fake-ancestor" and "apply --cached"
before finally spawning merge-recursive (or making a direct call
internally before the "quick-fix") when we fall back to threeway.

Among the run_commands() invocations in the "am -3", I do not think
the one that spawns "merge-recursive" is the most profitable one to
turn into an internal call.  Libifying mailsplit and/or mailinfo and
making them directly callable may be a lot more useful thing to do
if you want to avoid run_command().

[Footnote]

*1* That is what I am seeing here, but this _is_ platform dependent
and that is why I said I am curious.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html