Re: [PATCH] rebase: use reflog to find common base with upstream

Martin von Zweigbergk <martinvonz@xxxxxxxxx> · Mon, 21 Oct 2013 23:24:14 -0700

On Mon, Oct 21, 2013 at 4:24 AM, John Keeping <john@xxxxxxxxxxxxx> wrote:
> On Sun, Oct 20, 2013 at 10:03:29PM -0700, Martin von Zweigbergk wrote:
>> On Wed, Oct 16, 2013 at 11:53 AM, John Keeping <john@xxxxxxxxxxxxx> wrote:
>> > Commit 15a147e (rebase: use @{upstream} if no upstream specified,
>> > 2011-02-09) says:
>> >
>> >         Make it default to 'git rebase @{upstream}'. That is also what
>> >         'git pull [--rebase]' defaults to, so it only makes sense that
>> >         'git rebase' defaults to the same thing.
>> >
>> > but that isn't actually the case.  Since commit d44e712 (pull: support
>> > rebased upstream + fetch + pull --rebase, 2009-07-19), pull has actually
>> > chosen the most recent reflog entry which is an ancestor of the current
>> > branch if it can find one.
>>
>> It is exactly this inconsistency between "git rebase" and "git pull
>> --rebase" that confused me enough to make me send my first email to
>> this list almost 4 years ago [1], so thanks for working on this! I
>> finished that thread with:
>>
>>   Would it make sense to teach "git rebase" the same tricks as "git
>> pull --rebase"?
>>
>> Then it took me a year before I sent a patch not unlike this one [2].
>> To summarize, the patch did not get accepted then because it makes
>> rebase a little slower (or a lot slower in some cases). "git pull
>> --rebase" is of course at least as slow in the same cases, but because
>> it often involves connecting to a remote host, people would probably
>> blame the connection rather than git itself even in those rare (?)
>> cases.
>>
>> I think
>>
>>   git merge-base HEAD $(git rev-list -g "$upstream_name")
>>
>> is roughly correct and hopefully fast enough. That can lead to too
>> long a command line, so I was planning on teaching merge-base a
>> --stdin option, but never got around to it.
>
> I'm not sure we should worry about the additional overhead here.  In the
> common case, we should hit a common ancestor within the first couple of
> reflog entries; and in the case that will be slow, it's likely that
> there are a lot of differences between the branches so the cherry
> comparison phase will take a while anyway.

Perhaps true. I created a simple commit based on my origin/master@{1}
in git.git, which happened to be 136 commits behind origin/master.
Before (a modified version of) your patch, it took 0.756s to rebase it
(best of 5) and afterwards it took 0.720s.

And in a worse case: The same test with one commit off my
origin/master@{13}, 2910 behind origin/master, shows an increase from
2.75s to 4.04s.

And a degenerate case: I created a test branch (called u) with a
1000-entry reflog from the output of "git rev-list --first-parent
origin/master | head -1000 | tac" and created the same simple commit
as before off of the end of this reflog (u@{999}). This ended up 3769
commits behind u@{0} (aka origin/master). In this case it went from
3.43s to 3m32s. Obviously, this was a degenerate case designed to be
slow, but I think it's still worth noting that one can get such O(n^2)
behavior, e.g. if one lets a branch get out of sync with an upstream
that is fetched very frequently (I've heard of people running
short-interval cron jobs that fetch from a remote).
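
To illustrate where the quadratic cost comes from, here is a toy model
in Python. It is only a sketch and not how git implements anything: the
names find_fork_point and build are made up for illustration, history
is assumed to be strictly linear, and the spacing of 4 commits between
reflog entries is a rounding of the 3769 commits / 1000 entries above.
It scans the reflog newest-first and, for each entry, walks back until
it reaches history shared with the branch, which is roughly the work
one merge-base call per reflog entry has to do.

```python
# Toy model (not git's implementation): commits are integers on a linear
# upstream history, and commit i has parent i - 1.

def ancestors(commit, parent):
    """Return the set containing commit and everything reachable from it."""
    seen = set()
    while commit is not None and commit not in seen:
        seen.add(commit)
        commit = parent.get(commit)
    return seen

def find_fork_point(reflog, head, parent):
    """Scan reflog newest-first; return (entry, steps_walked).

    For each entry, walk back until we reach history shared with head.
    An entry is an ancestor of head only if that walk takes zero steps.
    """
    head_history = ancestors(head, parent)
    steps = 0
    for entry in reflog:
        commit = entry
        while commit is not None and commit not in head_history:
            commit = parent.get(commit)
            steps += 1
        if commit == entry:  # entry itself is an ancestor of head
            return entry, steps
    return None, steps

def build(n_entries, spacing):
    """Upstream with n_entries reflog entries, `spacing` commits apart;
    the hypothetical "topic" commit forks from the oldest entry."""
    tip = (n_entries - 1) * spacing
    parent = {i: i - 1 for i in range(1, tip + 1)}
    parent[0] = None
    parent["topic"] = 0
    reflog = list(range(tip, -1, -spacing))  # newest first
    return reflog, parent

for n in (10, 100, 1000):
    reflog, parent = build(n, 4)
    entry, steps = find_fork_point(reflog, "topic", parent)
    print(n, entry, steps)
```

In this model the number of steps walked is 180, 19800 and 1998000 for
10, 100 and 1000 reflog entries, i.e. it grows with the square of the
reflog length when the branch forks from the oldest entry. When the
branch forks from the newest entry, the scan stops at once with zero
steps, which matches the common case John describes.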

I do like the feature, but I'm still concerned about this last case.