Re: [RFC PATCH 3/3] git-rebase.sh: make git-rebase--interactive the default

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Sun, 17 Jun 2018 23:44:16 +0200 (DST)

Hi Elijah,

On Sat, 9 Jun 2018, Elijah Newren wrote:

> On Sat, Jun 9, 2018 at 3:11 PM, Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
> > On Thu, 7 Jun 2018, Elijah Newren wrote:
> >
> >> am-based rebases suffer from an reduced ability to detect directory
> >> renames upstream, which is fundamental to the fact that it throws
> >> away information about the history: in particular, it dispenses with
> >> the original commits involved by turning them into patches, and
> >> without the knowledge of the original commits we cannot determine a
> >> proper merge base.
> >>
> >> The am-based rebase will proceed by first trying git-apply, and, only
> >> if it fails, will try to reconstruct a provisional merge base using
> >> build_fake_ancestor().  This fake ancestor will ONLY contain the
> >> files modified in the patch; without the full list of files in the
> >> real merge base, renames of any missing files cannot be detected.
> >> Directory rename detection works by looking at individual file
> >> renames and deducing when a full directory has been renamed.
> >>
> >> Trying to modify build_fake_ancestor() to instead create a merge base
> >> that includes common file information by looking for a commit that
> >> contained all the same blobs as the pre-image of the patch would
> >> require a very expensive search through history, may find the wrong
> >> commit, and outside of rebase may not find any commit that matches.
> >>
> >> But we had all the relevant information to start.  So, instead of
> >> attempting to fix am which just doesn't have the relevant
> >> information, instead note its strength and weaknesses, and change the
> >> default rebase machinery to interactive since it does not suffer from
> >> the same problems.
> >
> > I'll let Eric comment on the grammar, and I'll comment on the idea
> > behind this commit instead.
> 
> Going to dump the hard job on Eric, eh?  ;-)

Just trying to learn how to delegate.

:0)

> > IMHO `git rebase -i` is still too slow to be a true replacement for
> > `git rebase --am` for the cases where it serves the user well. Maybe
> > we should work on making `rebase -i` faster, first?
> 
> That sounds fair.
> 
> > I imagine, for example, that it might make *tons* of sense to avoid
> > writing out the index and worktree files all the time. That was
> > necessary in the shell script version because if the ridiculous
> > limitations we subjected ourselves to, such as: no object-oriented
> > state worth mentioning, only string-based processing, etc. But we
> > could now start to do everything in memory (*maybe* write out the new
> > blob/tree/commit objects immediately, but maybe not) until the time
> > when we either have succeeded in the rebase, or when there was a
> > problem and we have to exit with an error. And only then write the
> > files and the index.
> 
> Hmm...are you still planning on using cherry-pick (internally rather
> than forking, of course)?

The sequencer side-steps the cherry-pick command by calling
merge_recursive() directly. But yes, this is what the interactive rebase
will always use.

>  Because cherry-pick uses the merge-recursive machinery, and the
>  merge-recursive machinery doesn't have a nice way of avoiding writing
>  to the working tree or index.

Exactly. I think it could be taught to perform its magic in memory. I am
not saying that it is necessarily easy to teach merge-recursive.c this
trick, but it should be possible.

> Fixing that is on my radar; see the first block of
> https://public-inbox.org/git/CABPp-BG2fZHm3s-yrzxyGj3Eh+O7_LHLz5pgstHhG2drigSyRA@xxxxxxxxxxxxxx/

Great!

> (reading up until "At this point, I'd rather just fix the design flaw
> rather than complicate the code further.")
> 
> However, also covered in my plans is a few things to speed up the
> merge-recursive machinery, which should provide some other performance
> benefits for interactive rebases.

I am looking forward to see this materialize.

> > In any case, I think that the rather noticeable change of the default
> > would probably necessitate a deprecation phase.
> 
> Why is it a "rather noticable change of the default"?

I was really referring to speed. But I have to admit that I do not have
any current numbers.

Another issue just hit me, though: rebase --am does not need to look at as
many Git objects as rebase --merge or rebase -i. Therefore, GVFS users
will still want to use --am wherever possible, to avoid "hydrating"
many objects during their rebase.

> If we popped up the editor and asked the user to edit the list of
> options, I'd agree, or if folks thought that it was significantly slower
> by a big enough margin (though you already suggested waiting and making
> sure we don't do that).  What else remains that qualifies?
> 
> (Okay, the default behavior to just skip empty patches rather than halt
> the rebase and ask the user to advise is different, but we could fix
> that up too.  Is there anything else?)

That is indeed a change in behavior that is rather easy to address.

As to speed: that might be harder. But then, the performance might already
be good enough. I do not have numbers (nor the time to generate them) to
back up my hunch that --am is substantially faster than --merge.

Ciao,
Dscho