Re: [PATCH 4/5] rebase -i: don't fork git checkout

Elijah Newren <newren@xxxxxxxxx> · Thu, 9 Sep 2021 08:01:46 -0700

Hi,

On Thu, Sep 9, 2021 at 6:57 AM Phillip Wood <phillip.wood123@xxxxxxxxx> wrote:
>
> Hi Philippe
>
> On 09/09/2021 13:40, Philippe Blain wrote:
> >>> While in general I think it's a good thing to avoid forking, this
> >>> change might result in behavioral differences. Any config that
> >>> affects 'git checkout' but not the internal 'reset.c::reset_head'
> >>> function might play a role in the rebase UX.
> >>>
> >>> One that immediately came to mind is 'submodule.recurse'. This
> >>> initial 'onto' checkout was pretty much the only part of 'git
> >>> rebase' that did something useful for submodules, so it's kind of
> >>> sad to see it regress.
> >>
> >> Thanks for pointing that out. As a non-submodule user my question
> >> would be is it actually useful for the initial checkout to work that
> >> way if the rest of rebase (and the checkout for the am backend)
> >> ignores submodules? reset.c::reset_head() just uses unpack trees like
> >> checkout so if rebase read 'submodule.recurse' then reset_head()
> >> would work like 'git checkout' and also 'git rebase --abort' and the
> >> "reset" command in the todo list would start checking out submodules.
>
> it would also affect fast-forwards
>
> >> I'm reluctant to do that until the merge backend also handles
> >> submodules unless there is a good reason that such partial submodule
> >> support would help submodule users.
> >
> > Yeah, it's not that useful, I have to admit; it can also be very confusing
> > since some parts of rebase are affected, and some not. For example, any
> > time
> > the rebase stops, like for 'edit', 'break', and when there are
> > conflicts, the
> > submodules are not updated. So I think a full solution is better than a
> > partial
> > solution; in the meantime I'm thinking the change you are proposing
> > would actually
> > be less confusing, even if it slightly changes behaviour...
> >
> > As an aside, I *think* reading submodule.recurse in rebase like it's
> > done in checkout
> > et al., i.e. something like this:
> >
> > diff --git a/builtin/rebase.c b/builtin/rebase.c
> > index 33e0961900..125ec907e4 100644
> > --- a/builtin/rebase.c
> > +++ b/builtin/rebase.c
> > @@ -26,6 +26,7 @@
> >   #include "rerere.h"
> >   #include "branch.h"
> >   #include "sequencer.h"
> > +#include "submodule.h"
> >   #include "rebase-interactive.h"
> >   #include "reset.h"
> >
> > @@ -1106,6 +1107,9 @@ static int rebase_config(const char *var, const
> > char *value, void *data)
> >           return git_config_string(&opts->default_backend, var, value);
> >       }
> >
> > +    if (!strcmp(var, "submodule.recurse"))
> > +        return git_default_submodule_config(var, value, data);
>
> That looks about right to me though I think it would be safer to call
> git_default_submodule_config() for submodule.* rather than just
> submodule.recurse
>
> >       return git_default_config(var, value, data);
> >   }
> >
> >
> > would actually also affect the merges
> > performed during the rebase, since that would affect the "global" state
> > in submodule.c.
> > I hacked exactly that the other day but did not test extensively...
>
> merge-ort.c:checkout() which is used by merge_switch_to_result() uses
> unpack_trees() so it will pick up the global state and hopefully should
> just work (cc'ing Elijah to check as I didn't look what happens when
> there are conflicts).

Yep, merge-ort was designed to just piggy back on checkout code.  The
checkout() function was basically just code lifted from
builtin/checkout.c.  Using that code means that merges now also
benefit from all the special working tree handling that is encoded
into git-checkout -- whether that's parallel checkout, submodule
handling, tricky D/F switches or symlink handling, etc.  In contrast
to merge-recursive, it does not need hundreds and hundreds of lines of
special worktree updating code sprayed all over the codebase.

Conflicts are not special in this regard; merge-ort creates a tree
which has files that include conflict markers, and then merge-ort
calls checkout() to switch the working copy over to that tree.

The only issue conflicts present for merge-ort, is that AFTER it has
checked out that special tree with conflict markers, it then has to go
and touch up the index afterwards to replace the entries for
conflicted files with multiple higher order stages.  (You could say
that merge-recursive is "index-first", since its design focuses on the
index -- updating it first and then figuring out everything else like
updating the working tree with special code afterwards.  In contrast,
merge-ort ignores the index entirely until the very end -- after a new
merge tree is created and after the working tree is updated.)

> merge-recursive.c:update_file_flags() does this
> when updating the work tree
>
>         if (S_ISGITLINK(contents->mode)) {
>
>                  /*
>
>                   * We may later decide to recursively descend into
>
>                   * the submodule directory and update its index
>
>                   * and/or work tree, but we do not do that now.
>
>                   */
>
>                  update_wd = 0;
>
>                  goto update_index;
>
>         }
>
>
>
> so it looks like it does not update the submodule directory. Given
> merge-ort is now the default perhaps it's time for rebase (and
> cherry-pick/revert) to start reading the submodule config settings (we
> parse the config before we know if we'll be using merge-ort so I don't
> know how easy it would be to only parse the submodule settings if we are
> using merge-ort).

I'd just parse any needed config in all cases.  The submodule settings
aren't going to hurt merge-recursive; it'll just ignore them.  (Or are
you worried about a mix-and-match of rebase calling both checkout and
merge code doing weird things, and you'd rather not have the checkout
bits update submodules if the merges won't?)