Re: [PATCH v8 08/14] merge-resolve: rewrite in C

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Tue, 16 Aug 2022 14:17:51 +0200 (CEST)

Hi Phillip,

On Wed, 10 Aug 2022, Phillip Wood wrote:

> On 09/08/2022 19:54, Alban Gruin wrote:
> > This rewrites `git merge-resolve' from shell to C.  As for `git
> > merge-one-file', this port is not completely straightforward and removes
> > calls to external processes to avoid reading and writing the index over
> > and over again.
> >
> >   - The call to `update-index -q --refresh' is replaced by a call to
> >     refresh_index().
> >
> >   - The call to `read-tree' is replaced by a call to unpack_trees() (and
> >     all the setup needed).
> >
> >   - The call to `write-tree' is replaced by a call to
> >     cache_tree_update().  This call is wrapped in a new function,
> >     write_tree().  It is made to mimick write_index_as_tree() with
> >     WRITE_TREE_SILENT flag, but without locking the index; this is taken
> >     care directly in merge_strategies_resolve().
> >
> >   - The call to `diff-index ...' is replaced by a call to
> >     repo_index_has_changes().
> >
> >   - The call to `merge-index', needed to invoke `git merge-one-file', is
> >     replaced by a call to the new merge_all_index() function.
> >
> > The index is read in cmd_merge_resolve(), and is wrote back by
> > merge_strategies_resolve().  This is to accomodate future applications:
> > in `git-merge', the index has already been read when the merge strategy
> > is called, so it would be redundant to read it again when the builtin
> > will be able to use merge_strategies_resolve() directly.
> >
> > The parameters of merge_strategies_resolve() will be surprising at first
> > glance: why using a commit list for `bases' and `remote', where we could
> > use an oid array, and a pointer to an oid?  Because, in a later commit,
> > try_merge_strategy() will be able to call merge_strategies_resolve()
> > directly, and it already uses a commit list for `bases' (`common') and
> > `remote' (`remoteheads'), and a string for `head_arg'.  To reduce
> > frictions later, merge_strategies_resolve() takes the same types of
> > parameters.
>
> git-merge-resolve will happily merge three trees, unfortunately using
> lists of commits will break that.

But isn't `merge-resolve` specifically implemented as a merge strategy? I
do not see any contract in Git's documentation that commits to supporting
direct calls to the implementation detail that is `git merge-resolve`:

	$ man git-merge-resolve
	No manual entry for git-merge-resolve

> > merge_strategies_resolve() locks the index only once, at the beginning
> > of the merge, and releases it when the merge has been completed.
> >
> > Signed-off-by: Alban Gruin <alban.gruin@xxxxxxxxx>
> > ---
> > diff --git a/builtin/merge-resolve.c b/builtin/merge-resolve.c
> > new file mode 100644
> > index 0000000000..a51158ebf8
> > --- /dev/null
> > +++ b/builtin/merge-resolve.c
> > @@ -0,0 +1,63 @@
> > +/*
> > + * Builtin "git merge-resolve"
> > + *
> > + * Copyright (c) 2020 Alban Gruin
> > + *
> > + * Based on git-merge-resolve.sh, written by Linus Torvalds and Junio C
> > + * Hamano.
> > + *
> > + * Resolve two trees, using enhanced multi-base read-tree.
> > + */
> > +
> > +#include "cache.h"
> > +#include "builtin.h"
> > +#include "merge-strategies.h"
> > +
> > +static const char builtin_merge_resolve_usage[] =
> > +	"git merge-resolve <bases>... -- <head> <remote>";
> > +
> > +int cmd_merge_resolve(int argc, const char **argv, const char *prefix)
> > +{
> > +	int i, sep_seen = 0;
> > +	const char *head = NULL;
> > +	struct commit_list *bases = NULL, *remote = NULL;
> > +	struct commit_list **next_base = &bases;
> > +	struct repository *r = the_repository;
> > +
> > +	if (argc < 5)
> > +		usage(builtin_merge_resolve_usage);
>
> I think it would be better to call parse_options() and then check argc. That
> would give better error messages for unknown options and supports '-h' for
> free.

Again, we are talking about a merge strategy, a program that is not meant
to be called directly by the user. Why should we complicate the code by
using the `parse_options` machinery?

> I think we also need to call git_config(). I see that read-tree respects
> submodule.recurse so I think we need the same here. I suspect we should
> also be reading the merge config to respect merge.conflictStyle.

Valid concerns. Extra brownie points if you can provide a simple test case
that demonstrates the current behavior.

> > +
> > +	if (repo_index_has_changes(r, head_tree, &sb)) {
> > +		error(_("Your local changes to the following files "
> > +			"would be overwritten by merge:\n  %s"),
> > +		      sb.buf);
>
> This matches the script but I wonder why that did not check for unstaged
> changes.

Any deviations from the scripted behavior should be done on top of this
patch series, unless the deviations make the conversion substantially
cleaner.

Thanks,
Dscho