Hi Alban,

On Wed, 17 Mar 2021, Alban Gruin wrote:

> This rewrites `git merge-octopus' from shell to C.  As for the two last
> conversions, this port removes calls to external processes to avoid
> reading and writing the index over and over again.
>
> - Calls to `read-tree -u -m (--aggressive)?' are replaced by calls to
>   unpack_trees().
>
> - The call to `write-tree' is replaced by a call to
>   write_index_as_tree().
>
> - The call to `diff-index ...' is replaced by a call to
>   repo_index_has_changes().
>
> - The call to `merge-index', needed to invoke `git merge-one-file', is
>   replaced by a call to merge_all_index().
>
> The index is read in cmd_merge_octopus(), and is wrote back by

s/wrote/written/

> merge_strategies_octopus().

I wonder why, though. Maybe the commit message could clarify that?

> Here to, merge_strategies_octopus() takes two commit lists and a string

s/to,/too,/

> to reduce frictions when try_merge_strategies() will be modified to call

s/frictions/friction/

> it directly.
>
> Signed-off-by: Alban Gruin <alban.gruin@xxxxxxxxx>
> ---
>
> [...]
> diff --git a/builtin/merge-octopus.c b/builtin/merge-octopus.c
> new file mode 100644
> index 0000000000..9b9939b6b2
> --- /dev/null
> +++ b/builtin/merge-octopus.c
> @@ -0,0 +1,70 @@
> +/*
> + * Builtin "git merge-octopus"
> + *
> + * Copyright (c) 2020 Alban Gruin
> + *
> + * Based on git-merge-octopus.sh, written by Junio C Hamano.
> + *
> + * Resolve two or more trees.
> + */
> +
> +#include "cache.h"
> +#include "builtin.h"
> +#include "commit.h"
> +#include "merge-strategies.h"
> +
> +static const char builtin_merge_octopus_usage[] =
> +	"git merge-octopus [<bases>...] -- <head> <remote1> <remote2> [<remotes>...]";
> +
> +int cmd_merge_octopus(int argc, const char **argv, const char *prefix)
> +{
> +	int i, sep_seen = 0;
> +	struct commit_list *bases = NULL, *remotes = NULL;
> +	struct commit_list **next_base = &bases, **next_remote = &remotes;
> +	const char *head_arg = NULL;
> +	struct repository *r = the_repository;
> +
> +	if (argc < 5)
> +		usage(builtin_merge_octopus_usage);
> +
> +	setup_work_tree();
> +	if (repo_read_index(r) < 0)
> +		die("invalid index");
> +
> +	/*
> +	 * The first parameters up to -- are merge bases; the rest are
> +	 * heads.
> +	 */
> +	for (i = 1; i < argc; i++) {
> +		if (strcmp(argv[i], "--") == 0)
> +			sep_seen = 1;
> +		else if (strcmp(argv[i], "-h") == 0)
> +			usage(builtin_merge_octopus_usage);
> +		else if (sep_seen && !head_arg)
> +			head_arg = argv[i];
> +		else {
> +			struct object_id oid;
> +			struct commit *commit;
> +
> +			if (get_oid(argv[i], &oid))
> +				die("object %s not found.", argv[i]);
> +
> +			commit = oideq(&oid, r->hash_algo->empty_tree) ?
> +				NULL : lookup_commit_or_die(&oid, argv[i]);
> +
> +			if (sep_seen)
> +				next_remote = commit_list_append(commit, next_remote);
> +			else
> +				next_base = commit_list_append(commit, next_base);
> +		}
> +	}
> +
> +	/*
> +	 * Reject if this is not an octopus -- resolve should be used
> +	 * instead.
> +	 */
> +	if (commit_list_count(remotes) < 2)
> +		return 2;

As with `merge-resolve`, I would suggest the following:

- move this input validation down to `merge_strategies_octopus()`,

- change that function's signature to return an `enum`, and

- make sure that that `enum` uses easy-to-understand labels.

> +
> +	return merge_strategies_octopus(r, bases, head_arg, remotes);
> +}
>
> [...]
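
To make the `enum` suggestion above a bit more concrete, here is a minimal,
self-contained sketch. Everything in it is hypothetical: the label names,
`struct toy_commit_list` (a stand-in for `struct commit_list`) and
`toy_merge_strategies_octopus()` are made up for illustration and are not
part of the patch. The numeric values merely mirror the integers the quoted
code returns.

```c
#include <stddef.h>

/* Hypothetical labels; the quoted patch uses bare 0, 1, 2 and -1. */
enum octopus_result {
	OCTOPUS_OK = 0,			/* merged cleanly */
	OCTOPUS_CONFLICTS = 1,		/* conflicts left for the user */
	OCTOPUS_NOT_AN_OCTOPUS = 2,	/* fewer than two remotes */
	OCTOPUS_ERROR = -1		/* hard failure */
};

/* Toy stand-in for `struct commit_list`, reduced to the `next` link. */
struct toy_commit_list {
	struct toy_commit_list *next;
};

static size_t toy_list_count(const struct toy_commit_list *l)
{
	size_t n = 0;

	for (; l; l = l->next)
		n++;
	return n;
}

/*
 * The input validation lives in the strategy function itself, so
 * every caller gets it without having to repeat the check.
 */
enum octopus_result toy_merge_strategies_octopus(const struct toy_commit_list *remotes)
{
	if (toy_list_count(remotes) < 2)
		return OCTOPUS_NOT_AN_OCTOPUS;

	/* ... the actual merge would happen here ... */
	return OCTOPUS_OK;
}
```

With such labels, a caller can `switch` on the result instead of having to
remember what a bare `2` means.
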
>
> diff --git a/merge-strategies.c b/merge-strategies.c
> index a51700dae5..ebc0d0b1e2 100644
> --- a/merge-strategies.c
> +++ b/merge-strategies.c
> @@ -367,3 +368,177 @@ int merge_strategies_resolve(struct repository *r,
>
>  	return 0;
>  }
> +
> +static int write_tree(struct repository *r, struct tree **reference_tree)
> +{
> +	struct object_id oid;
> +	int ret;
> +
> +	if (!(ret = write_index_as_tree(&oid, r->index, r->index_file,
> +					WRITE_TREE_SILENT, NULL)))
> +		*reference_tree = lookup_tree(r, &oid);
> +
> +	return ret;
> +}
> +
> +static int octopus_fast_forward(struct repository *r, const char *branch_name,
> +				struct tree *tree_head, struct tree *current_tree,
> +				struct tree **reference_tree)

While I objected to the name of the `fast_forward()` function, I think the
`octopus_fast_forward()` function is named aptly.

> +{
> +	/*
> +	 * The first head being merged was a fast-forward.  Advance the
> +	 * reference commit to the head being merged, and use that tree
> +	 * as the intermediate result of the merge.  We still need to
> +	 * count this as part of the parent set.
> +	 */
> +	struct tree_desc t[2];
> +
> +	printf(_("Fast-forwarding to: %s\n"), branch_name);
> +
> +	init_tree_desc(t, tree_head->buffer, tree_head->size);
> +	if (add_tree(current_tree, t + 1))
> +		return -1;
> +	if (fast_forward(r, t, 2, 0))
> +		return -1;
> +	if (write_tree(r, reference_tree))
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int octopus_do_merge(struct repository *r, const char *branch_name,
> +			    struct commit_list *common, struct tree *current_tree,
> +			    struct tree **reference_tree)
> +{
> +	struct tree_desc t[MAX_UNPACK_TREES];
> +	struct commit_list *i;
> +	int nr = 0, ret = 0;
> +
> +	printf(_("Trying simple merge with %s\n"), branch_name);
> +
> +	for (i = common; i; i = i->next) {
> +		struct tree *tree = repo_get_commit_tree(r, i->item);
> +		if (add_tree(tree, t + (nr++)))
> +			return -1;
> +	}
> +
> +	if (add_tree(*reference_tree, t + (nr++)))
> +		return -1;
> +	if (add_tree(current_tree, t + (nr++)))
> +		return -1;
> +	if (fast_forward(r, t, nr, 1))
> +		return 2;
> +
> +	if (write_tree(r, reference_tree)) {
> +		struct lock_file lock = LOCK_INIT;
> +
> +		puts(_("Simple merge did not work, trying automatic merge."));
> +		repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR);

It is a bit odd that this is the only place in this patch where the index
is locked, and that the lock is released right afterwards. I would have
expected the lock to be taken first thing in `merge_strategies_octopus()`
and to be committed only on success, or on failure to merge.
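
The lock lifetime described above can be modeled with a small,
self-contained sketch. All names in it (`struct toy_lock`, `toy_hold()`,
`toy_commit()`, `toy_rollback()`, `toy_octopus()`) are hypothetical
stand-ins. In Git proper they would presumably correspond to
`repo_hold_locked_index()`, `write_locked_index()` with `COMMIT_LOCK`, and
`rollback_lock_file()`, but this toy only tracks the state of the lock and
does not touch any index.

```c
enum toy_lock_state {
	TOY_UNLOCKED,
	TOY_HELD,
	TOY_COMMITTED,
	TOY_ROLLED_BACK
};

/* Toy stand-in for `struct lock_file`; it only records its state. */
struct toy_lock {
	enum toy_lock_state state;
};

static void toy_hold(struct toy_lock *l)     { l->state = TOY_HELD; }
static void toy_commit(struct toy_lock *l)   { l->state = TOY_COMMITTED; }
static void toy_rollback(struct toy_lock *l) { l->state = TOY_ROLLED_BACK; }

/*
 * `merge_result` simulates the outcome of the merge steps:
 * 0 = clean merge, 1 = conflicts left in the index,
 * anything else = hard error.
 */
int toy_octopus(struct toy_lock *lock, int merge_result)
{
	/* First thing: take the lock, before anything is modified. */
	toy_hold(lock);

	/* ... all index-modifying steps would run here ... */

	if (merge_result == 0 || merge_result == 1)
		toy_commit(lock);	/* success, or conflicts to resolve */
	else
		toy_rollback(lock);	/* leave the on-disk index untouched */

	return merge_result;
}
```

The point of the design is that there is exactly one place where the lock
is taken and one place where its fate is decided, rather than a short
hold-and-release in the middle of a helper.
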

> +		ret = !!merge_all_index(r->index, 0, 0, merge_one_file_func, NULL);
> +		write_locked_index(r->index, &lock, COMMIT_LOCK);
> +
> +		write_tree(r, reference_tree);
> +	}
> +
> +	return ret;
> +}
> +
> +int merge_strategies_octopus(struct repository *r,
> +			     struct commit_list *bases, const char *head_arg,
> +			     struct commit_list *remotes)
> +{
> +	int ff_merge = 1, ret = 0, nr_references = 1;
> +	struct commit **reference_commits, *head_commit;
> +	struct tree *reference_tree, *head_tree;
> +	struct commit_list *i;
> +	struct object_id head;
> +	struct strbuf sb = STRBUF_INIT;
> +
> +	get_oid(head_arg, &head);
> +	head_commit = lookup_commit_reference(r, &head);
> +	head_tree = repo_get_commit_tree(r, head_commit);
> +
> +	if (parse_tree(head_tree))
> +		return 2;
> +
> +	if (repo_index_has_changes(r, head_tree, &sb)) {
> +		error(_("Your local changes to the following files "
> +			"would be overwritten by merge:\n %s"),
> +		      sb.buf);
> +		strbuf_release(&sb);
> +		return 2;
> +	}
> +
> +	CALLOC_ARRAY(reference_commits, commit_list_count(remotes) + 1);
> +	reference_commits[0] = head_commit;
> +	reference_tree = head_tree;
> +
> +	for (i = remotes; i && i->item; i = i->next) {
> +		struct commit *c = i->item;
> +		struct object_id *oid = &c->object.oid;
> +		struct tree *current_tree = repo_get_commit_tree(r, c);
> +		struct commit_list *common, *j;
> +		char *branch_name = merge_get_better_branch_name(oid_to_hex(oid));
> +		int up_to_date = 0;
> +
> +		common = repo_get_merge_bases_many(r, c, nr_references, reference_commits);
> +		if (!common) {
> +			error(_("Unable to find common commit with %s"), branch_name);
> +
> +			free(branch_name);
> +			free_commit_list(common);
> +			free(reference_commits);
> +
> +			return 2;
> +		}
> +
> +		for (j = common; j && !up_to_date && ff_merge; j = j->next) {
> +			up_to_date |= oideq(&j->item->object.oid, oid);

Semantically, I would argue that this is an `||=`, not `|=`: we want a
Boolean "or", not a bit-wise one.
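
Since C has no `||=` operator, the Boolean "or" has to be spelled out. A
minimal, self-contained sketch of that idiom follows; `toy_ideq()` and
`toy_contains()` are hypothetical helpers that use plain integers where the
patch uses object IDs and `oideq()`.

```c
/* Toy stand-in for oideq(): returns 1 when the two ids match. */
static int toy_ideq(int a, int b)
{
	return a == b;
}

/* Returns 1 if `wanted` occurs among the first `nr` entries of `ids`. */
int toy_contains(const int *ids, int nr, int wanted)
{
	int found = 0;

	for (int i = 0; i < nr && !found; i++)
		found = found || toy_ideq(ids[i], wanted);	/* Boolean "or" */

	return found;
}
```

As long as the right-hand side only ever yields 0 or 1, `|=` produces the
same value, so this is about expressing intent rather than fixing a bug.
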

> +
> +			if (!j->next &&
> +			    !oideq(&j->item->object.oid,
> +				   &reference_commits[nr_references - 1]->object.oid))
> +				ff_merge = 0;
> +		}

Hmm. This is combining two things into the same loop, with a combined loop
condition. The two things are:

	case "$LF$common$LF" in
	*"$LF$SHA1$LF"*)
		eval_gettextln "Already up to date with \$pretty_name"
		continue
		;;
	esac

	if test "$common,$NON_FF_MERGE" = "$MRC,0"
	then
		# The first head being merged was a fast-forward.
		# Advance MRC to the head being merged, and use that
		# tree as the intermediate result of the merge.
		# We still need to count this as part of the parent set.

		eval_gettextln "Fast-forwarding to: \$pretty_name"
		git read-tree -u -m $head $SHA1 || exit
		MRC=$SHA1 MRT=$(git write-tree)
		continue
	fi

	NON_FF_MERGE=1

The first one tries to verify that the `common` list contains `oid`. The C
code does this, too, using the intuitive variable name `up_to_date`, which
is good.

Now, big question: is there a way for the loop to exit before we had a
chance to see the common commit that is identical to `oid`? And I think
there is: `ff_merge` is not reset between iterations of the outer loop (the
one iterating over `remotes`). If that is the case, then we would miss that
we're already up to date.

The next part is the `if test "$common,$NON_FF_MERGE" = "$MRC,0"` check.
It was turned into the `if (!j->next && ...)` condition, and I _think_ that
this does the wrong thing. Rather than verifying that the `common` list is
identical to "MRC" (= the merge reference list), it would only ever compare
the last entries of `common` and MRC.

I have a hard time convincing myself that this is equivalent to the shell
script version.

Instead, I think it should read somewhat like this:

	for (j = common, k = 0; j && (!up_to_date || ff_merge); j = j->next) {
		up_to_date = up_to_date || oideq(&j->item->object.oid, oid);

		if (ff_merge &&
		    (k >= nr_references ||
		     !oideq(&j->item->object.oid,
			    &reference_commits[k++]->object.oid)))
			ff_merge = 0;
	}

But quite honestly, this still looks "too clever" and too fragile to me.
For something as rare as an octopus merge, I'd _much_ rather have simpler
code that is easy to reason about and does the job reliably (if somewhat
slower than a hyper-optimized version):

	/*
	 * If `oid` is reachable from `HEAD`, we're already up to
	 * date.
	 */
	for (j = common; j; j = j->next)
		if (oideq(&j->item->object.oid, oid)) {
			up_to_date = 1;
			break;
		}

	if (up_to_date) {
		printf(_("Already up to date with %s\n"), branch_name);

		free(branch_name);
		free_commit_list(common);
		continue;
	}

	for (j = common, k = 0; ff_merge && j; j = j->next)
		if (k >= nr_references ||
		    !oideq(&j->item->object.oid,
			   &reference_commits[k++]->object.oid))
			ff_merge = 0;
	if (k != nr_references)
		ff_merge = 0;

But the more I stare at the shell script code, the more I start to believe
that this `MRC` business is just a very convoluted way to essentially
verify that the `HEAD` is the _single_ merge base.

I say that because I cannot fail to notice that `$common` separates the
merge bases by newlines, while `$MRC` separates its entries by spaces.
Therefore,

	test "$common,$NON_FF_MERGE" = "$MRC,0"

can only ever evaluate to `true` if both `$common` and `$MRC` contain
exactly one and the same oid, namely the one of the revision to which we
just fast-forwarded in the previous iteration.

Therefore, the logic does not even need a loop. It would be as trivial as:

	/*
	 * If we could fast-forward so far and `HEAD` is the
	 * single merge base with the current `remote` revision,
	 * keep fast-forwarding.
	 */
	if (ff_merge && common && !common->next &&
	    nr_references == 1 &&
	    oideq(&common->item->object.oid,
		  &reference_commits[0]->object.oid)) {
		ret = octopus_fast_forward(r, branch_name, head_tree,
					   current_tree, &reference_tree);
		nr_references = 0;
	} else {
		ff_merge = 0;
		ret = octopus_do_merge(r, branch_name, common,
				       current_tree, &reference_tree);
	}

> +
> +		if (up_to_date) {
> +			printf(_("Already up to date with %s\n"), branch_name);
> +
> +			free(branch_name);
> +			free_commit_list(common);
> +			continue;
> +		}
> +
> +		if (ff_merge) {
> +			ret = octopus_fast_forward(r, branch_name, head_tree,
> +						   current_tree, &reference_tree);
> +			nr_references = 0;
> +		} else {
> +			ret = octopus_do_merge(r, branch_name, common,
> +					       current_tree, &reference_tree);
> +		}
> +
> +		free(branch_name);
> +		free_commit_list(common);
> +
> +		if (ret == -1 || ret == 2)
> +			break;
> +		else if (ret && i->next) {
> +			/*
> +			 * We allow only last one to have a
> +			 * hand-resolvable conflicts.  Last round failed
> +			 * and we still had a head to merge.
> +			 */
> +			puts(_("Automated merge did not work."));
> +			puts(_("Should not be doing an octopus."));
> +
> +			free(reference_commits);
> +			return 2;

I see that you moved this block from the beginning of the loop (where it
was in the script) to the end. This is a good change.

I wonder, though, whether it wouldn't make more sense to replace the last
two lines with this:

			ret = 2;
			break;

That way, we need not worry about releasing resources in multiple places in
the future: it will all be done at the end of the function.

Phew. What a lot to unpack.

Please let me express my gratitude for working on this. My many comments
may seem as if I am unhappy with the progress, but nothing could be further
from the truth. I am impressed by your tenacity, and I hope that I could do
my little bit to make this patch series as good as it can be.

Thanks,
Dscho

> +		}
> +
> +		reference_commits[nr_references++] = c;
> +	}
> +
> +	free(reference_commits);
> +	return ret;
> +}
> diff --git a/merge-strategies.h b/merge-strategies.h
> index bba4bf999c..8de2249ee6 100644
> --- a/merge-strategies.h
> +++ b/merge-strategies.h
> @@ -32,5 +32,8 @@ int merge_all_index(struct index_state *istate, int oneshot, int quiet,
>  int merge_strategies_resolve(struct repository *r,
>  			     struct commit_list *bases, const char *head_arg,
>  			     struct commit_list *remote);
> +int merge_strategies_octopus(struct repository *r,
> +			     struct commit_list *bases, const char *head_arg,
> +			     struct commit_list *remote);
>
>  #endif /* MERGE_STRATEGIES_H */
> --
> 2.31.0
>
>