Re: Design issue in git merge driver interface

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 22 Jun 2023 12:12:50 -0700

Joshua Hudson <jhudson@xxxxxxxxxxx> writes:

> Looking at the merge driver found that some things cannot be handled,
> such as OOM condition. The fault has to propagate upwards, unwinding
> as it goes.

Even though the end-user facing documentation says:

    The merge driver is expected to leave the result of the merge in
    the file named with `%A` by overwriting it, and exit with zero
    status if it managed to merge them cleanly, or non-zero if there
    were conflicts.

the ll-merge.c:ll_ext_merge() function that calls an external merge
driver does this:

        static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
                ...
                status = run_command(&child);
                ...
                ret = (status > 0) ? LL_MERGE_CONFLICT : status;
                return ret;
        }

so a true "failure" from run_command() to run the external merge
driver will be noticed as a failure by the upper layer of the
callchain.  merge-ort.c:merge_3way() relays the return value of
ll-merge.c:ll_merge() and merge-ort.c:handle_content_merge() reacts
to a negative return as an _("Failed to execute internal merge")
error, for example.  merge-recursive uses the same logic.
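
To make that mapping concrete, here is a minimal standalone model of
the one-line conversion quoted above.  The enum and the function name
are made up for illustration, and the -1/0/1 values are an assumption
about how the result codes are laid out; only the
"(status > 0) ? ... : status" expression comes from the excerpt.

```c
/* Hypothetical stand-in for enum ll_merge_result; values are assumed. */
enum model_result {
	MODEL_ERROR = -1,
	MODEL_OK = 0,
	MODEL_CONFLICT = 1
};

/* Models: ret = (status > 0) ? LL_MERGE_CONFLICT : status; */
enum model_result model_current_mapping(int status)
{
	return (status > 0) ? MODEL_CONFLICT : (enum model_result)status;
}
```

In this model 0 stays "OK", -1 stays "ERROR", and every positive
value, whatever its origin, collapses into "CONFLICT".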

Unfortunately, I see no provision for the merge driver to actively
signal such a condition.  The return value of run_command() is the
return value from run-command.c:wait_or_whine(), and the exit status
of the process is cleansed with WEXITSTATUS(), so we cannot make it
negative X-<.

In the worst case, we may retroactively have to reserve one exit
status so that the external merge driver can actively say "I give
up" to cause LL_MERGE_ERROR to be returned from the codepath, but I
wonder if it is safe to abuse "exit due to signal" (which shows up
as a return value greater than 128) as such a "merge driver went
away without leaving a useful result"?  Elijah, what do you think?
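
For reference, the sketch below shows roughly how a wait status ends
up as either a plain exit code or a "signal + 128" value.  It uses
only POSIX calls and is not the code in run-command.c;
decode_wait_status() and run_child() are hypothetical names invented
for this illustration.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fold a waitpid() status into a single int. */
int decode_wait_status(int status)
{
	if (WIFSIGNALED(status))
		return WTERMSIG(status) + 128;	/* killed by a signal */
	if (WIFEXITED(status))
		return WEXITSTATUS(status);	/* 0..255, never negative */
	return -1;				/* anything else: treat as failure */
}

/*
 * Hypothetical helper: fork a child that exits with exit_code or,
 * if sig is non-zero, kills itself with that signal instead.
 */
int run_child(int exit_code, int sig)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (!pid) {
		if (sig)
			raise(sig);
		_exit(exit_code);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return decode_wait_status(status);
}
```

In this sketch run_child(3, 0) yields 3 and run_child(0, SIGKILL)
yields SIGKILL + 128; a child that actually ran never produces a
negative number, whatever it passes to exit().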

Stepping back a bit and even disregarding such a merge driver that
OOMs, if a long-running merge driver is killed, by definition we
cannot trust what the driver left on the filesystem, so handling the
"exit due to signal" case differently does sound like a sensible
thing to do, at least to me, offhand.

And once we have such an enhancement to the ll-ext-merge interface,
a merge driver that voluntarily "gives up" can send a signal to kill
itself (or call abort(3)).
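
A driver written in C could give up along these lines.  This is only
a sketch: try_merge() and finish_driver() are hypothetical names, the
-1/0/1 outcome convention is made up here, and the merge itself is
left out.  The only point is that an unrecoverable failure ends in
abort(3) rather than in exit(3).

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical driver core: returns 0 for a clean merge, 1 for
 * conflicts, and -1 when it cannot produce any usable result
 * (here simulated by a failed allocation).
 */
int try_merge(size_t scratch_size)
{
	char *scratch = malloc(scratch_size);

	if (!scratch)
		return -1;
	/* ... the real three-way merge would go here ... */
	free(scratch);
	return 0;
}

/* Turn the outcome into how the driver process should end. */
void finish_driver(int outcome)
{
	if (outcome < 0) {
		fputs("merge driver: giving up\n", stderr);
		abort();	/* dies with SIGABRT, not an exit status */
	}
	exit(outcome);		/* 0: clean, 1: conflicts */
}
```

The driver's main() would then end with something like
finish_driver(try_merge(size)), so that an allocation failure is
reported to the caller as death by signal instead of as a conflict.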

With a tentative commit log message (which would need to be updated
to mention what the triggering topic was that led to this
enhancement) but without the associated documentation update and
tests, here is a patch to summarize and illustrate the above idea.

----- >8 ---------- >8 ---------- >8 -----
ll-merge: treat external merge driver killed by a signal as an error

When an external merge driver dies with a signal, we should not
expect that the result left on the filesystem is in any useful
state.  However, because the current code uses the return value from
run_command() and declares any positive value as a sign that the
driver successfully left conflicts in the result, and because the
return value from run_command() for a subprocess that died upon a
signal is positive, we end up treating whatever garbage is left on the
filesystem as the result the merge driver wanted to leave us.

run_command() returns a value larger than 128 (WTERMSIG(status) + 128,
to be exact) when it notices that the subprocess died with a signal,
so detect such a case and return LL_MERGE_ERROR from ll_ext_merge().

Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
---
 ll-merge.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git c/ll-merge.c w/ll-merge.c
index 07ec16e8e5..5599f55ffc 100644
--- c/ll-merge.c
+++ w/ll-merge.c
@@ -243,7 +243,14 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
 		unlink_or_warn(temp[i]);
 	strbuf_release(&cmd);
 	strbuf_release(&path_sq);
-	ret = (status > 0) ? LL_MERGE_CONFLICT : status;
+
+	if (!status)
+		ret = LL_MERGE_OK;
+	else if (0 < status && status <= 128)
+		ret = LL_MERGE_CONFLICT;
+	else
+		/* failed to run, or died due to a signal (WTERMSIG(status) + 128) */
+		ret = LL_MERGE_ERROR;
 	return ret;
 }