On Wed, Sep 21, 2022 at 3:08 PM Johannes Schindelin via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote: > > From: Johannes Schindelin <johannes.schindelin@xxxxxx> > > Independent of the question whether we want `git merge-tree` to report > the tree name even when it failed to write the tree objects in a > read-only repository, there is no question that we should avoid a > segmentation fault. Ah, a read-only repo. Looks like we didn't bother to check the return status of write_object_file() in one of the two cases; oops. (And it looks like the other case does not correctly propagate the error upwards far enough...) > And when we report an invalid tree name (because the tree could not be > written), we need the exit status to be non-zero. > > Let's make it so. > > Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx> > --- > merge-tree: fix segmentation fault in read-only repositories > > Turns out that the segmentation fault reported by Taylor > [https://lore.kernel.org/git/YyopQD+LvPucnz3w@nand.local/] happened > while testing merge-ort in a read-only repository, and that the upstream > version of git merge-tree is as affected as GitHub's internal version. > > Note: I briefly considered using the OID of the_hash_algo->empty_tree > instead of null_oid() when no tree object could be constructed. However, > I have come to the conclusion that this could potentially cause its own > set of problems because it would relate to a valid tree object even if > we do not have any valid tree object to play with. > > Also note: The question I hinted at in the commit message, namely > whether or not to try harder to construct a tree object even if we > cannot write it out, maybe merits a longer discussion, one that I think > we should have after v2.38.0 is released, so as not to distract from > focusing on v2.38.0. That's fair, but you know I'm going to have a hard time refraining from commenting on it anyway... :-) [...] > @@ -446,7 +447,8 @@ static int real_merge(struct merge_tree_options *o, > if (o->show_messages == -1) > o->show_messages = !result.clean; > > - printf("%s%c", oid_to_hex(&result.tree->object.oid), line_termination); > + tree_oid = result.tree ? &result.tree->object.oid : null_oid(); > + printf("%s%c", oid_to_hex(tree_oid), line_termination); Perhaps also print a warning to stderr when result.tree is NULL? > if (!result.clean) { > struct string_list conflicted_files = STRING_LIST_INIT_NODUP; > const char *last = NULL; > @@ -473,7 +475,7 @@ static int real_merge(struct merge_tree_options *o, > &result); > } > merge_finalize(&opt, &result); > - return !result.clean; /* result.clean < 0 handled above */ > + return !result.tree || !result.clean; /* result.clean < 0 handled above */ Thinking out loud, should this logic be at the merge-ort.c level, perhaps something like this: @@ -4940,6 +4941,9 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt, result->tree = parse_tree_indirect(&working_tree_oid); /* existence of conflicted entries implies unclean */ result->clean &= strmap_empty(&opt->priv->conflicted); + if (!result->tree) + /* This shouldn't happen, because if we did fail to write a tree we should have returned early before getting here. But just in case we have some bugs... */ + result->clean = -1; if (!opt->priv->call_depth) { result->priv = opt->priv; result->_properly_initialized = RESULT_INITIALIZED; That might benefit callers other than merge-tree, though maybe it makes it harder to print a helpful error message (unless we're fine with the library always throwing one?) > diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh > index 28ca5c38bb5..013b77144bd 100755 > --- a/t/t4301-merge-tree-write-tree.sh > +++ b/t/t4301-merge-tree-write-tree.sh > @@ -810,4 +810,13 @@ test_expect_success 'can override merge of unrelated histories' ' > test_cmp expect actual > ' > > +test_expect_success SANITY 'merge-ort fails gracefully in a read-only repository' ' > + git init --bare read-only && > + git push read-only side1 side2 side3 && > + test_when_finished "chmod -R u+w read-only" && > + chmod -R a-w read-only && > + test_must_fail git -C read-only merge-tree side1 side3 && > + test_must_fail git -C read-only merge-tree side1 side2 > +' > + > test_done > > base-commit: dda7228a83e2e9ff584bf6adbf55910565b41e14 > -- > gitgitgadget This is a reasonable workaround from the merge-tree side, if we expect merge-ort to not correctly notify us of issues (which may be fair given that a quick glance suggests we have more than one problematic codepath, which I'll discuss more below). However, I do think merge-ort should be returning a negative value for the "clean" status in such a case. If merge-ort did that, then we wouldn't even get to your new code because merge-tree.c already checks for that: merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result); if (result.clean < 0) die(_("failure to merge")); I have a hack above which sets a negative return value, but it's only a workaround since it comes from a late detect-after-the-fact location. Here's another ugly hack, but one which highlights exactly where the real problem occurs: diff --git a/merge-ort.c b/merge-ort.c index 99dcee2db8..c2144e9220 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -3605,7 +3605,8 @@ static void write_tree(struct object_id *result_oid, } /* Write this object file out, and record in result_oid */ - write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid); + if (write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid)) + die(_("Unable to add new tree to database")); strbuf_release(&buf); } die()ing, of course, is not a good choice, we should really return -1 instead. Unfortunately, the callers aren't currently prepared to propagate such a value upwards (as reflected in the return type of the function) so any fix in this area is a little more involved than just modifying these few lines. Also, there is one other call to write_object_file() in merge-ort.c which does check and return a value of -1, though looking around it appears the callers of that other site do not always correctly check and propagate a return value of -1 further upwards as they should, meaning we are losing track of the fact that the merge failed to function. Given the lack of correct propagation of -1 return values in the other codepath plus the error here, keeping some form of workaround sounds fair (though it may be nice to print an error that something fishy happened). And then, possibly post-v2.38.0 though we may be able to get it in before, getting correct propagation of a -1 return value from the source of the error would be good. Would you like to look into that, or would you prefer I did?