Re: [PATCH v2] merge-tree: fix segmentation fault in read-only repositories

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Elijah,

now that I have studied more of the code, hunted down Coverity reports
about `merge-ort.c` and `builtin/merge-tree.c` and essentially have grown
a lot more confidence about my patch, I'll take the time to respond in a
bit more detail.

On Wed, 21 Sep 2022, Elijah Newren wrote:

> On Wed, Sep 21, 2022 at 3:08 PM Johannes Schindelin via GitGitGadget
> <gitgitgadget@xxxxxxxxx> wrote:
> >
> > From: Johannes Schindelin <johannes.schindelin@xxxxxx>
> >
> > Independent of the question whether we want `git merge-tree` to report
> > the tree name even when it failed to write the tree objects in a
> > read-only repository, there is no question that we should avoid a
> > segmentation fault.
>
> Ah, a read-only repo.  Looks like we didn't bother to check the return
> status of write_object_file() in one of the two cases; oops.  (And it
> looks like the other case does not correctly propagate the error
> upwards far enough...)

Indeed, that was the actual root cause, but I had failed to even look
whether the return values of `write_tree()` and `write_object_file()` were
respected. Narrator's voice: they weren't.

> > And when we report an invalid tree name (because the tree could not be
> > written), we need the exit status to be non-zero.
> >
> > Let's make it so.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> > ---
> >     merge-tree: fix segmentation fault in read-only repositories
> >
> >     Turns out that the segmentation fault reported by Taylor
> >     [https://lore.kernel.org/git/YyopQD+LvPucnz3w@nand.local/] happened
> >     while testing merge-ort in a read-only repository, and that the upstream
> >     version of git merge-tree is as affected as GitHub's internal version.
> >
> >     Note: I briefly considered using the OID of the_hash_algo->empty_tree
> >     instead of null_oid() when no tree object could be constructed. However,
> >     I have come to the conclusion that this could potentially cause its own
> >     set of problems because it would relate to a valid tree object even if
> >     we do not have any valid tree object to play with.
> >
> >     Also note: The question I hinted at in the commit message, namely
> >     whether or not to try harder to construct a tree object even if we
> >     cannot write it out, maybe merits a longer discussion, one that I think
> >     we should have after v2.38.0 is released, so as not to distract from
> >     focusing on v2.38.0.
>
> That's fair, but you know I'm going to have a hard time refraining
> from commenting on it anyway...  :-)

Hahaha, of course you would ;-)

> [...]
> > @@ -446,7 +447,8 @@ static int real_merge(struct merge_tree_options *o,
> >         if (o->show_messages == -1)
> >                 o->show_messages = !result.clean;
> >
> > -       printf("%s%c", oid_to_hex(&result.tree->object.oid), line_termination);
> > +       tree_oid = result.tree ? &result.tree->object.oid : null_oid();
> > +       printf("%s%c", oid_to_hex(tree_oid), line_termination);
>
> Perhaps also print a warning to stderr when result.tree is NULL?

I ended up not even touching `builtin/merge-tree.c` but instead ensuring
that `result.clean` is negative if we fail to write an object (which could
happen even in read/write repositories, think "disk full").

As a consequence, we do not even reach this `printf()` anymore, as you
pointed out, a negative `result.clean` is handled much earlier.

It is handled via a `die()` in `real_merge()`, and that will need to
change, I think, if we want to continue on the batched merges.

> >         if (!result.clean) {
> >                 struct string_list conflicted_files = STRING_LIST_INIT_NODUP;
> >                 const char *last = NULL;
> > @@ -473,7 +475,7 @@ static int real_merge(struct merge_tree_options *o,
> >                                               &result);
> >         }
> >         merge_finalize(&opt, &result);
> > -       return !result.clean; /* result.clean < 0 handled above */
> > +       return !result.tree || !result.clean; /* result.clean < 0 handled above */
>
> Thinking out loud, should this logic be at the merge-ort.c level,
> perhaps something like this:
>
> @@ -4940,6 +4941,9 @@ static void
> merge_ort_nonrecursive_internal(struct merge_options *opt,
>         result->tree = parse_tree_indirect(&working_tree_oid);
>         /* existence of conflicted entries implies unclean */
>         result->clean &= strmap_empty(&opt->priv->conflicted);
> +       if (!result->tree)
> +               /* This shouldn't happen, because if we did fail to
> write a tree we should have returned early before getting here.  But
> just in case we have some bugs... */
> +               result->clean = -1;
>         if (!opt->priv->call_depth) {
>                 result->priv = opt->priv;
>                 result->_properly_initialized = RESULT_INITIALIZED;
>
> That might benefit callers other than merge-tree, though maybe it
> makes it harder to print a helpful error message (unless we're fine
> with the library always throwing one?)

The error messages are already thrown about (this is how it looks like
with v3 of this patch):

	[...]
	+ git -C read-only merge-tree side1 side2
	error: insufficient permission for adding an object to repository database ./objects
	error: error: Unable to add numbers to database
	error: insufficient permission for adding an object to repository database ./objects
	error: error: Unable to add greeting to database
	error: insufficient permission for adding an object to repository database ./objects
	fatal: failure to merge
	+ exit_code=128
	[...]

> > diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
> > index 28ca5c38bb5..013b77144bd 100755
> > --- a/t/t4301-merge-tree-write-tree.sh
> > +++ b/t/t4301-merge-tree-write-tree.sh
> > @@ -810,4 +810,13 @@ test_expect_success 'can override merge of unrelated histories' '
> >         test_cmp expect actual
> >  '
> >
> > +test_expect_success SANITY 'merge-ort fails gracefully in a read-only repository' '
> > +       git init --bare read-only &&
> > +       git push read-only side1 side2 side3 &&
> > +       test_when_finished "chmod -R u+w read-only" &&
> > +       chmod -R a-w read-only &&
> > +       test_must_fail git -C read-only merge-tree side1 side3 &&
> > +       test_must_fail git -C read-only merge-tree side1 side2
> > +'
> > +
> >  test_done
> >
> > base-commit: dda7228a83e2e9ff584bf6adbf55910565b41e14
> > --
> > gitgitgadget
>
> This is a reasonable workaround from the merge-tree side, if we expect
> merge-ort to not correctly notify us of issues (which may be fair
> given that a quick glance suggests we have more than one problematic
> codepath, which I'll discuss more below).  However, I do think
> merge-ort should be returning a negative value for the "clean" status
> in such a case.

You're absolutely right. That's what v3 now does.

> If merge-ort did that, then we wouldn't even get to your new code
> because merge-tree.c already checks for that:
>
>     merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
>     if (result.clean < 0)
>         die(_("failure to merge"));
>
> I have a hack above which sets a negative return value, but it's only
> a workaround since it comes from a late detect-after-the-fact
> location.  Here's another ugly hack, but one which highlights exactly
> where the real problem occurs:
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 99dcee2db8..c2144e9220 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3605,7 +3605,8 @@ static void write_tree(struct object_id *result_oid,
>         }
>
>         /* Write this object file out, and record in result_oid */
> -       write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid);
> +       if (write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid))
> +               die(_("Unable to add new tree to database"));
>         strbuf_release(&buf);
>  }
>
> die()ing, of course, is not a good choice, we should really return -1
> instead.  Unfortunately, the callers aren't currently prepared to
> propagate such a value upwards (as reflected in the return type of the
> function) so any fix in this area is a little more involved than just
> modifying these few lines.  Also, there is one other call to
> write_object_file() in merge-ort.c which does check and return a value
> of -1, though looking around it appears the callers of that other site
> do not always correctly check and propagate a return value of -1
> further upwards as they should, meaning we are losing track of the
> fact that the merge failed to function.
>
> Given the lack of correct propagation of -1 return values in the other
> codepath plus the error here, keeping some form of workaround sounds
> fair (though it may be nice to print an error that something fishy
> happened).
>
> And then, possibly post-v2.38.0 though we may be able to get it in
> before, getting correct propagation of a -1 return value from the
> source of the error would be good.  Would you like to look into that,
> or would you prefer I did?

It turned out not to be too bad.

Essentially, to propagate `write_object_file()` failures I only had to
change a couple of signatures (with the corresponding `return`s):
`write_tree()`, `write_completed_directory()`, and `process_entries()`.
And then, of course, I needed to change
`merge_ort_nonrecursive_internal()` to respect the propagated error by
setting `result->clean = -1`.

Pretty straight-forward, really, much less involved than I had expected.

Thank you!
Dscho




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux