Re: [PATCH] merge_blobs: use strbuf instead of manually-sized mmfile_t

Jeff King <peff@xxxxxxxx> · Tue, 16 Feb 2016 15:35:26 -0500

On Tue, Feb 16, 2016 at 01:14:09PM +0100, Stefan Frühwirth wrote:

> On 2016-02-16 at 06:50, Jeff King wrote:
> >Yeah, maybe. There were two reasons I avoided adding a test.
> >
> >One, I secretly hoped that by dragging my feet we could get consensus on
> >just ripping out merge-tree entirely. ;)
> 
> Since I cannot comment on the code quality or maturity of merge-tree, but
> would appreciate if this tool worked as expected, let me quote Chris'
> comment[1] on why he uses merge-tree instead of just "git merge", hoping
> that it has an impact upon your decision whether to keep the code or
> "ripping it out":
> 
> "One might ask, why not use the porcelain 'git merge' command? One
> reason is, in the context of the two phase commit protocol, we'd rather
> do pretty much everything we possibly can in the voting stage, leaving
> ourselves with nothing to do in the finish phase except updating the
> head to the commit we just created, and possibly updating the working
> directory--operations that are guaranteed to work. Since 'git merge'
> will update the head, we'd prefer to do it during the final phase of the
> commit, but we can't guarantee it will work. There is not a convenient
> way to do a merge dry run during the voting phase. Although I can
> conceive of ways to do the merge during the voting phase and roll back
> to the previous head if we need to, that feels a little riskier. Doing
> the merge ourselves, here, also frees us from having to work with a
> working directory, required by the porcelain 'git merge' command. This
> means we can use bare repositories and/or have transactions that use
> a head other than the repositories 'current' head."

Yeah, I agree there isn't a great solution in git here. Using "git
merge" is definitely wrong if you don't want to touch HEAD or have a
working directory.  If you _just_ care about doing the tree-level merge
without content-level merging inside blobs, you can do it in the index
like:

  export GIT_INDEX_FILE=temp.index
  base=$(git merge-base $ours $theirs)
  git read-tree -i -m --aggressive $base $ours $theirs

If you want to do content-level resolving on top of that, you can do it
with:

  git merge-index git-merge-one-file -a

though it will need a temp directory to write out conflicts and
resolved content.

That was what powered GitHub's "merge" button for many years, though we
recently switched to using libgit2's merge (mostly because the
merge-one-file bit can be slow; we didn't care about seeing the
conflicts, and as a shell script it runs O(# of merged files)
processes).

I don't think merge-tree is at all the wrong tool, in the sense that it
is being used as designed. But it is using merge code that is different
from literally the rest of git. That means you're going to run into
weird bugs (like this one), and places where it does not behave the
same.  This add/add case, for instance, is usually a conflict in a
normal git merge (try your test case with "git merge"), but merge-tree
tries to do a partial content merge with a base that never actually
existed[1].

So I worry that merge-tree's existence is a disservice to people like
Chris, because there is no disclaimer that it is effectively
unmaintained.

> >Two, I'm not sure what the test output _should_ be. I think this case is
> >buggy. So we can check that we don't corrupt the heap, which is nice,
> 
> What do you mean exactly by "this case is buggy"? Doesn't make sense to me.

Actually, I'm not sure now.  Look at the output of git-merge-tree on
your test case, once my patch is applied[2]:

  $ git merge-tree HEAD~1 master other
  added in both
    our    100644 9fe188c32cdde9723a119c69548e3768882827d1 b
    their  100644 2e65efe2a145dda7ee51d1741299f848e5bf752e b
  @@ -1,2 +1,5 @@
  +<<<<<<< .our

  +=======
  +>>>>>>> .their
   a
  \ No newline at end of file

We blindly stick the "No newline at end of file" marker into the base
file content (which is what caused the overflow in the first place). So
our three-way merge is proceeding from a bogus state that has that extra
marker in it.  I _thought_ that was what I was seeing in the output
above.

But I think in practice it magically works, because it looks like both
sides remove that (so the correct merge resolution is to drop it, as
well). And the output above is because it shows the file content as a
diff against the "ours" version (you can tell from the hunk header that
the "No newline" marker is not part of the diff content).

So merge-blobs.c:generate_common_file() is definitely buggy, but I think
the bug gets unintentionally canceled out by the follow-on three-way
merge. Which is...good, I guess?

-Peff

[1] So one obvious alternative to my patch (besides killing off
    merge-tree entirely) would be to rip out the weird add/add handling.
    That would bring merge-tree in line with the rest of git, and would
    eliminate the buffer-overflowing code entirely.

[2] You can trigger it without my patch, too, but you need input files
    which differ by more than 29 bytes.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html