Re: git merge-tree: bug report and some feature requests

>> I'm experimenting with some new porcelain for interactive rebase. One
>> goal is to leave the work tree untouched for most operations. It looks
>> to me like 'git merge-tree' may be the right plumbing command for
>> doing the merge part of the pick work of the todo list, one commit at
>> a time. If I'm wrong about this, I'd love pointers; what follows may
>> still be interesting anyway.
>
> I don't have a concrete alternative (yet?) but here are some pointers
> to two alternate merge-without-touching-working-tree possibilities, if
> your current route doesn't pan out as well as you like:
>
> I posted some patches last year to make merge-recursive.c be able to
> do merges without touching the working tree.  Adding a few flags would
> then enable it for any of 'merge', 'cherry-pick', 'am', or
> 'rebase'...though for unsuccessful merges, there's a clear question of
> what/how conflicts should be reported to the user.  That probably
> depends a fair amount on the precise use-case.
>
> Although that series was placed on the backburner due to the immediate
> driver of the feature going away, I'm still interested in such a
> change, though I think it would fall out as a nice side effect of
> implementing Junio's proposed ideal-world-merge-recursive rewrite[1].
> I have started looking into that[2], but no guarantees about how
> quickly I'll find time to finish or even whether I will.
>
> [1] https://public-inbox.org/git/xmqqd147kpdm.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> [2] https://github.com/newren/git/blob/ort/ort-cover-letter contains
> overview of ideas and notes to myself about what I was hoping to
> accomplish; currently it doesn't even compile or do anything

Thanks for the pointer. That does seem promising.

And yes, I see now that serialization of conflicts is decidedly
challenging. More on that below.


>> 4. API suggestion
>>
>> Here's what I really want 'git merge-tree' to output. :)
> ...
>> If the merge had conflicts, write the "as merged as possible" tree to
>
> You'd need to define "as merged as possible" more carefully, because I
> thought you meant a tree containing all the three-way merge conflict
> markers and such being present in the "resolved" file, but from your
> parenthetical note below it appears you think that is a different tree
> that would also be useful to diff against the first one.  That leaves
> me wondering what the first tree is. (Is it just the tree where for
> each path, if that path had no conflicts associated with it then it's
> the merge-resolved-file, and otherwise it's the file contents from the
> merge-base?).

FWIW, the parenthetical suggestion was indeed what I had in mind. But
non-content conflicts appear to make that a non-starter. Or at least
woefully incomplete.


> Both of these trees are actually rather non-trivial to define.  The
> wording above isn't actually sufficient, because content conflicts
> aren't the only kind of conflict.  More on that below.
>
> There is already a bunch of code in merge-recursive.c to create a
> forcibly-merged-accepting-conflict-markers-in-the-resolution and
> record it as a tree (this is used for creating virtual merge bases in
> the recursive case, namely when there isn't a single merge-base for
> the two branches you are merging).  It might be reusable for what you
> want here, but it's not immediately clear whether all the things it
> does are appropriate; someone would have to consider the non-content
> (path-based) conflicts carefully.

Ack. I assume this is also the code that generates the existing 'git
merge-tree' patches, which include conflict markers.


>> the object database and give me its sha, and then also give me the
>> three-way merge diff output for all conflicts, as a regular patch
>> against that tree, using full path names and shas. (Alternatively,
>> maybe better, give me a second sha for a tree containing all the
>> three-way merge diff patches applied, which I can diff against the
>> first tree to find the conflict patches.)
>
> As far as I can tell, you're assuming that it's possible with two
> trees that are crafted "just right", that you can tell where the merge
> conflicts are, with binary files being your only difficulty.  Content
> conflicts aren't the only type that exist; there are also path-based
> conflicts.  These types of conflicts also make it difficult to know how
> the two trees you are requesting should even be created.
>
> For example, if there is a modify/delete conflict, how can that be
> determined from just two trees?  If the first tree has the base
> version of the file, then the second tree either has a file at the
> same position or it doesn't.  Neither case looks like a conflict, but
> the original merge had one.  You need more information.  The exact
> same thing can be said for rename/delete conflicts.
>
> Similarly, rename/add (one side renames an existing file to some new
> path (say, "new_path"), and the other adds a brand new file at
> "new_path"), or rename/rename(2to1) (each side renames a different file
> to the same location), won't be detectable just by diffing two trees.
> These are often handled by moving both files to some other location,
> so there's no way to record in a tree that there was a conflict.
>
> rename/rename(1to2) is similar, but instead of two different original
> files being renamed to the same thing, this is one file being renamed
> differently on different sides of history.
>
> I know that several of the examples above involved rename detection,
> which 'git merge-tree' won't even do, but that means you're even more
> likely to face the modify/delete conflict cases.  And our list still
> isn't done, either:
>
> Directory/file conflicts (one side puts a directory of the same name
> that the other side adds as a file) will also cause problems.

Me: If the world were simple, we could build it this simple way!

You: The world isn't simple.

Me: ...drat. Thanks.


I've been looking at libgit2's handling of this. It appears the
closest analog is:

* Call git_merge_trees:
https://libgit2.github.com/libgit2/#HEAD/group/merge/git_merge_trees
* Call git_index_conflict_iterator_new on the resulting *git_index:
https://libgit2.github.com/libgit2/#HEAD/group/index/git_index_conflict_iterator_new
* Use git_index_conflict_next to inspect a conflict:
https://libgit2.github.com/libgit2/#HEAD/group/index/git_index_conflict_next
* The conflict provides three git_index_entrys:
https://libgit2.github.com/libgit2/#HEAD/type/git_index_entry

Looking over your list above, at a minimum, libgit2 might not have a
particularly good way to represent submodule/file or
submodule/directory conflicts, because is-a-submodule is defined
external to a git_index_entry.

(Incidentally, looking at
https://github.com/git/git/blob/master/Documentation/technical/index-format.txt,
it appears that there are also possibly symlink/gitlink X
file/dir/submodule conflicts? Ugh.)

libgit2 gets to avoid the bother of serialization and deserialization,
but it seems the problem of conflict categorization still remains. See
my next comments.



> Finally, directory rename detection (currently in pu under review)
> adds a few "implicit dir rename" conflict types (renames of multiple
> directories would cause multiple files to be renamed to the same
> location, or an existing file/dir being in the way of one or more
> path(s) getting implicitly renamed).  This means that the number of
> types of non-textual conflicts might also grow in the future so it may
> be unwise to try to special case existing exceptions with a bag of
> clever workarounds.

There's tension here.

Cataloging or special-casing all possible conflict types does seem
unwise because of the sheer number of kinds of conflicts.

But the alternative appears to be punting entirely, as libgit2 does,
and merely providing something akin to three index entries. This
leaves it unclear what exactly the conflict was, at which point any
user (read: porcelain developer) will end up having to recreate some
merge logic to figure out what went wrong. And if merge-tree starts
doing rename detection, the user might then have to emulate that as
well. Given that, you may as well catalog the many kinds of conflicts
and report which one occurred. And if there were a list of all
possible kinds of conflicts, it'd help the user write correct code,
because they're not going to naively fail to consider--as I did--the
many kinds of non-content-based conflicts.
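
To make the "three index entries" point concrete, here is a minimal
sketch of what a porcelain can and cannot infer from one such triple.
Everything in it is an assumption for illustration: Entry is a
hypothetical stand-in for a git_index_entry, classify() is not a git or
libgit2 function, and the labels are borrowed from the unmerged states
listed in the git-status documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for one side of a conflict (cf. git_index_entry)."""
    path: str
    mode: int
    oid: str


# Keys are (ancestor present, ours present, theirs present).
# Labels follow the unmerged states described in the git-status documentation.
LABELS = {
    (True, True, True): "both modified",
    (False, True, True): "both added",
    (True, True, False): "deleted by them",
    (True, False, True): "deleted by us",
    (False, True, False): "added by us",
    (False, False, True): "added by them",
    (True, False, False): "both deleted",
}


def classify(ancestor: Optional[Entry],
             ours: Optional[Entry],
             theirs: Optional[Entry]) -> str:
    """Name a conflict using only which of the three sides are present."""
    key = (ancestor is not None, ours is not None, theirs is not None)
    if key not in LABELS:
        raise ValueError("at least one side must be present")
    return LABELS[key]


# A modify/delete conflict: we changed the file, they removed it.
base = Entry("f.txt", 0o100644, "a" * 40)
ours = Entry("f.txt", 0o100644, "b" * 40)
print(classify(base, ours, None))
```

Note what the triple does not say: whether a rename was involved, or
whether some other path is in the way as a directory. That information
lives outside the three entries, which is the gap described above.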


I admit, I'm somewhat puzzled about where to go from here, both from
git/libgit2's perspective and the perspective of my immediate needs.
For myself, I probably have the luxury of bailing on anything but the
simplest, content-based conflicts and sending the (end) user back to
regular interactive rebase, if I can find a way to implement that
reliably and without inordinate difficulty. The first case I checked
doesn't look good: For dir/file merge conflicts, 'git merge-tree'
merely reports some "added in remote" and "added in local"
explanations, meaning I'd have to add path conflict detection myself.
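
A rough sketch of what that home-grown path conflict detection might
look like, assuming each side's file list is already available (for
example from 'git ls-tree -r --name-only <tree>'). The function names
are hypothetical, and the check only flags candidates: it does not
consult the merge base, so a path that one side cleanly turned from a
file into a directory would also be reported.

```python
def parent_dirs(path):
    """Return every directory prefix of a slash-separated file path."""
    parts = path.split("/")
    return {"/".join(parts[:i]) for i in range(1, len(parts))}


def dir_file_candidates(ours_files, theirs_files):
    """Paths that are a file on one side and a directory on the other."""
    ours_files, theirs_files = set(ours_files), set(theirs_files)
    ours_dirs = set().union(*(parent_dirs(p) for p in ours_files))
    theirs_dirs = set().union(*(parent_dirs(p) for p in theirs_files))
    return sorted((ours_files & theirs_dirs) | (theirs_files & ours_dirs))


# 'df' is a file on our side and a directory on theirs.
ours = {"README", "df"}
theirs = {"README", "df/file.txt"}
print(dir_file_candidates(ours, theirs))
```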

I'll continue to ponder. Thanks for the enlightening email.

-josh


P.S. Is it expected/known that 'git merge --abort' of a
merge-in-progress involving a dir/file conflict generates a mildly
incomprehensible error in addition to aborting the merge?

$ git merge --abort
error: 'df' appears as both a file and as a directory
error: df: cannot drop to stage #0
