Hi Elijah, On Wed, 5 Jan 2022, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@xxxxxxxxx> > > Callers of `git merge-tree --real` might want an easy way to determine > which files conflicted. While they could potentially use the --messages > option and parse the resulting messages written to that file, those > messages are not meant to be machine readable. Provide a simpler > mechanism of having the user specify --unmerged-list=$FILENAME, and then > write a NUL-separated list of unmerged filenames to the specified file. This patch does what the commit message says, and it looks quite plausible. However, in practice it seems that you need either a tree (if the merge succeeded) or the list of conflicted files (if the merge succeeded). So while it looks relatively clean from the implementation's point of view, the design itself could probably withstand a bit of consideration. As I hinted earlier (to be precise, in https://lore.kernel.org/git/nycvar.QRO.7.76.6.2201071602110.339@xxxxxxxxxxxxxxxxx/), I had the chance in December to work on the server-side, using `merge-ort` for a bit. In the following, I will talk about this a bit more than about this particular patch, but I think it is highly relevant (not a tangent). One of the things that became clear to me is that we really have an either/or situation here. Either the merge succeeds, and we _need_ that tree, or it fails, and we could not care less about the tree at all. In fact, if the merge fails, we completely ignore the tree, and it would be better if we would not even write out any Git objects in that case at all: even just writing the objects would be quite costly at the server-side scale. So my (somewhat hacky) patches for a proof-of-concept produced _either_ the hash of the tree on `stdout`, _or_ a header saying that there were conflicts followed by a NUL-separated list of file names. Mind you, I did not even get to the point of analyzing things even more deeply. My partner in crime and I only got to comparing the `merge-ort` way to the libgit2-based way, trying to get them to compare as much apples-to-apples as possible [*1*], and we found that even the time to spawn the Git process (~1-3ms, with all overhead counted in) is _quite_ noticeable, at server-side scale. Of course, the `merge-ort` performance was _really_ nice when doing anything remotely complex, then `merge-ort` really blew the libgit2-based merge out of the water. But that's not the common case. The common case are merges that involve very few modified files, a single merge base, and they don't conflict. And those can be processed by the libgit2-based method within a fraction of the time it takes to even only so much as spawn `git` (libgit2-based merges can complete in less than a fifth millisecond, that's at most a fifth of the time it takes to merely run `git merge-tree`). The difference between 0.2-0.5ms for libgit2-based merges on the one hand, and 1-3ms for `merge-ort`-based merges on the other hand, might not seem like much, but you have to multiply it by the times such a merge is performed on the server. Which is a _lot_. Way more often than I thought. In this particular instance, there is a silver-lining: the libgit2-based merge is not actually recursive. It is a three-way merge. Which means that we first have to determine a merge base. In our case, this is done by spawning a Git process anyway, so one of my ideas to move forward is to fold that merge-base logic into `git merge-tree`, too. Anyway, the short short is: whenever we can avoid unnecessary work, we should do so. In the context of this patch, I would say that we should avoid writing out a tree (and avoid printing its hash to `stdout`) if there are merge conflicts. And we should avoid writing (and later reading) a file, if we can get away with avoiding it. At least in the default case, that is. We still might need a flag to produce some more information about those merge conflicts. But even in that case, it would be better to have a list of file names with the three associated stages than to output the hash of a tree that contains conflicts (and tons of files _without_ conflicts). The UI needs to re-generate those conflicts anyway. And remember: a tree can contain millions of files even if there is but a single conflict. It makes more sense for `merge-tree --real` to output a concrete list of files that conflicted, rather than expecting the caller to discern between conflicts and non-conflicts by processing a tree object. Maybe you agree with this rationale and re-design the `--real` mode to try to avoid writing out files in the common case? About the form of the patch itself: I was tempted to go with the nitpicking spirit I see on the Git mailing list these days, especially about the shell script code in the test scripts. But then I realized that I find such nitpicking pretty unhelpful, myself. The code is good as-is, even if I would write it differently. It is clear, and it does exactly what it is supposed to do. Thank you, Dscho Footnote *1*: I did not _quite_ get to the point of comparing the `merge-ort` merges to the libgit2 ones, unfortunately. I was on my way to add code to respect `merge.renames = false` so that we could _truly_ compare the `merge-ort` merges to the libgit2 merges (we really will want to verify that the output is identical, before even considering to enable recursive merges on the server side, and then only after studying the time difference), and then had to take off due to the holidays. If you already have that need to be able to turn off rename-detection on your radar, even if only for a transitional period, I would be _so_ delighted.