On Fri, Jan 28, 2022 at 8:55 AM Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote: > > Hi Elijah, > > On Sat, 22 Jan 2022, Elijah Newren via GitGitGadget wrote: > > > From: Elijah Newren <newren@xxxxxxxxx> > > > > After a merge, this function allows the user to extract the same > > information that would be printed by `ls-files -u` -- conflicted > > files with their mode, oid, and stage. > > Hmm. Okay. > > I am not really a fan of that output where we use a variable > number of lines per file. As in: in the regular case, where a file was > modified in divergent ways, we get three lines. But if it was deleted in > one branch, or if it was created in both branches, we have only two lines. > > Frankly, I'd much rather have 3 lines for each and every conflict. Out of curiosity, does this also mean you feel like `git ls-files -u` is mis-designed (even if we're stuck with it for backward compatibility reasons)? Anyway, if we make this change guaranteeing there are 3 lines for each and every conflict, that probably implies we can remove the stage field (just letting it be implicit), right? And it also implies that the "path" field of the lines is now duplicate information. Should the path be printed on each line regardless of the duplication? Should the path be printed separately on its own line, followed by the three lines of (mode, oid) pairs? Or should the format be something else entirely? > Granted, we currently only have three stages... I thought that was true too, but Junio informed me otherwise[1]. Granted, ort does not currently make use of that, but it could. Not sure I want to rule it out. [1] https://lore.kernel.org/git/xmqqr1t89gi8.fsf@xxxxxxxxxxxxxxxxxxxxxx/ > ...and we can pretty much > guarantee that at least two of these stages are non-empty (otherwise where > would be the conflict?). No, that's not true. See the paragraph about "conflict types" from the final patch in this series, where I mentioned three different types of conflicts that can result in a path having a single higher order stage. > Meaning: Even if stage 3 is missing from the first conflict and stage 1 is > missing from the second conflict, in the output we would see stages 1, 2, > 2, 3, i.e. a duplicate stage 2, signifying that we're talking about two > different conflicts. I don't understand why you're fixating on the stage here. Why would you want to group all the stage 2s together, count them up, and then determine there are N conflicting files because there are N stage 2's? In such a design, you'd have to do the extra post-processing to recognize the all-zero modes and hashes and toss them. To me, that seems like more work than just handling the `ls-files -u` style of output, and doesn't have the advantage of showing things in a format users may already be familiar with (see above about dropping stages and maybe also pathname). Since the `ls-files -u` output is simply several lines of: <mode> <hash> <stage> <filename> You can get the number of conflicted files by counting the number of *unique* filenames. If you want to see which lines are about the same file, get all the lines with the same filename -- they are sorted by (<filename>, <stage>) for convenience, much like `ls-files -u` is. On a different angle, I'm also slightly worried there's an embedded assumption here, one that I tried to address in the "Mistake to avoid" section I added in my last patch of this series. What if you have a conflicting merge and 0 lines of output in the conflicting info section? Is there something about the reason you're asking for three lines of output per conflicted file which is going to make you overlook that particular possibility and not handle it? > But still. It makes me uneasy to have that much variability in > machine-consumable output. > > And who knows, maybe if we're ever implementing more sophisticated merge > strategies for octopus merges, we might end up with more stages... Um, if we need to handle more stages, wouldn't that undercut your request for exactly 3 stages? Wouldn't it suggest that we would indeed want to have a flexible number of lines? > In other words... > > > Signed-off-by: Elijah Newren <newren@xxxxxxxxx> > > --- > > merge-ort.c | 31 +++++++++++++++++++++++++++++++ > > merge-ort.h | 21 +++++++++++++++++++++ > > 2 files changed, 52 insertions(+) > > > > diff --git a/merge-ort.c b/merge-ort.c > > index b78dde55ad9..5e7cea6cc8f 100644 > > --- a/merge-ort.c > > +++ b/merge-ort.c > > @@ -4275,6 +4275,37 @@ void merge_display_update_messages(struct merge_options *opt, > > trace2_region_leave("merge", "display messages", opt->repo); > > } > > > > +void merge_get_conflicted_files(struct merge_result *result, > > + struct string_list *conflicted_files) > > +{ > > + struct hashmap_iter iter; > > + struct strmap_entry *e; > > + struct merge_options_internal *opti = result->priv; > > + > > + strmap_for_each_entry(&opti->conflicted, &iter, e) { > > + const char *path = e->key; > > + struct conflict_info *ci = e->value; > > + int i; > > + > > + VERIFY_CI(ci); > > + > > + for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) { > > + struct stage_info *si; > > + > > + if (!(ci->filemask & (1ul << i))) > > + continue; > > + > > + si = xmalloc(sizeof(*si)); > > + si->stage = i+1; > > + si->mode = ci->stages[i].mode; > > + oidcpy(&si->oid, &ci->stages[i].oid); > > + string_list_append(conflicted_files, path)->util = si; > > + } > > + } > > + /* string_list_sort() uses a stable sort, so we're good */ > > + string_list_sort(conflicted_files); > > +} > > + > > void merge_switch_to_result(struct merge_options *opt, > > struct tree *head, > > struct merge_result *result, > > diff --git a/merge-ort.h b/merge-ort.h > > index d643b47cb7c..e635a294ea8 100644 > > --- a/merge-ort.h > > +++ b/merge-ort.h > > @@ -2,6 +2,7 @@ > > #define MERGE_ORT_H > > > > #include "merge-recursive.h" > > +#include "hash.h" > > > > struct commit; > > struct tree; > > @@ -89,6 +90,26 @@ void merge_display_update_messages(struct merge_options *opt, > > struct merge_result *result, > > FILE *stream); > > > > +struct stage_info { > > + struct object_id oid; > > + int mode; > > + int stage; > > +}; > > ... I'd rather not tack this onto a `string_list` but instead have > something like: > > struct conflict_info { Nit: "conflict_info" is already taken for another structure; we'd need a different name. But it does illustrate the idea just fine. > struct { > struct object_id oid; > int mode; > const char *path; > } stages[3] > }; > > where `path` can be `NULL` to indicate that this stage is missing. I'll get back to the data structure in a second, but the note about path being `NULL` is interesting. I'm presuming that oid and mode are all-zeros as well, yes? Now, your original request was that you wanted a line printed for all three stages. If we're printing oid, mode, and path for each stage, what do we print for the path one these lines? Is it (1) "(null)", (2) "", or (3) the real pathname (via carefully comparing to surrounding stages to find out what it is)? If we're changing the format to only print the name of the path once, does the code need to loop over the list of conflicts in order to find out the name (since the first or second stage might have a NULL path field)? In regards to the overall data structure and your comment about string_list: I'm slightly confused. You say you want to avoid a string_list, but you've only accounted for there being one conflict in this data structure. I don't see how this suggestion avoids the need for an array or a string_list or some kind of container to hold multiple of these. The inclusion of path within the stages array is also interesting -- should there be (up to) three copies of the pathname allocated per conflict? Seems a bit wasteful. Wouldn't it make more sense to have something like struct conflicts { struct { struct object_id oid; unsigned short mode; } stages[3] }; and then have a string_list storing the pathname -> conflicts mappings to avoid the pathname duplication? Or is there something you really find problematic about the string_list and you really want an array or other data structure?