Re: [PATCH v2 11/13] merge-tree: provide easy access to `ls-files -u` style info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 2, 2022 at 5:22 PM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
> On Wed, Feb 02 2022, Elijah Newren wrote:
>
> > On Wed, Feb 2, 2022 at 1:32 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >>
> >> "Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> >>
> >> > @@ -450,7 +451,11 @@ static int real_merge(struct merge_tree_options *o,
> >> >               merge_get_conflicted_files(&result, &conflicted_files);
> >> >               for (i = 0; i < conflicted_files.nr; i++) {
> >> >                       const char *name = conflicted_files.items[i].string;
> >> > -                     if (last && !strcmp(last, name))
> >> > +                     struct stage_info *c = conflicted_files.items[i].util;
> >> > +                     if (!o->exclude_modes_oids_stages)
> >> > +                             printf("%06o %s %d\t",
> >> > +                                    c->mode, oid_to_hex(&c->oid), c->stage);
> >> > +                     else if (last && !strcmp(last, name))
> >> >                               continue;
> >> >                       write_name_quoted_relative(
> >> >                               name, prefix, stdout, line_termination);
> >>
> >> OK.  The addition (and disabling of the deduping) is quite trivial.
> >> We do not even have to worry about line termination since the extra
> >> pieces of info are prepended to the pathname.  Nice.
> >>
> >> > @@ -485,6 +490,10 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
> >> >                           N_("do a trivial merge only"), 't'),
> >> >               OPT_BOOL(0, "messages", &o.show_messages,
> >> >                        N_("also show informational/conflict messages")),
> >> > +             OPT_BOOL_F('l', "exclude-modes-oids-stages",
> >> > +                        &o.exclude_modes_oids_stages,
> >> > +                        N_("list conflicted files without modes/oids/stages"),
> >> > +                        PARSE_OPT_NONEG),
> >>
> >> Why does "-l" give shorter output than without it?  "-l" strongly
> >> hints a longer output than without, at least to me.  Just wondering
> >> if this will not become a source of confusion to future scripting
> >> users.
> >
> > Here's another example where I was struggling with naming.  Something
> > like ls-tree's `--name-only` would have been nice, but I was worried
> > it'd be confusing since it only affected the conflicted info section
> > and does not suppress the printing of the toplevel tree or the
> > informational messages sections.  And the name
> > --exclude-modes-oids-stages was long enough that I wanted a short flag
> > for it, and just used the first letter of the description ("list
> > conflicted files...").  I'm happy to change either the long or the
> > short name for this flag if anyone has suggestions.
>
> There's always sidestepping it by replacing it with a --format :)

Another solution that occurred to me, and I was _really_ close to
doing it for v3, was to just flat drop this patch entirely and not
include any such option.  But...

  * "Which files had conflicts?" seems like such an obvious question
  * I've used `git ls-files -u | awk {print\$4} | uniq` a lot in the
past after `git merge` (Or `git rebase`) to get this info (yeah, it
turns out `git diff --name-only --diff-filter=U` is 4 fewer
characters)
  * "display the list of files where conflicts were present in the web
UI" was listed as an early usecase[1]

[1] https://lore.kernel.org/git/YYlqpuzv+bmZaFzz@nand.local/

So it seemed like making that question easy to answer was worthwhile.

> Anyway, I'd mentioned that in an earlier review in
> <220124.864k5tigto.gmgdl@xxxxxxxxxxxxxxxxxxx>. FWIW here's an experiment
> to do that that I polished up (mostly copied from the ls-tree WIP code
> I'd written already).
>
> I don't know if it will ever be useful, or if you think it's
> worthwhile/simpler, but in either case I think in doing this I spotted
> the following issues or otherwise noted inconsistencies in the pre-image:
>
>    The docs say that "<stage> <path>" is SP-separated, but it's
>    actually TAB-separated, the rest is SP-separated.

Yeah, good catch.  However, it doesn't actually say they are
SP-separated; it's ambiguous about the spacing.  Which probably isn't
a good thing, but it was kind of copied from the ls-files manual:

"""
       git ls-files just outputs the filenames unless --stage is specified in
       which case it outputs:

           [<tag> ]<mode> <object> <stage> <file>
"""

(which also uses a tab between <stage> and <file> and a space
otherwise, but the output above may lead you to believe otherwise.)

>  * That you de-dupe --exclude-modes-oids-stages is a bit of a hidden feature,
>    but argubly initiative. Should it by optional? In any case my formatting
>    experiment makes it optional, since it then needs to be generalized to de-dupe
>    after we've formatted.

I think without de-duping the flag isn't helpful enough to bother
implementing.  Requiring two flags also seems painful, given the
common case scenario.

I hope I'm not coming across as dismissive.  I think eventually adding
a --format and --dedupe (the combination of which might be implied by
whatever flag is used now) might be useful additions.  Maybe --abbrev
too...eventually.  But I'm worried that it's distracting from focusing
on usecases.  In particular, I'm worried it leads to "well, script
writers technically can get what they want because we provided
everything" rather than focusing on making the most common things easy
to get, and then extending the command for flexibility as needed
later.

I'd really rather that early versions _just_ focus on actual usecases
as far as UI is concerned (and thus I was really happy to see Dscho
and Taylor concentrate on that side; I think Christian might have been
talking about that angle some but it was hard to differentiate from
the "merge-tree on steroids" spitballing).  While I want to be careful
to avoid preventing UI flexibility, I think building it in from the
beginning tends to lead to a design that is less usable.  (e.g. the
possible loss of de-duping that would naturally have arisen from
looking at things from the other angle.)  It's just a bias I have.

>  * Perhaps we should support --abbrev as ls-tree does? The below diff shows
>    it's easy enough.

This one is less problematic to me, but I'd still rather that the UI
side of things focused on the usecases for early versions.

>  * The dance you have with sed-ing out the hash in the tests could be made much
>    easier with "sed 1d <out >actual" and --no-messages for some existing tests.

Ignoring the first line is semantically different than verifying it
looks like a hash.  It also only works on the first line, and hashes
appear in multiple places, so you'd need a variety of different sed
commands for different parts of the output, which doesn't seem any
easier at all to me; I think using the same replacement everywhere is
simpler.  But perhaps I should turn it into a shell function that I
use in each case.

> diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
> index 6a2ed475106..e906d1dc9bf 100644
> --- a/Documentation/git-merge-tree.txt
> +++ b/Documentation/git-merge-tree.txt
> @@ -44,10 +44,9 @@ OPTIONS
>         newline.  Also begin the messages section with a NUL character
>         instead of a newline.  See OUTPUT below for more information.
>
> ---exclude-oids-and-modes::
> -       Instead of writing a list of (mode, oid, stage, path) tuples
> -       to output for conflicted files, just provide a list of
> -       filenames with conflicts.
> +--conflict-format::
> +       Override the default "%(objectmode) %(objectname)
> +       %(stage)%x09%(path)" format.
>
>  --[no-]messages::
>         Write any informational messages such as "Auto-merging <path>"
> @@ -89,13 +88,13 @@ Conflicted file info
>
>  This is a sequence of lines with the format
>
> -       <mode> <object> <stage> <filename>
> +       %(objectmode) %(objectname) %(stage)%x09%(path)
>
>  The filename will be quoted as explained for the configuration
> -variable `core.quotePath` (see linkgit:git-config[1]).  However, if
> -the `--exclude-oids-and-modes` option is passed, the mode, object, and
> -stage will be omitted.  If `-z` is passed, the "lines" are terminated
> -by a NUL character instead of a newline character.
> +variable `core.quotePath` (see linkgit:git-config[1]).
> +
> +If `-z` is passed, the "lines" are terminated by a NUL character
> +instead of a newline character.
>
>  Informational messages
>  ~~~~~~~~~~~~~~~~~~~~~~
> diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
> index 58c0ddc5a32..14fed95a8ce 100644
> --- a/builtin/merge-tree.c
> +++ b/builtin/merge-tree.c
> @@ -395,9 +395,64 @@ struct merge_tree_options {
>         int mode;
>         int allow_unrelated_histories;
>         int show_messages;
> -       int exclude_modes_oids_stages;
> +       const char *conflict_format;
> +       int unique_conflicts;
> +       int abbrev;
>  };
>
> +struct expand_conflict_data {
> +       const char *prefix;
> +       struct string_list_item *item;
> +       struct strbuf *scratch;
> +       int abbrev;
> +       struct strbuf *sb_tmp;
> +};
> +static size_t expand_conflict_format(struct strbuf *sb,
> +                                    const char *start,
> +                                    void *context)
> +{
> +       struct expand_conflict_data *data = context;
> +       struct string_list_item *item = data->item;
> +       struct stage_info *info = item->util;
> +       const char *end;
> +       const char *p;
> +       size_t len;
> +
> +       len = strbuf_expand_literal_cb(sb, start, NULL);
> +       if (len)
> +               return len;
> +
> +       if (*start != '(')
> +               die(_("bad format as of '%s'"), start);
> +       end = strchr(start + 1, ')');
> +       if (!end)
> +               die(_("format element '%s' does not end in ')'"), start);
> +       len = end - start + 1;
> +
> +       if (skip_prefix(start, "(objectmode)", &p)) {
> +               strbuf_addf(sb, "%06o", info->mode);
> +       } else if (skip_prefix(start, "(objectname)", &p)) {
> +               strbuf_addstr(sb, find_unique_abbrev(&info->oid, data->abbrev));
> +       } else if (skip_prefix(start, "(stage)", &p)) {
> +               strbuf_addf(sb, "%d", info->stage);
> +       } else if (skip_prefix(start, "(path)", &p)) {
> +               const char *name = item->string;
> +
> +               if (data->prefix)
> +                       name = relative_path(name, data->prefix, data->scratch);
> +               strbuf_addstr(sb, name);
> +
> +               strbuf_reset(data->sb_tmp);
> +               /* The relative_path() function resets "scratch" */
> +
> +       } else {
> +               unsigned int errlen = (unsigned long)len;
> +               die(_("bad format specifier %%%.*s"), errlen, start);
> +       }
> +
> +       return len;
> +}
> +
>  static int real_merge(struct merge_tree_options *o,
>                       const char *branch1, const char *branch2,
>                       const char *prefix)
> @@ -446,23 +501,43 @@ static int real_merge(struct merge_tree_options *o,
>         puts(oid_to_hex(&result.tree->object.oid));
>         if (!result.clean) {
>                 struct string_list conflicted_files = STRING_LIST_INIT_NODUP;
> -               const char *last = NULL;
> -               int i;
> +               struct string_list_item *item;
> +               char *last = NULL;
> +               struct strbuf sb = STRBUF_INIT;
> +               struct strbuf tmp = STRBUF_INIT;
>
>                 merge_get_conflicted_files(&result, &conflicted_files);
> -               for (i = 0; i < conflicted_files.nr; i++) {
> -                       const char *name = conflicted_files.items[i].string;
> -                       struct stage_info *c = conflicted_files.items[i].util;
> -                       if (!o->exclude_modes_oids_stages)
> -                               printf("%06o %s %d\t",
> -                                      c->mode, oid_to_hex(&c->oid), c->stage);
> -                       else if (last && !strcmp(last, name))
> +               for_each_string_list_item(item, &conflicted_files) {
> +                       struct expand_conflict_data ctx = {
> +                               .prefix = prefix,
> +                               .item = item,
> +                               .abbrev = o->abbrev,
> +                               .scratch = &sb,
> +                               .sb_tmp = &tmp,
> +                       };
> +
> +                       strbuf_expand(&sb, o->conflict_format, expand_conflict_format, &ctx);
> +                       strbuf_addch(&sb, line_termination);
> +
> +                       if (o->unique_conflicts && last && !strcmp(last, sb.buf)) {
> +                               free(last);
> +                               last = strbuf_detach(&sb, NULL);
>                                 continue;
> -                       write_name_quoted_relative(
> -                               name, prefix, stdout, line_termination);
> -                       last = name;
> +                       }
> +
> +                       fwrite(sb.buf, sb.len, 1, stdout);
> +
> +                       if (o->unique_conflicts) {
> +                               free(last);
> +                               last = strbuf_detach(&sb, NULL);
> +                       } else {
> +                               strbuf_reset(&sb);
> +                       }
>                 }
>                 string_list_clear(&conflicted_files, 1);
> +               strbuf_release(&sb);
> +               strbuf_release(&tmp);
> +               free(last);
>         }
>         if (o->show_messages) {
>                 putchar(line_termination);
> @@ -474,7 +549,11 @@ static int real_merge(struct merge_tree_options *o,
>
>  int cmd_merge_tree(int argc, const char **argv, const char *prefix)
>  {
> -       struct merge_tree_options o = { .show_messages = -1 };
> +       struct merge_tree_options o = {
> +               .show_messages = -1,
> +               .conflict_format = "%(objectmode) %(objectname) %(stage)%x09%(path)",
> +               .unique_conflicts = 1,
> +       };
>         int expected_remaining_argc;
>         int original_argc;
>
> @@ -493,14 +572,15 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
>                          N_("also show informational/conflict messages")),
>                 OPT_SET_INT('z', NULL, &line_termination,
>                             N_("separate paths with the NUL character"), '\0'),
> -               OPT_BOOL_F('l', "exclude-modes-oids-stages",
> -                          &o.exclude_modes_oids_stages,
> -                          N_("list conflicted files without modes/oids/stages"),
> -                          PARSE_OPT_NONEG),
> +               OPT_STRING(0, "conflict-format", &o.conflict_format, N_("format"),
> +                          N_("specify a custom format to use for conflicted files")),
> +               OPT_BOOL(0, "unique-conflicts", &o.unique_conflicts,
> +                        N_("omit duplicate --conflict-format lines")),

The latter of which you didn't include in the manual?  Also,
unique_conflicts seems like something that is trivial to understand
from the coding perspective, but probably require quite a bit more
explanation from the manual.  For example, if objectname is included
in the format, unique-conflicts is essentially a no-op.  And that's
the default...so, you'd probably have to spend time in the manual
explaining under what circumstances it's useful.  I'm also not sure if
a user who wanted (mode, path) would want unique_conflicts to default
to 1; it may be something only meaningful in the particular case of
"just give me conflicted filenames".

>                 OPT_BOOL_F(0, "allow-unrelated-histories",
>                            &o.allow_unrelated_histories,
>                            N_("allow merging unrelated histories"),
>                            PARSE_OPT_NONEG),
> +               OPT__ABBREV(&o.abbrev),
>                 OPT_END()
>         };
>
> diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
> index 4de089d976d..e6354b2d284 100755
> --- a/t/t4301-merge-tree-write-tree.sh
> +++ b/t/t4301-merge-tree-write-tree.sh
> @@ -93,7 +93,7 @@ test_expect_success 'Barf on too many arguments' '
>  '
>
>  test_expect_success 'test conflict notices and such' '
> -       test_expect_code 1 git merge-tree --write-tree --exclude-modes-oids-stages side1 side2 >out &&
> +       test_expect_code 1 git merge-tree --write-tree --conflict-format="%(path)" side1 side2 >out &&
>         sed -e "s/[0-9a-f]\{40,\}/HASH/g" out >actual &&
>
>         # Expected results:
> @@ -115,8 +115,35 @@ test_expect_success 'test conflict notices and such' '
>         test_cmp expect actual
>  '
>
> +test_expect_success 'merge-tree --unique-conflicts is the default' '
> +       test_expect_code 1 git merge-tree --write-tree --conflict-format="%(path)" --no-messages side1 side2 >out &&
> +       sed 1d <out >actual &&
> +       cat >expect <<-\EOF &&
> +       greeting
> +       whatever~side1
> +       EOF
> +       test_cmp expect actual &&
> +
> +       test_expect_code 1 git merge-tree --write-tree --conflict-format="%(path)" --no-messages side1 side2 >out2 &&
> +       sed 1d <out2 >actual2 &&
> +       test_cmp actual actual2
> +'
> +
> +test_expect_success 'merge-tree --no-unique-conflicts' '
> +       test_expect_code 1 git merge-tree --write-tree --conflict-format="%(path)" --no-unique-conflicts --no-messages side1 side2 >out &&
> +       sed 1d <out >actual &&
> +       cat >expect <<-\EOF &&
> +       greeting
> +       greeting
> +       greeting
> +       whatever~side1
> +       whatever~side1
> +       EOF
> +       test_cmp expect actual
> +'
> +
>  test_expect_success 'Just the conflicted files without the messages' '
> -       test_expect_code 1 git merge-tree --write-tree --no-messages --exclude-modes-oids-stages side1 side2 >out &&
> +       test_expect_code 1 git merge-tree --write-tree --no-messages --conflict-format="%(path)" side1 side2 >out &&
>         sed -e "s/[0-9a-f]\{40,\}/HASH/g" out >actual &&
>
>         test_write_lines HASH greeting whatever~side1 >expect &&




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux