[PATCH v2 0/8] fetch: introduce machine-parseable output

Patrick Steinhardt <ps@xxxxxx> · Thu, 27 Apr 2023 13:13:04 +0200

Hi,

this is the second version of my patch series that introduces a
machine-parseable output.

Changes compared to v1:

    - Patch 3/8: I've reworded the commit message to hopefully be more
      straight-forward. I've also amended tests so that we don't only
      test with `--dry-run`, but also without.

    - Patch 6/8: I've also moved the `all` and `multiple` variables from
      their global scope into `cmd__fetch`.

    - Patch 7/8: Fixed a bug where `--output-format=` wasn't honored for
      child processes when doing multi-remote fetches. Furthermore, I've
      unified parsing of the actual format so that we don't have to
      repeat it thrice.

    - Patch 8/8: Added a note to the commit message that tries to argue
      why I didn't add remote information to the interface. I'm still
      open to change this if you disagree with my reasoning here.

Thanks a bunch for all the feedback received so far, really appreciate
it!

Patrick

Patrick Steinhardt (8):
  fetch: split out tests for output format
  fetch: add a test to exercise invalid output formats
  fetch: fix missing from-reference when fetching HEAD:foo
  fetch: introduce `display_format` enum
  fetch: move display format parsing into main function
  fetch: move option related variables into main function
  fetch: introduce new `--output-format` option
  fetch: introduce machine-parseable "porcelain" output format

 Documentation/config/fetch.txt  |   4 +-
 Documentation/fetch-options.txt |   5 +
 Documentation/git-fetch.txt     |  17 +-
 builtin/fetch.c                 | 427 ++++++++++++++++++++------------
 t/t5510-fetch.sh                |  53 ----
 t/t5574-fetch-output.sh         | 237 ++++++++++++++++++
 6 files changed, 521 insertions(+), 222 deletions(-)
 create mode 100755 t/t5574-fetch-output.sh

Range-diff against v1:
1:  0d0d50d14c = 1:  0d0d50d14c fetch: split out tests for output format
2:  29d2c58914 = 2:  29d2c58914 fetch: add a test to exercise invalid output formats
3:  596e12f03a ! 3:  d1fb6eeae7 fetch: fix missing from-reference when fetching HEAD:foo
    @@ Metadata
      ## Commit message ##
         fetch: fix missing from-reference when fetching HEAD:foo

    -    When displaying reference updates, we print a line that looks similar to
    -    the following:
    +    `store_updated_refs()` parses the remote reference for two purposes:
    +
    +        - It gets used as a note when writing FETCH_HEAD.
    +
    +        - It is passed through to `display_ref_update()` to display
    +          updated references in the following format:
    +
    +          ```
    +           * branch               master          -> master
    +          ```
    +
    +    In most cases, the parsed remote reference is the prettified reference
    +    name and can thus be used for both cases. But if the remote reference is
    +    HEAD, the parsed remote reference becomes empty. This is intended when
    +    we write the FETCH_HEAD, where we skip writing the note in that case.
    +    But it is not intended when displaying the updated references and would
    +    cause us to miss the left-hand side of the displayed reference update:

         ```
    -     * branch               master          -> master
    -    ```
    -
    -    The "branch" bit changes depending on what kind of reference we're
    -    updating, while both of the right-hand references are computed by
    -    stripping well-known prefixes like "refs/heads/" or "refs/tags".
    -
    -    The logic is kind of intertwined though and not easy to follow: we
    -    precompute both the kind (e.g. "branch") and the what, which is the
    -    abbreviated remote reference name, in `store_updated_refs()` and then
    -    pass it down the call chain to `display_ref_update()`.
    -
    -    There is a set of different cases here:
    -
    -        - When the remote reference name is "HEAD" we assume no kind and
    -          will thus instead print "[new ref]". We keep what at the empty
    -          string.
    -
    -        - When the remote reference name has a well-known prefix then the
    -          kind would be "branch", "tag" or "remote-tracking branch". The
    -          what is the reference with the well-known prefix stripped and in
    -          fact matches the output that `prettify_refname()` would return.
    -
    -        - Otherwise, we'll again assume no kind and keep the what set to the
    -          fully qualified reference name.
    -
    -    Now there is a bug with the first case here, where the remote reference
    -    name is "HEAD". As noted, "what" will be set to the empty string. And
    -    that seems to be intentional because we also use this information to
    -    update the FETCH_HEAD, and in case we're updating HEAD we seemingly
    -    don't want to append that to our FETCH_HEAD value.
    -
    -    But as mentioned, we also use this value to display reference updates.
    -    And while the call to `display_ref_update()` correctly figures out that
    -    we meant "HEAD" when `what` is empty, the call to `update_local_ref()`
    -    doesn't. `update_local_ref()` will then call `display_ref_update()` with
    -    the empty string and cause the following broken output:
    -
    -    ```
    -    $ git fetch --dry-run origin HEAD:foo
    +    $ git fetch origin HEAD:foo
         From https://github.com/git/git
          * [new ref]                          -> foo
         ```

         The HEAD string is clearly missing from the left-hand side of the arrow,
    -    which is further stressed by the point that the following commands work
    -    as expected:
    +    which is further stressed by the point that the following commands show
    +    the left-hand side as expected:

         ```
    -    $ git fetch --dry-run origin HEAD
    +    $ git fetch origin HEAD
         From https://github.com/git/git
          * branch                  HEAD       -> FETCH_HEAD

    -    $ git fetch --dry-run origin master
    +    $ git fetch origin master
         From https://github.com/git/git
          * branch                  master     -> FETCH_HEAD
          * branch                  master     -> origin/master
         ```

    -    Fix this bug by instead unconditionally passing the full reference name
    -    to `display_ref_update()` which learns to call `prettify_refname()` on
    -    it. This does fix the above bug and is otherwise functionally the same
    -    as `prettify_refname()` would only ever strip the well-known prefixes
    -    just as intended. So at the same time, this also simplifies the code a
    -    bit.
    +    The logic of how we compute the remote reference name that we ultimately
    +    pass to `display_ref_update()` is not easy to follow. There are three
    +    different cases here:
    +
    +        - When the remote reference name is "HEAD" we set the remote
    +          reference name to the empty string. This is the case that causes
    +          the bug to occur, where we would indeed want to print "HEAD"
    +          instead of the empty string. This is what `prettify_refname()`
    +          would return.
    +
    +        - When the remote reference name has a well-known prefix then we
    +          strip this prefix. This matches what `prettify_refname()` does.
    +
    +        - Otherwise, we keep the fully qualified reference name. This also
    +          matches what `prettify_refname()` does.
    +
    +    As the return value of `prettify_refname()` would do the correct thing
    +    for us in all three cases, we can fix the bug by passing through the
    +    full remote reference name to `display_ref_update()`, which learns to
    +    call `prettify_refname()`. At the same time, this also simplifies the
    +    code a bit.

         Note that this patch also changes formatting of the block that computes
         the "kind" and "what" variables. This is done on purpose so that it is
    @@ t/t5574-fetch-output.sh: test_expect_success 'fetch compact output' '
     +	EOF
     +	test_cmp expect actual &&
     +
    ++	git -C head fetch origin HEAD >actual 2>&1 &&
    ++	test_cmp expect actual &&
    ++
     +	git -C head fetch --dry-run origin HEAD:foo >actual 2>&1 &&
     +	cat >expect <<-EOF &&
     +	From $(test-tool path-utils real_path .)/.
     +	 * [new ref]         HEAD       -> foo
     +	EOF
    ++	test_cmp expect actual &&
    ++
    ++	git -C head fetch origin HEAD:foo >actual 2>&1 &&
     +	test_cmp expect actual
     +'
     +
4:  8571363be1 = 4:  b545bf8bb9 fetch: introduce `display_format` enum
5:  d98c3ee0ce = 5:  4990d35998 fetch: move display format parsing into main function
6:  640a8840e1 ! 6:  cfe84129ab fetch: move option related variables into main function
    @@ Commit message
         Signed-off-by: Patrick Steinhardt <ps@xxxxxx>

      ## builtin/fetch.c ##
    -@@ builtin/fetch.c: static int all, append, dry_run, force, keep, multiple, update_head_ok;
    +@@ builtin/fetch.c: static int fetch_prune_tags_config = -1; /* unspecified */
    + static int prune_tags = -1; /* unspecified */
    + #define PRUNE_TAGS_BY_DEFAULT 0 /* do we prune tags by default? */
    + 
    +-static int all, append, dry_run, force, keep, multiple, update_head_ok;
    ++static int append, dry_run, force, keep, update_head_ok;
      static int write_fetch_head = 1;
      static int verbosity, deepen_relative, set_upstream, refetch;
      static int progress = -1;
    @@ builtin/fetch.c: static int fetch_one(struct remote *remote, int argc, const cha
      	enum display_format display_format = DISPLAY_FORMAT_UNKNOWN;
      	struct string_list list = STRING_LIST_INIT_DUP;
      	struct remote *remote = NULL;
    ++	int all = 0, multiple = 0;
      	int result = 0;
      	int prune_tags_ok = 1;
     +	int enable_auto_gc = 1;
7:  3b2cad066a ! 7:  0335e5eeb4 fetch: introduce new `--output-format` option
    @@ Commit message
         current mechanism feels a little bit indirect and rigid.

         Introduce a new `--output-format` option that allows the user to change
    -    the desired output format more directly.
    +    the desired format more directly.

         Signed-off-by: Patrick Steinhardt <ps@xxxxxx>

    @@ Documentation/fetch-options.txt: linkgit:git-config[1].
      	Write the list of remote refs fetched in the `FETCH_HEAD`

      ## builtin/fetch.c ##
    +@@ builtin/fetch.c: enum display_format {
    + 	DISPLAY_FORMAT_UNKNOWN = 0,
    + 	DISPLAY_FORMAT_FULL,
    + 	DISPLAY_FORMAT_COMPACT,
    ++	DISPLAY_FORMAT_MAX,
    ++};
    ++
    ++static const char * const display_formats[DISPLAY_FORMAT_MAX] = {
    ++	NULL,
    ++	"full",
    ++	"compact",
    + };
    + 
    + struct display_state {
    +@@ builtin/fetch.c: static int fetch_finished(int result, struct strbuf *out,
    + 	return 0;
    + }
    + 
    +-static int fetch_multiple(struct string_list *list, int max_children)
    ++static int fetch_multiple(struct string_list *list, int max_children,
    ++			  enum display_format format)
    + {
    + 	int i, result = 0;
    + 	struct strvec argv = STRVEC_INIT;
    +@@ builtin/fetch.c: static int fetch_multiple(struct string_list *list, int max_children)
    + 		     "--no-write-commit-graph", NULL);
    + 	add_options_to_argv(&argv);
    + 
    ++	if (format != DISPLAY_FORMAT_UNKNOWN)
    ++		strvec_pushf(&argv, "--output-format=%s", display_formats[format]);
    ++
    + 	if (max_children != 1 && list->nr != 1) {
    + 		struct parallel_fetch_state state = { argv.v, list, 0, 0 };
    + 		const struct run_process_parallel_opts opts = {
     @@ builtin/fetch.c: static int fetch_one(struct remote *remote, int argc, const char **argv,
      	return exit_code;
      }

    ++static enum display_format parse_display_format(const char *format)
    ++{
    ++	for (int i = 0; i < ARRAY_SIZE(display_formats); i++)
    ++		if (display_formats[i] && !strcmp(display_formats[i], format))
    ++			return i;
    ++	return DISPLAY_FORMAT_UNKNOWN;
    ++}
    ++
     +static int opt_parse_output_format(const struct option *opt, const char *arg, int unset)
     +{
    -+	enum display_format *format = opt->value;
    ++	enum display_format *format = opt->value, parsed;
    ++
     +	if (unset || !arg)
     +		return 1;
    -+	else if (!strcmp(arg, "full"))
    -+		*format = DISPLAY_FORMAT_FULL;
    -+	else if (!strcmp(arg, "compact"))
    -+		*format = DISPLAY_FORMAT_COMPACT;
    -+	else
    ++
    ++	parsed = parse_display_format(arg);
    ++	if (parsed == DISPLAY_FORMAT_UNKNOWN)
     +		return error(_("unsupported output format '%s'"), arg);
    ++	*format = parsed;
     +
     +	return 0;
     +}
    @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
      		OPT_BOOL(0, "write-fetch-head", &write_fetch_head,
      			 N_("write fetched references to the FETCH_HEAD file")),
      		OPT_BOOL('k', "keep", &keep, N_("keep downloaded pack")),
    +@@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
    + 		const char *format = "full";
    + 
    + 		git_config_get_string_tmp("fetch.output", &format);
    +-		if (!strcasecmp(format, "full"))
    +-			display_format = DISPLAY_FORMAT_FULL;
    +-		else if (!strcasecmp(format, "compact"))
    +-			display_format = DISPLAY_FORMAT_COMPACT;
    +-		else
    ++
    ++		display_format = parse_display_format(format);
    ++		if (display_format == DISPLAY_FORMAT_UNKNOWN)
    + 			die(_("invalid value for '%s': '%s'"),
    + 			    "fetch.output", format);
    + 	}
    +@@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
    + 			max_children = fetch_parallel_config;
    + 
    + 		/* TODO should this also die if we have a previous partial-clone? */
    +-		result = fetch_multiple(&list, max_children);
    ++		result = fetch_multiple(&list, max_children, display_format);
    + 	}
    + 
    + 

      ## t/t5574-fetch-output.sh ##
     @@ t/t5574-fetch-output.sh: test_expect_success 'fetch with invalid output format configuration' '
    @@ t/t5574-fetch-output.sh: test_expect_success 'fetch aligned output' '
      	cat >expect <<-\EOF &&
      	main       -> origin/*
      	extraaa    -> *
    +@@ t/t5574-fetch-output.sh: test_expect_success 'fetch compact output' '
    + 	test_cmp expect actual
    + '
    + 
    ++test_expect_success 'fetch compact output with multiple remotes' '
    ++	test_when_finished "rm -rf compact-cfg compact-cli" &&
    ++
    ++	git clone . compact-cli &&
    ++	git -C compact-cli remote add second-remote "$PWD" &&
    ++	git clone . compact-cfg &&
    ++	git -C compact-cfg remote add second-remote "$PWD" &&
    ++	test_commit multi-commit &&
    ++
    ++	git -C compact-cfg -c fetch.output=compact fetch --all >actual-cfg 2>&1 &&
    ++	git -C compact-cli fetch --output-format=compact --all >actual-cli 2>&1 &&
    ++	test_cmp actual-cfg actual-cli &&
    ++
    ++	grep -e "->" actual-cfg | cut -c 22- >actual &&
    ++	cat >expect <<-\EOF &&
    ++	main         -> origin/*
    ++	multi-commit -> *
    ++	main       -> second-remote/*
    ++	EOF
    ++	test_cmp expect actual
    ++'
    ++
    + test_expect_success 'fetch output with HEAD and --dry-run' '
    + 	test_when_finished "rm -rf head" &&
    + 	git clone . head &&
8:  301138da03 ! 8:  d7c1bc1a80 fetch: introduce machine-parseable "porcelain" output format
    @@ Commit message
         so that other data printed to stderr, like error messages or progress
         meters, don't interfere with the parseable data.

    +    A notable ommission here is that the output format does not include the
    +    remote from which a reference was fetched, which might be important
    +    information especially in the context of multi-remote fetches. But as
    +    such a format would require us to print the remote for every single
    +    reference update due to parallelizable fetches it feels wasteful for the
    +    most likely usecase, which is when fetching from a single remote. If
    +    usecases come up for this in the future though it is easy enough to add
    +    a new "porcelain-v2" format that adds this information.
    +
         Signed-off-by: Patrick Steinhardt <ps@xxxxxx>

      ## Documentation/config/fetch.txt ##
    @@ builtin/fetch.c: enum display_format {
      	DISPLAY_FORMAT_FULL,
      	DISPLAY_FORMAT_COMPACT,
     +	DISPLAY_FORMAT_PORCELAIN,
    + 	DISPLAY_FORMAT_MAX,
    + };
    + 
    +@@ builtin/fetch.c: static const char * const display_formats[DISPLAY_FORMAT_MAX] = {
    + 	NULL,
    + 	"full",
    + 	"compact",
    ++	"porcelain",
      };

      struct display_state {
    @@ builtin/fetch.c: static int prune_refs(struct display_state *display_state,
      					   summary_width);
      			warn_dangling_symref(stderr, dangling_msg, ref->name);
      		}
    -@@ builtin/fetch.c: static int opt_parse_output_format(const struct option *opt, const char *arg, in
    - 		*format = DISPLAY_FORMAT_FULL;
    - 	else if (!strcmp(arg, "compact"))
    - 		*format = DISPLAY_FORMAT_COMPACT;
    -+	else if (!strcmp(arg, "porcelain"))
    -+		*format = DISPLAY_FORMAT_PORCELAIN;
    - 	else
    - 		return error(_("unsupported output format '%s'"), arg);
    - 
    -@@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
    - 			display_format = DISPLAY_FORMAT_FULL;
    - 		else if (!strcasecmp(format, "compact"))
    - 			display_format = DISPLAY_FORMAT_COMPACT;
    -+		else if (!strcasecmp(format, "porcelain"))
    -+			display_format = DISPLAY_FORMAT_PORCELAIN;
    - 		else
    - 			die(_("invalid value for '%s': '%s'"),
    - 			    "fetch.output", format);

      ## t/t5574-fetch-output.sh ##
    -@@ t/t5574-fetch-output.sh: test_expect_success 'fetch compact output' '
    +@@ t/t5574-fetch-output.sh: test_expect_success 'fetch compact output with multiple remotes' '
      	test_cmp expect actual
      '

-- 
2.40.1

Attachment:
signature.asc

Description: PGP signature