Re: [PATCH 1/2] ref-filter: add new "describe" atom

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 06 Jul 2023 09:58:09 -0700

Kousik Sanagavarapu <five231003@xxxxxxxxx> writes:

> +describe[:options]:: human-readable name, like
> +		     link-git:git-describe[1]; empty string for
> +		     undescribable commits. The `describe` string may be
> +		     followed by a colon and zero or more comma-separated
> +		     options. Descriptions can be inconsistent when tags
> +		     are added or removed at the same time.
> ++
> +** tags=<bool-value>: Instead of only considering annotated tags, consider
> +		      lightweight tags as well.
> +** abbrev=<number>: Instead of using the default number of hexadecimal digits
> +		    (which will vary according to the number of objects in the
> +		    repository with a default of 7) of the abbreviated
> +		    object name, use <number> digits, or as many digits as
> +		    needed to form a unique object name.
> +** match=<pattern>: Only consider tags matching the given `glob(7)` pattern,
> +		    excluding the "refs/tags/" prefix.
> +** exclude=<pattern>: Do not consider tags matching the given `glob(7)`
> +		      pattern,excluding the "refs/tags/" prefix.

You are missing a SP after the comma in "pattern,excluding" above.

The above description is slightly different from what "git describe
--help" has.  If they are described differently on purpose (e.g. you
may have made "%(describe:abbrev=0)" not to show only the closest
tag, unlike "git describe --abbrev=0"), the differences should be
spelled out more explicitly.  If the behaviours of the option here
and the corresponding one there are meant to be the same, then
either using exactly the same text, or a much abbreviated
description with a note referring to the option description of "git
describe", would help the readers better.  E.g.

    abbrev=<number>;; use at least <number> hexadecimal digits; see
    the corresponding option in linkgit:git-describe[1] for details.

which would make it clear that no behavioral differences are meant.

This new section becomes a part of an existing "labeled list"
(asciidoctor calls the construct "description list").  Starting the
new heading this patch adds with 'describe[:options]::' makes sense.
It is in line with the existing text.

I however think that the list of options is better done as a nested
description list.  Documentation/config/color.txt has an example you
can imitate.  See how slots of color.grep.<slot> are described
there.

  https://docs.asciidoctor.org/asciidoc/latest/lists/description/
  https://asciidoc-py.github.io/userguide.html#_labeled_lists

> diff --git a/ref-filter.c b/ref-filter.c
> index e0d03a9f8e..6ec647c81f 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -5,6 +5,7 @@
>  #include "gpg-interface.h"
>  #include "hex.h"
>  #include "parse-options.h"
> +#include "run-command.h"
>  #include "refs.h"
>  #include "wildmatch.h"
>  #include "object-name.h"
> @@ -146,6 +147,7 @@ enum atom_type {
>  	ATOM_TAGGERDATE,
>  	ATOM_CREATOR,
>  	ATOM_CREATORDATE,
> +	ATOM_DESCRIBE,
>  	ATOM_SUBJECT,
>  	ATOM_BODY,
>  	ATOM_TRAILERS,
> @@ -215,6 +217,13 @@ static struct used_atom {
>  		struct email_option {
>  			enum { EO_RAW, EO_TRIM, EO_LOCALPART } option;
>  		} email_option;
> +		struct {
> +			enum { D_BARE, D_TAGS, D_ABBREV, D_EXCLUDE,
> +			       D_MATCH } option;
> +			unsigned int tagbool;
> +			unsigned int length;

The name "tagbool" sounds strange, as we are not saying
"lengthint".

> +			char *pattern;
> +		} describe;

I am a bit confused by this structure, actually, as I cannot quite
guess from the data structure alone how you intend to use it.  Does
this give a good representation for the piece of data you are trying
to capture?

For example, %(describe:tags=no,abbrev=4) would become a single atom
with 0 in .tagbool and 4 in .length, but what does the .option
member get?

> @@ -462,6 +471,66 @@ static int contents_atom_parser(struct ref_format *format, struct used_atom *ato
>  	return 0;
>  }
>  
> +static int parse_describe_option(const char *arg)
> +{
> +	if (!arg)
> +		return D_BARE;
> +	else if (starts_with(arg, "tags"))
> +		return D_TAGS;
> +	else if (starts_with(arg, "abbrev"))
> +		return D_ABBREV;
> +	else if(starts_with(arg, "exclude"))
> +		return D_EXCLUDE;
> +	else if (starts_with(arg, "match"))
> +		return D_MATCH;
> +	return -1;
> +}
> +
> +static int describe_atom_parser(struct ref_format *format UNUSED,
> +				struct used_atom *atom,
> +				const char *arg, struct strbuf *err)
> +{
> +	int opt = parse_describe_option(arg);
> +
> +	switch (opt) {
> +	case D_BARE:
> +		break;
> +	case D_TAGS:
> +		/*
> +		 * It is also possible to just use describe:tags, which
> +		 * is just treated as describe:tags=1
> +		 */
> +		if (skip_prefix(arg, "tags=", &arg)) {
> +			if (strtoul_ui(arg, 10, &atom->u.describe.tagbool))

This is not how you accept a Boolean.

"1", "0", "yes", "no", "true", "false", "on", "off" are all valid
values and you use git_parse_maybe_bool() to parse them.

> +				return strbuf_addf_ret(err, -1, _("boolean value "
> +						"expected describe:tags=%s"), arg);
> +
> +		} else {
> +			atom->u.describe.tagbool = 1;
> +		}
> +		break;
> +	case D_ABBREV:
> +		skip_prefix(arg, "abbrev=", &arg);
> +		if (strtoul_ui(arg, 10, &atom->u.describe.length))
> +			return strbuf_addf_ret(err, -1, _("positive value "
> +					       "expected describe:abbrev=%s"), arg);
> +		break;
> +	case D_EXCLUDE:
> +		skip_prefix(arg, "exclude=", &arg);
> +		atom->u.describe.pattern = xstrdup(arg);
> +		break;
> +	case D_MATCH:
> +		skip_prefix(arg, "match=", &arg);
> +		atom->u.describe.pattern = xstrdup(arg);
> +		break;
> +	default:
> +		return err_bad_arg(err, "describe", arg);
> +		break;
> +	}
> +	atom->u.describe.option = opt;
> +	return 0;
> +}

Even though the documentation patch we saw earlier said "may be
followed by a colon and zero or more comma-separated options", this
seems to expect only and exactly one option.  Indeed, if we run the
resulting git like "git for-each-ref --format='%(describe:tags=0,abbrev=4)'"
you will get complaints from this parser.

The implementation needs redesigning as the data structure is not
equipped to handle more than one options given at the same time, as
we saw earlier.

> @@ -664,6 +733,7 @@ static struct {
>  	[ATOM_TAGGERDATE] = { "taggerdate", SOURCE_OBJ, FIELD_TIME },
>  	[ATOM_CREATOR] = { "creator", SOURCE_OBJ },
>  	[ATOM_CREATORDATE] = { "creatordate", SOURCE_OBJ, FIELD_TIME },
> +	[ATOM_DESCRIBE] = { "describe", SOURCE_OBJ, FIELD_STR, describe_atom_parser },
>  	[ATOM_SUBJECT] = { "subject", SOURCE_OBJ, FIELD_STR, subject_atom_parser },
>  	[ATOM_BODY] = { "body", SOURCE_OBJ, FIELD_STR, body_atom_parser },
>  	[ATOM_TRAILERS] = { "trailers", SOURCE_OBJ, FIELD_STR, trailers_atom_parser },
> @@ -1483,6 +1553,78 @@ static void append_lines(struct strbuf *out, const char *buf, unsigned long size
>  	}
>  }
>  
> +static void grab_describe_values(struct atom_value *val, int deref,
> +				 struct object *obj)
> +{
> +	struct commit *commit = (struct commit *)obj;
> +	int i;
> +
> +	for (i = 0; i < used_atom_cnt; i++) {
> +		struct used_atom *atom = &used_atom[i];
> +		const char *name = atom->name;
> +		struct atom_value *v = &val[i];
> +		int opt;
> +
> +		struct child_process cmd = CHILD_PROCESS_INIT;
> +		struct strbuf out = STRBUF_INIT;
> +		struct strbuf err = STRBUF_INIT;
> +
> +		if (!!deref != (*name == '*'))
> +			continue;
> +		if (deref)
> +			name++;
> +
> +		if (!skip_prefix(name, "describe", &name) ||
> +		    (*name && *name != ':'))
> +			    continue;

This looks overly expensive.  Why aren't we looking at the atom_type
and see if it is ATOM_DESCRIBE here?

> +		switch(opt) {
> +		case D_BARE:
> +			break;
> +		case D_TAGS:
> +			if (atom->u.describe.tagbool)
> +				strvec_push(&cmd.args, "--tags");
> +			else
> +				strvec_push(&cmd.args, "--no-tags");
> +			break;
> +		case D_ABBREV:
> +			strvec_pushf(&cmd.args, "--abbrev=%d",
> +				     atom->u.describe.length);
> +			break;
> +		case D_EXCLUDE:
> +			strvec_pushf(&cmd.args, "--exclude=%s",
> +				     atom->u.describe.pattern);
> +			break;
> +		case D_MATCH:
> +			strvec_pushf(&cmd.args, "--match=%s",
> +				     atom->u.describe.pattern);
> +			break;
> +		}

Again, it is apparent here that the atom takes only one option at most.

I'll stop here.

Thanks.