Re: [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 03 Jun 2021 11:38:19 +0900

"ZheNing Hu via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> @@ -644,13 +664,6 @@ static int parse_ref_filter_atom(const struct ref_format *format,
>  		return strbuf_addf_ret(err, -1, _("malformed field name: %.*s"),
>  				       (int)(ep-atom), atom);
>  
> -	/* Do we have the atom already used elsewhere? */
> -	for (i = 0; i < used_atom_cnt; i++) {
> -		int len = strlen(used_atom[i].name);
> -		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
> -			return i;
> -	}
> -
>  	/*
>  	 * If the atom name has a colon, strip it and everything after
>  	 * it off - it specifies the format for this entry, and
> @@ -660,6 +673,17 @@ static int parse_ref_filter_atom(const struct ref_format *format,
>  	arg = memchr(sp, ':', ep - sp);
>  	atom_len = (arg ? arg : ep) - sp;
>  
> +	if (format->quote_style && !strncmp(sp, "raw", 3) && !arg)
> +		return strbuf_addf_ret(err, -1, _("--format=%.*s cannot be used with"
> +				"--python, --shell, --tcl, --perl"), (int)(ep-atom), atom);
> +
> +	/* Do we have the atom already used elsewhere? */
> +	for (i = 0; i < used_atom_cnt; i++) {
> +		int len = strlen(used_atom[i].name);
> +		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
> +			return i;
> +	}
> +

These two hunks

 - hoists up the code that sets 'arg' to optional string after
   "<atom>:" and counts how long the "<atom>" is in 'atom_len'; the
   change causes the counting done even when the same placeholder is
   already used elsewhere (in which case we do not have to do such
   counting);

 - inserts the early return to reject use of "raw" atom when
   language specific quoting is used.

It probably makes it easier to understand if the former is split
into a separate commit, but at the same time a series with too many
small steps is harder to manage, so let's keep them in a single
change.

But I do not think we want to add the new change at this location,
at least for two reasons:

 * The posted patch checks '!arg' to avoid rejecting "raw:size",
   which would not scale at all.  What if you wanted to later add
   "raw:upcase", which you must reject?

 * We do have enumerated constants for each atom types, but this
   early check and return does string comparison.

Where it belongs is either after "Is the atom a valid one?" loop
where 'atom_len' is used to locate the placeholder's atom in the
table of valid atoms [*], or inside raw_atom_parser().

    Side note: If you read the original code, you would notice that
    there already is a similar "this is a valid atom that appear in
    the valid_atom[] table, but unallowed in this situation" check
    done with .source != SOURCE_NONE conditional.  One downside is
    that until calling raw_atom_parser(), you do not know if
    RAW_BARE or RAW_LENGTH is requested.

If we do inside raw_atom_parser(), it would probably look like this:

+static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
+				const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		atom->u.raw_data.option = RAW_BARE;
+	else if (!strcmp(arg, "size"))
+		atom->u.raw_data.option = RAW_LENGTH;
+	else
+		return strbuf_addf_ret(err, -1, _("unrecognized %%(raw) argument: %s"), arg);
+
+	if (atom->u.raw_data.option == RAW_BARE && format->quote_style)
+		return strbuf_addf_ret(err, -1,
+				       _("--format=%%(raw) cannot be used with ...")...);
+
+	return 0;
+}