Re: [PATCH 1/5] ref-filter: support different email formats

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 27 Jul 2020 15:51:10 -0700

"Hariom Verma via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Hariom Verma <hariom18599@xxxxxxxxx>
>
> Currently, ref-filter only supports printing email with arrow brackets.

This is the first time I heard the term "arrow bracket".  Aren't
they more commonly called angle brackets?

> Let's add support for two more email options.
> - trim : print email without arrow brackets.

Why would this be useful?

> - localpart : prints the part before the @ sign

Meaning I'd get "<gitster" for me?

Building small pieces of new feature in each patch is good, and
adding tests to each step is also good.  Why not do the same for
docs?

> +static struct email_option{

Missing SP.

> +	enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option;
> +} email_option;
> +
> @@ -1040,10 +1044,26 @@ static const char *copy_email(const char *buf)
>  	const char *eoemail;
>  	if (!email)
>  		return xstrdup("");
> -	eoemail = strchr(email, '>');

The original code prepares to see NULL from this strchr(); that is
why it checks eoemail for NULL and returns an empty string---the
data is broken (i.e. not an address in angle brackets), which this
code cannot do anything about---in the later part of the code.

> +	switch (email_option.option) {
> +	case EO_RAW:
> +		eoemail = strchr(email, '>') + 1;

And this breaks the carefully laid out error handling by the
original code.  Adding 1 to NULL is quite undefined.

> +		break;
> +	case EO_TRIM:
> +		email++;
> +		eoemail = strchr(email, '>');
> +		break;
> +	case EO_LOCALPART:
> +		email++;
> +		eoemail = strchr(email, '@');

The undocumented design here is that you want to return "hariom" for
"<hariom@xxxxxxxxx>", i.e. out of the "trimmed" e-mail, the part
before the at-sign is returned.

If the data were "<hariom>", you'd still want to return "hariom" no?
But because you do not check for NULL, you end up returning an empty
string.

I think you want to cut at the first '@' or '>', whichever comes
first.

> +		break;
> +	case EO_INVALID:
> +	default:

Invalid and unhandled ones are silently ignored and not treated as
an error?  I would have thought that at least the "default" one
would be a BUG(), as you covered all the possible values for the
enum with case arms.  I wouldn't be surprised if seeing EO_INVALID
is also a BUG(), i.e. the control flow that led to the caller to
call this function with EO_INVALID in email_option.option is likely
to be broken.  It's not like you return "" to protect yourself when
fed a bad data from objects---a bad value in .option can only come
here if the parser you wrote for "--format=<string>" produced a
wrong result.

> +		return xstrdup("");
> +	}
> +
>  	if (!eoemail)
>  		return xstrdup("");
> -	return xmemdupz(email, eoemail + 1 - email);
> +	return xmemdupz(email, eoemail - email);
>  }
>  
>  static char *copy_subject(const char *buf, unsigned long len)
> @@ -1113,7 +1133,7 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
>  			continue;
>  		if (name[wholen] != 0 &&
>  		    strcmp(name + wholen, "name") &&
> -		    strcmp(name + wholen, "email") &&
> +		    !starts_with(name + wholen, "email") &&
>  		    !starts_with(name + wholen, "date"))
>  			continue;
>  		if (!wholine)
> @@ -1124,8 +1144,16 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
>  			v->s = copy_line(wholine);
>  		else if (!strcmp(name + wholen, "name"))
>  			v->s = copy_name(wholine);
> -		else if (!strcmp(name + wholen, "email"))
> +		else if (starts_with(name + wholen, "email")) {
> +			email_option.option = EO_INVALID;
> +			if (!strcmp(name + wholen, "email"))
> +				email_option.option = EO_RAW;
> +			if (!strcmp(name + wholen, "email:trim"))
> +				email_option.option = EO_TRIM;
> +			if (!strcmp(name + wholen, "email:localpart"))
> +				email_option.option = EO_LOCALPART;

The ref-filter formatting language already knows many "colon plus
modifier" suffix like "refname:short" and "contents:body", but I do
not think we have ugly repetition like the above code to parse them.
Perhaps the addition for "email:<whatever>" can benefit from
studying and mimicking existing practices a bit more?