Re: [PATCH 1/2] chainlint: make error messages self-explanatory

Patrick Steinhardt <ps@xxxxxx> · Thu, 29 Aug 2024 12:03:33 +0200

On Thu, Aug 29, 2024 at 05:16:24AM -0400, Eric Sunshine wrote:
> From: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
> 
> The annotations emitted by chainlint to indicate detected problems are
> overly terse, so much so that developers new to the project -- those who
> should most benefit from the linting -- may find them baffling. For
> instance, although the author of chainlint and seasoned Git developers
> may understand that "?!AMP?!" is an abbreviation of "ampersand" and
> indicates a break in the &&-chain, this may not be obvious to newcomers.
> 
> Similarly, although the annotation "?!LOOP?!" is understood by project
> regulars to indicate a missing `|| return 1` (or `|| exit 1` in a
> subshell), newcomers may find it more than a little perplexing. The
> "?!LOOP?!" case is particularly serious since it is likely that some
> newcomers are unaware that shell loops do not terminate automatically
> upon error, and it is more difficult for a newcomer to figure out how to
> correct the problem by examining surrounding code since `|| return 1`
> appears in test scrips relatively infrequently (compared, for instance,
> with &&-chaining).
> 
> Address these shortcomings by emitting human-consumable messages which
> both explain the problem and give a strong hint about how to correct it.

A worthwhile goal indeed. As you say, especially figuring out how to fix
the loop annotations is not exactly straight forward.

[snip]
> diff --git a/t/chainlint.pl b/t/chainlint.pl
> index 5361f23b1d..d79f183dfd 100755
> --- a/t/chainlint.pl
> +++ b/t/chainlint.pl
> @@ -9,7 +9,7 @@
>  # Input arguments are pathnames of shell scripts containing test definitions,
>  # or globs referencing a collection of scripts. For each problem discovered,
>  # the pathname of the script containing the test is printed along with the test
> -# name and the test body with a `?!FOO?!` annotation at the location of each
> +# name and the test body with a `?!ERR?!` annotation at the location of each
>  # detected problem, where "FOO" is a tag such as "AMP" which indicates a broken
>  # &&-chain. Returns zero if no problems are discovered, otherwise non-zero.
>  
> @@ -181,7 +181,7 @@ sub swallow_heredocs {
>  			$self->{lineno} += () = $body =~ /\n/sg;
>  			next;
>  		}
> -		push(@{$self->{parser}->{problems}}, ['UNCLOSED-HEREDOC', $tag]);
> +		push(@{$self->{parser}->{problems}}, ['HEREDOC', $tag]);
>  		$$b =~ /(?:\G|\n).*\z/gc; # consume rest of input
>  		my $body = substr($$b, $start, pos($$b) - $start);
>  		$self->{lineno} += () = $body =~ /\n/sg;

I was wondering why this is being changed here, as I found the old name
to be easier to understand. Then I saw further down that you essentially
use those as identifiers for the actual problem.

Is there a specific reason why we now have the separate translation
step? Couldn't we instead push the translated message here, directly?

> @@ -296,8 +297,11 @@ sub parse_group {
>  
>  sub parse_subshell {
>  	my $self = shift @_;
> -	return ($self->parse(qr/^\)$/),
> -		$self->expect(')'));
> +	$self->{insubshell}++;
> +	my @tokens = ($self->parse(qr/^\)$/),
> +		      $self->expect(')'));
> +	$self->{insubshell}--;
> +	return @tokens;
>  }
>  
>  sub parse_case_pattern {

Okay. The subshell recursion level tracking here is required such that
we can discern LOOPEXIT vs LOOPRETURN cases. Makes sense.

> @@ -641,7 +654,8 @@ sub check_test {
>  	for (sort {$a->[1]->[2] <=> $b->[1]->[2]} @$problems) {
>  		my ($label, $token) = @$_;
>  		my $pos = $token->[2];
> -		$checked .= substr($body, $start, $pos - $start) . " ?!$label?! ";
> +		my $err = format_problem($label, $token);
> +		$checked .= substr($body, $start, $pos - $start) . " ?!ERR $err?! ";
>  		$start = $pos;
>  	}
>  	$checked .= substr($body, $start);
> diff --git a/t/chainlint/arithmetic-expansion.expect b/t/chainlint/arithmetic-expansion.expect
> index 338ecd5861..2efd65dcbd 100644
> --- a/t/chainlint/arithmetic-expansion.expect
> +++ b/t/chainlint/arithmetic-expansion.expect
> @@ -4,6 +4,6 @@
>  5 	baz
>  6 ) &&
>  7 (
> -8 	bar=$((42 + 1)) ?!AMP?!
> +8 	bar=$((42 + 1)) ?!ERR missing '&&'?!
>  9 	baz
>  10 )

I find the resulting error messages a bit confusing: to me it reads as
if "ERR" is missing the ampersands. Is it actually useful to have the
ERR prefix in the first place? We do not output anything but errors, so
it feels somewhat redundant.

Patrick