Re: [PATCH v2 1/1] contrib: add coverage-diff script

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 13 Sep 2018 10:40:59 -0700

"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>
> We have coverage targets in our Makefile for using gcov to display line
> coverage based on our test suite. The way I like to do it is to run:
>
>     make coverage-test
>     make coverage-report
>
> This leaves the repo in a state where every X.c file that was covered has
> an X.c.gcov file containing the coverage counts for every line, and "#####"
> at every uncovered line.
>
> There have been a few bugs in recent patches what would have been caught
> if the test suite covered those blocks (including a few of mine). I want
> to work towards a "sensible" amount of coverage on new topics. In my opinion,
> this means that any logic should be covered, but the 'die()' blocks in error
> cases do not need to be covered.
>
> It is important to not measure the coverage of the codebase by what old code
> is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
> After creating the coverage statistics at a version (say, 'topic') you can
> then run
>
>     contrib/coverage-diff.sh base topic
>
> to see the lines added between 'base' and 'topic' that are not covered by the
> test suite. The output uses 'git blame -c' format so you can find the commits
> responsible and view the line numbers for quick access to the context.
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  contrib/coverage-diff.sh | 63 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 63 insertions(+)
>  create mode 100755 contrib/coverage-diff.sh
>
> diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
> new file mode 100755
> index 0000000000..0f226f038c
> --- /dev/null
> +++ b/contrib/coverage-diff.sh
> @@ -0,0 +1,63 @@
> +#!/bin/sh
> +
> +# Usage: 'contrib/coverage-diff.sh <version1> <version2>
> +# Outputs a list of new lines in version2 compared to version1 that are
> +# not covered by the test suite. Assumes you ran
> +# 'make coverage-test coverage-report' from root first, so .gcov files exist.

I presume that it is "from root first while you have a checkout of
version2, so .gcov files for version2 exist in the working tree"?

> +V1=$1
> +V2=$2
> +
> +diff_lines () {
> +	while read line
> +	do
> +		if echo $line | grep -q -e "^@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*"

As you know you are reading from diff output, you do not have to be
so strict in this if condition.  It's not like you try to carefully
reject a line that began with "@@" but did not match this pattern in
the corresponding else block.

"read line" does funny things to backslashes and whitespaces in the
payload ("read -r line" sometimes helps), and echoing $line without
quoting will destroy its contents here and in the line below (but
you do not care because all you care about is if it begins with a
dash).

> +		then
> +			line_num=$(echo $line \
> +				| awk 'match($0, "@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*", m) { print m[3] }')
> +		else
> +			echo "$line_num:$line"
> +			if ! echo $line | grep -q -e "^-"

If a patch removes a line with a solitary 'n' on it, possibly
followed by a SP and something else, such a line would say "-n
something else", and some implementation of "echo -n something else"
would do what this line does not expect.  A safer way to do this,
as you only care if it is a deletion line, is to do

	case "$line" in -*) ;; *) line_num=$(( $line_num + 1 ));; esac

Also you can make the echoing of "$line_num:$line" above
conditional, which will allow you to lose "grep ':+'" in the
pipeline that consumes output from this thing, e.g.

	case "$line" in +*) echo "$line_num:$line";; esac

iff you must write this in shell (but see below).

> +			then
> +				line_num=$(($line_num + 1))
> +			fi
> +		fi
> +	done

I have a feeling that a single Perl script instead of a shell loop
that runs many grep and awk as subprocesses performs better even on
Windows, and it would be more readable and maintainable.

perl -e '
	my $line_num;
	while (<>) {
		# Hunk header?  Grab the beginning in postimage.
		if (/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) {
			$line_num = $1;
			next;
		}

		# Have we seen a hunk?  Ignore "diff --git" etc.
		next unless defined $line_num;

		# Deleted line? Ignore.
		if (/^-/) {
			next;
		}

		# Show only the line number of added lines.
		if (/^\+/) {
			print "$line_num\n";
		}
		# Either common context or added line appear in
		# the postimage.  Count it.
		$line_num++;
	}
'

or something like that, given that you seem to only need line
numbers in new_lines.txt anyway?

> +}
> +
> +files=$(git diff --raw $V1 $V2 \
> +	| grep \.c$ \
> +	| awk 'NF>1{print $NF}')

Do we need these other commands in the pipe?  How is this different
from, say,

	git diff --name-only "$V1" "$V2" -- \*.c

?

> +for file in $files
> +do
> +	git diff $V1 $V2 -- $file \
> +		| diff_lines \
> +		| grep ":+" \

I think you meant to match "^<linenum>:+<added contents>" you are
echoing out in the above helper function, but this will find a
removed line that happens to have colon followed by a plus (which is
not all that uncommon in a patch to a shell script).

> +		| sed 's/:/ /g' \

Turning "<linenum>:+<added contents>" to "<linenum> +<added contents>"
with this, so that the next "awk" can show <linenum> only?  Then you
do not need 'g' at the end.  I see you use 'g' in many sed scripts
for uncovered_lines.txt, and I suspect most of 'g's you do not mean.

> +		| awk '{print $1}' \

Then we get a list of $line_num here?

> +		| sort \

This is sorted textually, which goes against intuition because these
lines are all line numbers, but it is done so that you can run
"comm" on it later and "comm" requires us to feed lines sorted
textually.

> +		>new_lines.txt

If I were writing this part, I'd probably write a small Perl script
that takes output from 'git diff "$V1" "$V2" -- "$file"' and then
reduces it to a list of line numbers for newly introduced lines.
Then pipe it to "sort >new_lines.txt" to match what we see below.

> +	hash_file=$(echo $file | sed "s/\//\#/")
> +	cat "$hash_file.gcov" \
> +		| grep \#\#\#\#\# \
> +		| sed 's/    #####: //g' \
> +		| sed 's/\:/ /g' \
> +		| awk '{print $1}' \
> +		| sort \
> +		>uncovered_lines.txt

Do not cat a single regular file into a pipe.  Write pipe at the end
of the line, not the beginning of next line.  Do not grep into sed.
Do not feed sed into sed unless needed.

Without knowing what the above is trying to do, but just
mechanically translating, I'd get something like this:

	sed -ne '/#####/{
		s/    #####: //g
		s/:.*//
                p
	}' "$hash_file.gcov" |
	sort >uncovered_lines.txt

> +	comm -12 uncovered_lines.txt new_lines.txt \
> +		| sed -e 's/$/\)/' \
> +		| sed -e 's/^/\t/' \
> +		>uncovered_new_lines.txt

;-)  This creates something like

	1)
	2)
	3)

out of a list of numbers.  Cute.

> +	grep -q '[^[:space:]]' < uncovered_new_lines.txt && \
> +		echo $file && \
> +		git blame -c $file \
> +			| grep -f uncovered_new_lines.txt
> +
> +	rm -f new_lines.txt uncovered_lines.txt uncovered_new_lines.txt
> +done
> +