Re: [PATCH v6 10/13] convert: generate large test files only once

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 29 Aug 2016 10:46:51 -0700

larsxschneider@xxxxxxxxx writes:

> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index 7b45136..34c8eb9 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -4,6 +4,15 @@ test_description='blob conversion via gitattributes'
>  
>  . ./test-lib.sh
>  
> +if test_have_prereq EXPENSIVE
> +then
> +	T0021_LARGE_FILE_SIZE=2048
> +	T0021_LARGISH_FILE_SIZE=100
> +else
> +	T0021_LARGE_FILE_SIZE=30
> +	T0021_LARGISH_FILE_SIZE=2
> +fi

Minor: do we need T0021_ prefix?  What are you trying to avoid
collisions with?

> +	git checkout -- test test.t test.i &&
> +
> +	mkdir generated-test-data &&
> +	for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
> +	do
> +		RANDOM_STRING="$(test-genrandom end $i | tr -dc "A-Za-z0-9" )"
> +		ROT_RANDOM_STRING="$(echo $RANDOM_STRING | ./rot13.sh )"

In the earlier iterations of the loop, where $i is small, what
guarantees that any bytes survive "tr -dc"?
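
A minimal sketch of the concern, with plain printf standing in for
test-genrandom (an assumption, so that the example runs outside the
test suite) and count_surviving being a hypothetical helper name:
when none of the input bytes is alphanumeric, nothing is left after
the filter.

```shell
# Count the bytes that survive the alphanumeric filter.
# printf stands in for test-genrandom here (assumption for illustration).
count_surviving () {
	printf "$1" | tr -dc "A-Za-z0-9" | wc -c | tr -d " "
}

count_surviving '\001\002'	# prints 0: no byte is alphanumeric
count_surviving 'a\001Z'	# prints 2: only "a" and "Z" survive
```

With a short input, the first case is entirely possible, and then
RANDOM_STRING would be empty for that round.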

> +		# Generate 1MB of empty data and 100 bytes of random characters

100 bytes?  It seems to me that you are writing 1MB of padding and
then $i bytes or fewer (sometimes possibly zero) of random string.
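
For reference, a scaled-down sketch of what the padded printf in the
patch produces; the width 8 is only a stand-in for 1048576.  A "%Nd"
conversion emits N bytes in total, that is, N-1 spaces followed by
the digit, so each round adds exactly that many bytes plus whatever
is left of the random string.

```shell
# Width 8 stands in for 1048576 (assumption to keep the output small).
padded=$(printf "%8d" 1)

# The field is padded with spaces on the left up to the given width.
echo "[$padded]"	# prints [       1]
echo "${#padded}"	# prints 8
```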

> +		# printf "$(test-genrandom start $i)"
> +		printf "%1048576d" 1 >>generated-test-data/large.file &&
> +		printf "$RANDOM_STRING" >>generated-test-data/large.file &&
> +		printf "%1048576d" 1 >>generated-test-data/large.file.rot13 &&
> +		printf "$ROT_RANDOM_STRING" >>generated-test-data/large.file.rot13 &&
> +
> +		if test $i = $T0021_LARGISH_FILE_SIZE
> +		then
> +			cat generated-test-data/large.file >generated-test-data/largish.file &&
> +			cat generated-test-data/large.file.rot13 >generated-test-data/largish.file.rot13
> +		fi
> +	done

Does this "now we are done with the loop, so copy them to the second
pair" step need to be in the loop?  Shouldn't it come after 'done'?

I do not quite get the point of this complexity.  You are using
exactly the same seed "end" every time, so in the first round you
have 1M of SP ending in the letter '1', then letter 'S' (from the
genrandom); in the second round you have the same 1M run, then
letter 'S' and letter 'p' (the last two from the genrandom), and so
on.  Is it significant for the purpose of your test that the cruft
inserted between the repetitions of the 1M run of SP gets longer by
one byte each round, while all of them share the same prefix
(e.g. "1S", "1Sp", "1SpZ", "1SpZT", ... are what you insert between
large runs of spaces)?
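
The pattern described above can be sketched as follows.  STREAM is a
hypothetical stand-in for the bytes the fixed-seed generator yields,
using the letters from the example, and the sketch assumes every one
of them survives the "tr -dc" filter.

```shell
# STREAM stands in for the fixed-seed generator output (assumption);
# the real loop would call test-genrandom with the same seed each round.
STREAM=SpZT
inserted=
for i in 1 2 3 4
do
	# Same seed every round, so round $i yields the first $i bytes.
	piece=$(printf "%.${i}s" "$STREAM")
	# The leading "1" is the tail of the space-padded "%d" field.
	inserted="${inserted:+$inserted }1$piece"
done
echo "$inserted"	# prints 1S 1Sp 1SpZ 1SpZT
```

Each round's insertion is the previous one extended by a single
byte, so the rounds add very little new content to the file.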