Re: [PATCH v6 10/13] convert: generate large test files only once

> On 29 Aug 2016, at 19:46, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> 
> larsxschneider@xxxxxxxxx writes:
> 
>> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
>> index 7b45136..34c8eb9 100755
>> --- a/t/t0021-conversion.sh
>> +++ b/t/t0021-conversion.sh
>> @@ -4,6 +4,15 @@ test_description='blob conversion via gitattributes'
>> 
>> . ./test-lib.sh
>> 
>> +if test_have_prereq EXPENSIVE
>> +then
>> +	T0021_LARGE_FILE_SIZE=2048
>> +	T0021_LARGISH_FILE_SIZE=100
>> +else
>> +	T0021_LARGE_FILE_SIZE=30
>> +	T0021_LARGISH_FILE_SIZE=2
>> +fi
> 
> Minor: do we need T0021_ prefix?  What are you trying to avoid
> collisions with?

Not necessary. I'll remove the prefix.


>> +	git checkout -- test test.t test.i &&
>> +
>> +	mkdir generated-test-data &&
>> +	for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
>> +	do
>> +		RANDOM_STRING="$(test-genrandom end $i | tr -dc "A-Za-z0-9" )"
>> +		ROT_RANDOM_STRING="$(echo $RANDOM_STRING | ./rot13.sh )"
> 
> In earlier iteration of loop with lower $i, what guarantees that
> some bytes survive "tr -dc"?

Nothing really, good catch! The seed "end" always produces "S" as the
first character, which would survive "tr -dc". However, relying on that
is clunky. I will always set "1" as the first character of
$RANDOM_STRING to mitigate the problem.
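Roughly like this (a sketch using a fixed stand-in string instead of
test-genrandom output, since that helper only exists inside git's test
suite):

```shell
# Stand-in for "$(test-genrandom end $i | tr -dc "A-Za-z0-9")"; the
# tr filter may strip every byte, so the result can be empty.
RANDOM_BYTES="$(printf 'S#q!' | tr -dc "A-Za-z0-9")"
# Prepending a literal "1" guarantees the string is never empty.
RANDOM_STRING="1$RANDOM_BYTES"
printf '%s\n' "$RANDOM_STRING"
```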

> 
>> +		# Generate 1MB of empty data and 100 bytes of random characters
> 
> 100 bytes?  It seems to me that you are giving 1MB and then $i-byte
> or less (which sometimes can be zero) of random string.

Outdated comment. Will fix!

> 
>> +		# printf "$(test-genrandom start $i)"
>> +		printf "%1048576d" 1 >>generated-test-data/large.file &&
>> +		printf "$RANDOM_STRING" >>generated-test-data/large.file &&
>> +		printf "%1048576d" 1 >>generated-test-data/large.file.rot13 &&
>> +		printf "$ROT_RANDOM_STRING" >>generated-test-data/large.file.rot13 &&
>> +
>> +		if test $i = $T0021_LARGISH_FILE_SIZE
>> +		then
>> +			cat generated-test-data/large.file >generated-test-data/largish.file &&
>> +			cat generated-test-data/large.file.rot13 >generated-test-data/largish.file.rot13
>> +		fi
>> +	done
> 
> This "now we are done with the loop, so copy them to the second
> pair" needs to be in the loop?  Shouldn't it come after 'done'?

No, it does not need to be in the loop. I think I could do this
after the loop instead:

head -c $((1048576*$T0021_LARGISH_FILE_SIZE)) generated-test-data/large.file >generated-test-data/largish.file
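For illustration, the same prefix-copy works at any scale; here is a toy
version with 4-byte "blocks" standing in for the 1MB blocks in the real
test:

```shell
BLOCK_SIZE=4      # stands in for 1048576 (1MB) in the real test
LARGISH_BLOCKS=2  # stands in for $T0021_LARGISH_FILE_SIZE
printf 'aaaabbbbcccc' >large.file   # three 4-byte blocks
# Copy only the first LARGISH_BLOCKS blocks into the smaller file.
head -c $((BLOCK_SIZE * LARGISH_BLOCKS)) large.file >largish.file
cat largish.file   # prints "aaaabbbb"
```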


> I do not quite get the point of this complexity.  You are using
> exactly the same seed "end" every time, so in the first round you
> have 1M of SP, letter '1', letter 'S' (from the genrandom), then
> in the second round you have 1M of SP, letter '1', letter 'S' and
> letter 'p' (the last two from the genrandom), and go on.  Is it
> significant for the purpose of your test that the cruft inserted
> between the repetition of 1M of SP gets longer by one byte but they
> all share the same prefix (e.g. "1S", "1Sp", "1SpZ", "1SpZT",
> ... are what you insert between a large run of spaces)?

The pktline packets have a constant size. If the cruft between the 1M
runs of SP had a constant size as well, then the packets generated from
the test data would repeat themselves. That's why I increase the cruft
length after every 1M of SP.
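To illustrate with toy numbers (4-byte chunks standing in for the much
larger pkt-line payloads): a stream whose cruft has constant size splits
into identical chunks, which is exactly the repetition I wanted to
avoid:

```shell
# "   X" repeated: constant-size cruft between runs of padding.
printf '   X   X   X' >stream
# Split at a fixed "packet" size of 4 bytes.
split -b 4 stream chunk.
# Every chunk is identical, so the packets would repeat themselves.
cmp -s chunk.aa chunk.ab && cmp -s chunk.ab chunk.ac &&
	echo "all chunks identical"
```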

However, I realized that this test complexity is not necessary. I'll
simplify it in the next round.

Thanks,
Lars
