Re: t7900's new expensive test

On 12/1/2020 6:39 AM, Jeff King wrote:
> On Tue, Dec 01, 2020 at 06:23:28AM -0500, Jeff King wrote:
> 
>> I'm not sure if EXPENSIVE is the right ballpark, or if we'd want a
>> VERY_EXPENSIVE. On my machine, the whole test suite for v2.29.0 takes 64
>> seconds to run, and setting GIT_TEST_LONG=1 bumps that to 103s. It got a
>> bit worse since then, as t7900 adds an EXPENSIVE test that takes ~200s
>> (it's not strictly additive, since we can work in parallel on other
>> tests for the first bit, but still, yuck).
> 
> Since Stolee is on the cc and has already seen me complaining about his
> test, I guess I should expand a bit. ;)

Ha. I apologize for causing pain here. My thought was that GIT_TEST_LONG=1
would only be used by someone really willing to wait, or by someone
specifically trying to investigate a problem that only triggers on very
large cases.

In that sense, it's intended not so much as a frequently-run regression
test, but as a "run this if you are messing with this area" kind of thing.
Perhaps there is a different pattern to use here?
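
(Thinking out loud: one option would be a second tier next to the
existing EXPENSIVE prereq in t/test-lib.sh, along these lines. This is
only a sketch; the VERY_EXPENSIVE name and the GIT_TEST_VERY_LONG knob
are made up, not anything that exists today.)

# sketch for t/test-lib.sh: a heavier tier beside EXPENSIVE;
# both names here are hypothetical
test_lazy_prereq VERY_EXPENSIVE '
	test -n "$GIT_TEST_VERY_LONG"
'

The 2g test would then be guarded by VERY_EXPENSIVE, so a plain
GIT_TEST_LONG=1 run would skip it unless someone asks for the heavier
tier explicitly.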

> There are some small wins possible (e.g., using "commit --quiet" seems
> to shave off ~8s when we don't even think about writing a diff), but
> fundamentally the issue is that it just takes a long time to "git add"
> the 5.2GB worth of random data. I almost wonder if it would be worth it
> to hard-code the known sha1 and sha256 names of the blobs, and write
> them straight into the appropriate loose object file. I guess that is
> tricky, though, because it actually needs to be a zlib stream, not just
> the output of "test-tool genrandom".
>
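If we ever went that route, I imagine it would look roughly like the
helper below. This is only a sketch: write_loose_blob is a made-up
name, the object name would have to be precomputed per hash algorithm,
and Compress::Zlib is just one convenient way to get a zlib (not gzip)
stream from a test script. It also slurps the whole file into memory,
so it may not even be a win at these sizes.

# sketch: write <file> as a loose blob at the precomputed name <oid>;
# a loose object is zlib("blob <size>\0<contents>")
write_loose_blob () {
	oid=$1 file=$2 &&
	loose=.git/objects/$(echo "$oid" | sed "s|^..|&/|") &&
	mkdir -p "${loose%/*}" &&
	{
		printf "blob %d\0" $(wc -c <"$file") &&
		cat "$file"
	} |
	perl -MCompress::Zlib -e '
		binmode STDIN;
		binmode STDOUT;
		undef $/;
		print compress(<STDIN>);
	' >"$loose"
}
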
> Though speaking of which, another easy win might be setting
> core.compression to "0". We know the random data won't compress anyway,
> so there's no point in spending cycles on zlib.

The intention is mostly to push the data beyond two gigabytes, so
dropping compression to get there seems like a good idea. If we are
not compressing at all, then perhaps we can reliably size the data
just over the 2GB limit instead of overshooting as a precaution.
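
Concretely, I am imagining the first half of the test body shrinking
to something like this (the four-chunk count, and the assumption that
~2GiB of incompressible input still trips the 2g batch size, are
guesses I would want to verify):

	test_config core.compression 0 &&

	# with compression off the on-disk size tracks the input, so four
	# chunks of 512MiB+1 (2GiB plus 4 bytes) barely clears the limit
	# instead of overshooting to ~2.5GiB
	for i in $(test_seq 1 4)
	do
		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
		return 1
	done &&
	git add big &&
	git commit -qm "Add big file (1)" &&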
 
> Doing this:
> 
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index d9e68bb2bf..849c6d1361 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -239,6 +239,8 @@ test_expect_success 'incremental-repack task' '
>  '
>  
>  test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
> +	test_config core.compression 0 &&
> +
>  	for i in $(test_seq 1 5)
>  	do
>  		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
> @@ -257,7 +259,7 @@ test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
>  		return 1
>  	done &&
>  	git add big &&
> -	git commit -m "Add big file (2)" &&
> +	git commit -qm "Add big file (2)" &&
>  
>  	# ensure any possible loose objects are in a pack-file
>  	git maintenance run --task=loose-objects &&
> 
> seems to shave off ~140s from the test. I think we could get a little
> more by cleaning up the enormous objects, too (they end up causing the
> subsequent test to run slower, too, though perhaps it was intentional to
> impact downstream tests).

Cutting 70% out seems like a great idea. I don't think it was
particularly intentional to slow down those tests.
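
If it helps, one way to avoid the slowdown entirely (again just a
sketch: the 2g-limit repo name and the -C plumbing are illustrative,
and the body below elides most of the real test) would be to confine
the huge objects to a throwaway repository:

	test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
		git init 2g-limit &&
		test_when_finished "rm -rf 2g-limit" &&
		git -C 2g-limit config core.compression 0 &&
		for i in $(test_seq 1 5)
		do
			test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) \
				>>2g-limit/big ||
			return 1
		done &&
		git -C 2g-limit add big &&
		git -C 2g-limit commit -qm "Add big file (1)"
		# ...rest of the existing test, run with git -C 2g-limit...
	'

That way the big objects disappear as soon as the test finishes, and
the tests that follow are untouched.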

Thanks,
-Stolee



