Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 05 Dec 2018 15:01:41 +0100

On Wed, Dec 05 2018, SZEDER Gábor wrote:

> On Tue, Dec 04, 2018 at 07:11:08PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> It's a frequent annoyance of mine in the test suite that I'm
>> e.g. running t*.sh with some parallel "prove" in one screen, and then I
>> run tABCD*.sh manually, and get unlucky because they use the same trash
>> dir, and both tests go boom.
>>
>> You can fix that with --root, which is much of what this patch does. My
>> one-liner for doing --stress has been something like:
>>
>>     perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh --root=trash-{} -v'
>>
>> But it would be great if I didn't have to worry about that and could
>> just run two concurrent:
>>
>>     ./t0000-basic.sh
>>
>> So I think we could just set some env variable where instead of having
>> the predictable trash directory we have a $TRASHDIR.$N as this patch
>> does, except we pick $N by locking some test-runs/tABCD.Nth file with
>> flock() during the run.
>
> Is 'flock' portable?  It doesn't appear so.

Portable enough, and since it's just an alias for "grab lock atomically"
it can devolve into other more basic FS functions on systems that don't
have it.

>> Then a stress mode like this would just be:
>>
>>     GIT_TEST_FLOCKED_TRASH_DIRS=1 perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh'
>
> This doesn't address the issue of TCP ports for various daemons.

Once we have a test ABCD and slot offset N (for a fixed size of N) we
can pick a port from that.

>> And sure, we could ship a --stress option too, but it wouldn't be
>> magical in any way, just another "spawn N in a loop" implementation, but
>> you could also e.g. use GNU parallel to drive it, and without needing to
>
> GNU 'parallel' may or may not be available on the platform, that's why
> I stuck with the barebones shell-loops-in-the-background approach.

Yes, my point is not that you should drop that part of your patch for
using GNU parallel, but just to demonstrate that we can get pretty close
to this for most tests with an external tool, and that it would make
this sort of thing work for concurrent tests without needing to decide
to concurrently test in advance.

>> decide to stress test in advance, since we'd either flock() the trash
>> dir, or just mktemp(1)-it.
>
> While 'mktemp' seems to be more portable than 'flock', it doesn't seem
> to be portable enough; at least it's not in POSIX.>

We are already relying on stuff like mktemp() being reliable for
e.g. the split index.

> And in general I'd prefer deterministic trash directory names.  If I
> re-run a failed test after fixing the bug, then currently the trash
> directory will be overwritten, and that's good.  With 'mktemp's junk
> in the trash direcory's name it won't be overwritten, and if my bugfix
> doesn't work, then I'll start accumulating trash directories and won't
> even know which one is from the last run.

In general you wouldn't need to set GIT_TEST_FLOCKED_TRASH_DIRS=1 and
would get the same monolithic trash names as now, and for something like
the flock() version it could use a job number as a suffix like your
patch does.