Re: [PATCH 10/9 v2] test-mergesort: use repeatable random numbers

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 08 Oct 2021 09:23:21 +0200

On Fri, Oct 08 2021, René Scharfe wrote:

> Use MINSTD to generate pseudo-random numbers consistently instead of
> using rand(3), whose output can vary from system to system, and reset
> its seed before filling in the test values.  This gives repeatable
> results across versions and systems, which simplifies sharing and
> comparing of results between developers.
>
> Signed-off-by: René Scharfe <l.s.r@xxxxxx>
> ---
> Change: Use uint32_t to avoid relying on unsigned int being exactly
> 4 bytes wide.  D'oh!
>
>  t/helper/test-mergesort.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/t/helper/test-mergesort.c b/t/helper/test-mergesort.c
> index 29758cf89b..c6fa816be3 100644
> --- a/t/helper/test-mergesort.c
> +++ b/t/helper/test-mergesort.c
> @@ -2,6 +2,12 @@
>  #include "cache.h"
>  #include "mergesort.h"
>
> +static uint32_t minstd_rand(uint32_t *state)
> +{
> +	*state = (uint64_t)*state * 48271 % 2147483647;
> +	return *state;
> +}
> +
>  struct line {
>  	char *text;
>  	struct line *next;
> @@ -60,8 +66,9 @@ static void dist_sawtooth(int *arr, int n, int m)
>  static void dist_rand(int *arr, int n, int m)
>  {
>  	int i;
> +	uint32_t seed = 1;
>  	for (i = 0; i < n; i++)
> -		arr[i] = rand() % m;
> +		arr[i] = minstd_rand(&seed) % m;
>  }
>
>  static void dist_stagger(int *arr, int n, int m)
> @@ -81,8 +88,9 @@ static void dist_plateau(int *arr, int n, int m)
>  static void dist_shuffle(int *arr, int n, int m)
>  {
>  	int i, j, k;
> +	uint32_t seed = 1;
>  	for (i = j = 0, k = 1; i < n; i++)
> -		arr[i] = (rand() % m) ? (j += 2) : (k += 2);
> +		arr[i] = minstd_rand(&seed) % m ? (j += 2) : (k += 2);
>  }
>
>  #define DIST(name) { #name, dist_##name }

Just on your upthread comment:

    "Right, so we'd need to ship our own random number generator."

I don't really think this matters in either case here, and if anything a
flaky failure in this test would quickly point us in the right
direction, as opposed to say having the N test_expect_success being run
in rand() order or whatever.

If we'd like results we can compare across platforms we're surely better
off here running this in a loop with different per-platform srand()
values N times for some high value of N, than we are in picking one
"golden" distribution.

But just on srand() and rand() use more generally in the test suite: I
think it's fine to just assume that we can call srand()/rand() and get
"predictable" results, because what we're really after in most cases is
to avoid hard-to-diagnose flakiness. If the random distribution ends up
giving us a consistent failure on one OS (or the flakiness is confined
to OpenBSD), that is still easy enough to track down.

Also generally: If you'd like "portable" rand() for a test just shell
out to perl. I ran this on various Perl versions (oldest 5.12) on Debian
Linux, OSX, Solaris & OpenBSD, all returned the same number for both:

    ruby -e 'srand(1); puts rand'; perl -E 'srand(1); say $^V; say rand'

Whereas a C program doing the same:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
            srand(1);
            printf("rand = %d\n", rand());
            return 0;
    }

Returns different numbers on all of them, and on OpenBSD the number is
different each time, per their well-known non-standard srand()/rand()
behavior.
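For comparison, here is a minimal sketch of the MINSTD step from the
quoted patch, which depends only on fixed-width integer arithmetic and
so should give the same sequence everywhere. minstd_nth() is a
hypothetical helper added only for this illustration:

```c
#include <stdint.h>

/* MINSTD step with the same constants as the quoted patch. */
static uint32_t minstd_rand(uint32_t *state)
{
	*state = (uint64_t)*state * 48271 % 2147483647;
	return *state;
}

/*
 * Hypothetical helper: return the n-th output (n >= 1) of the
 * generator when started from "seed".
 */
static uint32_t minstd_nth(uint32_t seed, unsigned n)
{
	uint32_t value = seed;

	while (n--)
		value = minstd_rand(&seed);
	return value;
}
```

With seed 1 the first two outputs are 48271 and 182605794, and the
10000th is 399268537, the value the C++ standard specifies for
std::minstd_rand, which uses the same constants.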