Re: [GSoC][PATCH v2] t: port helper/test-oid-array.c to unit-tests/t-oid-array.c

Christian Couder <christian.couder@xxxxxxxxx> · Mon, 19 Aug 2024 18:55:29 +0200

On Sat, Aug 3, 2024 at 3:22 PM Ghanshyam Thakkar
<shyamthakkar001@xxxxxxxxx> wrote:
>
> helper/test-oid-array.c along with t0064-oid-array.sh test the
> oid-array.h library, which provides storage and processing

Nit: I think "oid-array.h" is more an API than a library.

> efficiency over large lists of object identifiers.
>
> Migrate them to the unit testing framework for better runtime
> performance and efficiency. Also 'the_hash_algo' is used internally in

It doesn't seem to me that a variable called 'the_hash_algo' is used
internally in oid_array_lookup() anymore.

> oid_array_lookup(), but we do not initialize a repository directory,
> therefore initialize the_hash_algo manually. And
> init_hash_algo():lib-oid.c can aid in this process, so make it public.
>
> Mentored-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@xxxxxxxxx>
> Helped-by: Phillip Wood <phillip.wood123@xxxxxxxxx>
> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@xxxxxxxxx>
> ---
> Changes in v2:
> - removed the use of internal test__run_*() functions.
> - TEST_LOOKUP() is entirely changed where it now accepts a lower bound
>   and an upper bound for checking return values of oid_array_lookup(),
>   instead of passing the condition as a whole literal. (i.e.
>    v1: TEST_LOOKUP(..., ret < 0, ...)
>    v2: TEST_LOOKUP(..., INT_MIN, -1, ...)
>   )

Nice improvements.

> - TEST_ENUMERATION() remains unchanged.

[...]

> +/*
> + * Returns one of GIT_HASH_{SHA1, SHA256, UNKNOWN} based on the value of
> + * GIT_TEST_DEFAULT_HASH. The fallback value in case of absence of
> + * GIT_TEST_DEFAULT_HASH is GIT_HASH_SHA1.
> + */

In this comment, it might be helpful to say that GIT_TEST_DEFAULT_HASH
is an environment variable (while GIT_HASH_* are #defined values).
Also while at it, it might be helpful to say that the function uses
check(algo != GIT_HASH_UNKNOWN) before returning to verify that
GIT_TEST_DEFAULT_HASH is either unset or properly set.

> +int init_hash_algo(void);

[...]

> +static void t_enumeration(const char **input_args, size_t input_sz,
> +                         const char **result, size_t result_sz)
> +{
> +       struct oid_array input = OID_ARRAY_INIT, expect = OID_ARRAY_INIT,
> +                        actual = OID_ARRAY_INIT;
> +       size_t i;
> +
> +       if (fill_array(&input, input_args, input_sz))
> +               return;
> +       if (fill_array(&expect, result, result_sz))
> +               return;

It would have been nice if the arguments were called 'expect_args' and
'expect_sz' in the same way as for 'input'. Is there a reason why we
couldn't just use 'expect' (or maybe 'expected') everywhere instead of
'result'?

Also after the above 'input.nr' is equal to 'input_sz' and 'expect.nr'
is equal to 'result_sz' otherwise we would have already returned fron
the current function.

> +       oid_array_for_each_unique(&input, add_to_oid_array, &actual);
> +       check_uint(actual.nr, ==, expect.nr);

I think it might be better to return if this check fails. Otherwise it
means that we likely messed something up in the 'input_args' or
'result' arguments we passed to the function, and then...

> +       for (i = 0; i < test_min(actual.nr, expect.nr); i++) {
> +               if (!check(oideq(&actual.oid[i], &expect.oid[i])))

...we might not compare here the input oid with the corresponding
result oid we intended to compare it to. This might result in a lot of
not very relevant output.

Returning if check_uint(actual.nr, ==, expect.nr) fails would avoid
such output and also enable us to just use 'actual.nr' instead of
'test_min(actual.nr, expect.nr)' in the 'for' loop above.

> +                       test_msg("expected: %s\n       got: %s\n     index: %" PRIuMAX,
> +                                oid_to_hex(&expect.oid[i]), oid_to_hex(&actual.oid[i]),
> +                                (uintmax_t)i);
> +       }
> +       check_uint(i, ==, result_sz);

As we saw above that 'expect.nr' is equal to 'result_sz', this check
can fail only if 'actual.nr' is different from 'expect.nr' which we
already checked above. So I think this check is redundant and we might
want to get rid of it.

> +       oid_array_clear(&actual);
> +       oid_array_clear(&input);
> +       oid_array_clear(&expect);
> +}

[...]

> +static void t_lookup(const char **input_hexes, size_t n, const char *query_hex,
> +                    int lower_bound, int upper_bound)
> +{
> +       struct oid_array array = OID_ARRAY_INIT;
> +       struct object_id oid_query;
> +       int ret;
> +
> +       if (get_oid_arbitrary_hex(query_hex, &oid_query))
> +               return;

In fill_array() above, we use check_int() to check the result of
get_oid_arbitrary_hex() like this:

              if (!check_int(get_oid_arbitrary_hex(hexes[i], &oid), ==, 0))

It doesn't look consistent to not use check_int() to check the result
of get_oid_arbitrary_hex() here. Or is there a specific reason to do
it in one place but not in another?

> +       if (fill_array(&array, input_hexes, n))
> +               return;
> +       ret = oid_array_lookup(&array, &oid_query);
> +
> +       if (!check_int(ret, <=, upper_bound) ||
> +           !check_int(ret, >=, lower_bound))
> +               test_msg("oid query for lookup: %s", oid_to_hex(&oid_query));
> +
> +       oid_array_clear(&array);
> +}
> +
> +#define TEST_LOOKUP(input_hexes, query, lower_bound, upper_bound, desc) \
> +       TEST(t_lookup(input_hexes, ARRAY_SIZE(input_hexes), query,      \
> +                     lower_bound, upper_bound),                        \
> +            desc " works")
> +
> +static void setup(void)
> +{
> +       int algo = init_hash_algo();
> +       /* because the_hash_algo is used by oid_array_lookup() internally */

I think this comment should be above the first line in this function
as it also explains why we need to use init_hash_algo().

Also something like "/* The hash algo is used by oid_array_lookup()
internally */" seems better to me as there is no 'the_hash_algo'
global variable used by oid_array_lookup().

> +       if (check_int(algo, !=, GIT_HASH_UNKNOWN))
> +               repo_set_hash_algo(the_repository, algo);
> +}
> +
> +int cmd_main(int argc UNUSED, const char **argv UNUSED)
> +{
> +       const char *arr_input[] = { "88", "44", "aa", "55" };
> +       const char *arr_input_dup[] = { "88", "44", "aa", "55",
> +                                       "88", "44", "aa", "55",
> +                                       "88", "44", "aa", "55" };
> +       const char *res_sorted[] = { "44", "55", "88", "aa" };
> +       const char *nearly_55;
> +
> +       if (!TEST(setup(), "setup"))
> +               test_skip_all("hash algo initialization failed");

Nice that we skip all the other tests with a helpful message if setup() fails.

> +       TEST_ENUMERATION(arr_input, res_sorted, "ordered enumeration");
> +       TEST_ENUMERATION(arr_input_dup, res_sorted,
> +                        "ordered enumeration with duplicate suppression");
> +
> +       /* ret is the return value of oid_array_lookup() */

This comment is not relevant anymore in this version.

> +       TEST_LOOKUP(arr_input, "55", 1, 1, "lookup");
> +       TEST_LOOKUP(arr_input, "33", INT_MIN, -1, "lookup non-existent entry");
> +       TEST_LOOKUP(arr_input_dup, "55", 3, 5, "lookup with duplicates");
> +       TEST_LOOKUP(arr_input_dup, "66", INT_MIN, -1,
> +                   "lookup non-existent entry with duplicates");
> +
> +       nearly_55 = init_hash_algo() == GIT_HASH_SHA1 ?
> +                       "5500000000000000000000000000000000000001" :
> +                       "5500000000000000000000000000000000000000000000000000000000000001";
> +       TEST_LOOKUP(((const char *[]){ "55", nearly_55 }), "55", 0, 0,
> +                   "lookup with almost duplicate values");
> +       TEST_LOOKUP(((const char *[]){ "55", "55" }), "55", 0, 1,
> +                   "lookup with single duplicate value");
> +
> +       return test_done();
> +}
> --
> 2.46.0
>