On Sat, Aug 3, 2024 at 3:22 PM Ghanshyam Thakkar <shyamthakkar001@xxxxxxxxx> wrote: > > helper/test-oid-array.c along with t0064-oid-array.sh test the > oid-array.h library, which provides storage and processing Nit: I think "oid-array.h" is more an API than a library. > efficiency over large lists of object identifiers. > > Migrate them to the unit testing framework for better runtime > performance and efficiency. Also 'the_hash_algo' is used internally in It doesn't seem to me that a variable called 'the_hash_algo' is used internally in oid_array_lookup() anymore. > oid_array_lookup(), but we do not initialize a repository directory, > therefore initialize the_hash_algo manually. And > init_hash_algo():lib-oid.c can aid in this process, so make it public. > > Mentored-by: Christian Couder <chriscool@xxxxxxxxxxxxx> > Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@xxxxxxxxx> > Helped-by: Phillip Wood <phillip.wood123@xxxxxxxxx> > Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@xxxxxxxxx> > --- > Changes in v2: > - removed the use of internal test__run_*() functions. > - TEST_LOOKUP() is entirely changed where it now accepts a lower bound > and an upper bound for checking return values of oid_array_lookup(), > instead of passing the condition as a whole literal. (i.e. > v1: TEST_LOOKUP(..., ret < 0, ...) > v2: TEST_LOOKUP(..., INT_MIN, -1, ...) > ) Nice improvements. > - TEST_ENUMERATION() remains unchanged. [...] > +/* > + * Returns one of GIT_HASH_{SHA1, SHA256, UNKNOWN} based on the value of > + * GIT_TEST_DEFAULT_HASH. The fallback value in case of absence of > + * GIT_TEST_DEFAULT_HASH is GIT_HASH_SHA1. > + */ In this comment, it might be helpful to say that GIT_TEST_DEFAULT_HASH is an environment variable (while GIT_HASH_* are #defined values). Also while at it, it might be helpful to say that the function uses check(algo != GIT_HASH_UNKNOWN) before returning to verify that GIT_TEST_DEFAULT_HASH is either unset or properly set. > +int init_hash_algo(void); [...] > +static void t_enumeration(const char **input_args, size_t input_sz, > + const char **result, size_t result_sz) > +{ > + struct oid_array input = OID_ARRAY_INIT, expect = OID_ARRAY_INIT, > + actual = OID_ARRAY_INIT; > + size_t i; > + > + if (fill_array(&input, input_args, input_sz)) > + return; > + if (fill_array(&expect, result, result_sz)) > + return; It would have been nice if the arguments were called 'expect_args' and 'expect_sz' in the same way as for 'input'. Is there a reason why we couldn't just use 'expect' (or maybe 'expected') everywhere instead of 'result'? Also after the above 'input.nr' is equal to 'input_sz' and 'expect.nr' is equal to 'result_sz' otherwise we would have already returned fron the current function. > + oid_array_for_each_unique(&input, add_to_oid_array, &actual); > + check_uint(actual.nr, ==, expect.nr); I think it might be better to return if this check fails. Otherwise it means that we likely messed something up in the 'input_args' or 'result' arguments we passed to the function, and then... > + for (i = 0; i < test_min(actual.nr, expect.nr); i++) { > + if (!check(oideq(&actual.oid[i], &expect.oid[i]))) ...we might not compare here the input oid with the corresponding result oid we intended to compare it to. This might result in a lot of not very relevant output. Returning if check_uint(actual.nr, ==, expect.nr) fails would avoid such output and also enable us to just use 'actual.nr' instead of 'test_min(actual.nr, expect.nr)' in the 'for' loop above. > + test_msg("expected: %s\n got: %s\n index: %" PRIuMAX, > + oid_to_hex(&expect.oid[i]), oid_to_hex(&actual.oid[i]), > + (uintmax_t)i); > + } > + check_uint(i, ==, result_sz); As we saw above that 'expect.nr' is equal to 'result_sz', this check can fail only if 'actual.nr' is different from 'expect.nr' which we already checked above. So I think this check is redundant and we might want to get rid of it. > + oid_array_clear(&actual); > + oid_array_clear(&input); > + oid_array_clear(&expect); > +} [...] > +static void t_lookup(const char **input_hexes, size_t n, const char *query_hex, > + int lower_bound, int upper_bound) > +{ > + struct oid_array array = OID_ARRAY_INIT; > + struct object_id oid_query; > + int ret; > + > + if (get_oid_arbitrary_hex(query_hex, &oid_query)) > + return; In fill_array() above, we use check_int() to check the result of get_oid_arbitrary_hex() like this: if (!check_int(get_oid_arbitrary_hex(hexes[i], &oid), ==, 0)) It doesn't look consistent to not use check_int() to check the result of get_oid_arbitrary_hex() here. Or is there a specific reason to do it in one place but not in another? > + if (fill_array(&array, input_hexes, n)) > + return; > + ret = oid_array_lookup(&array, &oid_query); > + > + if (!check_int(ret, <=, upper_bound) || > + !check_int(ret, >=, lower_bound)) > + test_msg("oid query for lookup: %s", oid_to_hex(&oid_query)); > + > + oid_array_clear(&array); > +} > + > +#define TEST_LOOKUP(input_hexes, query, lower_bound, upper_bound, desc) \ > + TEST(t_lookup(input_hexes, ARRAY_SIZE(input_hexes), query, \ > + lower_bound, upper_bound), \ > + desc " works") > + > +static void setup(void) > +{ > + int algo = init_hash_algo(); > + /* because the_hash_algo is used by oid_array_lookup() internally */ I think this comment should be above the first line in this function as it also explains why we need to use init_hash_algo(). Also something like "/* The hash algo is used by oid_array_lookup() internally */" seems better to me as there is no 'the_hash_algo' global variable used by oid_array_lookup(). > + if (check_int(algo, !=, GIT_HASH_UNKNOWN)) > + repo_set_hash_algo(the_repository, algo); > +} > + > +int cmd_main(int argc UNUSED, const char **argv UNUSED) > +{ > + const char *arr_input[] = { "88", "44", "aa", "55" }; > + const char *arr_input_dup[] = { "88", "44", "aa", "55", > + "88", "44", "aa", "55", > + "88", "44", "aa", "55" }; > + const char *res_sorted[] = { "44", "55", "88", "aa" }; > + const char *nearly_55; > + > + if (!TEST(setup(), "setup")) > + test_skip_all("hash algo initialization failed"); Nice that we skip all the other tests with a helpful message if setup() fails. > + TEST_ENUMERATION(arr_input, res_sorted, "ordered enumeration"); > + TEST_ENUMERATION(arr_input_dup, res_sorted, > + "ordered enumeration with duplicate suppression"); > + > + /* ret is the return value of oid_array_lookup() */ This comment is not relevant anymore in this version. > + TEST_LOOKUP(arr_input, "55", 1, 1, "lookup"); > + TEST_LOOKUP(arr_input, "33", INT_MIN, -1, "lookup non-existent entry"); > + TEST_LOOKUP(arr_input_dup, "55", 3, 5, "lookup with duplicates"); > + TEST_LOOKUP(arr_input_dup, "66", INT_MIN, -1, > + "lookup non-existent entry with duplicates"); > + > + nearly_55 = init_hash_algo() == GIT_HASH_SHA1 ? > + "5500000000000000000000000000000000000001" : > + "5500000000000000000000000000000000000000000000000000000000000001"; > + TEST_LOOKUP(((const char *[]){ "55", nearly_55 }), "55", 0, 0, > + "lookup with almost duplicate values"); > + TEST_LOOKUP(((const char *[]){ "55", "55" }), "55", 0, 1, > + "lookup with single duplicate value"); > + > + return test_done(); > +} > -- > 2.46.0 >