Re: [GSoC][PATCH] t: migrate helper/test-oidmap.c to unit-tests/t-oidmap.c

"Ghanshyam Thakkar" <shyamthakkar001@xxxxxxxxx> · Wed, 26 Jun 2024 00:46:46 +0530

Phillip Wood <phillip.wood123@xxxxxxxxx> wrote:
> Hi Ghanshyam
>
> On 20/06/2024 10:45, Jonathan Nieder wrote:
> > Ghanshyam Thakkar wrote:
> > 
> >> helper/test-oidmap.c along with t0016-oidmap.sh test the oidmap.h
> >> library which is built on top of hashmap.h to store arbitrary
> >> datastructure (which must contain oidmap_entry, which is a wrapper
> >> around object_id).
>
> I'm not really sure what the sentence is trying to say. I think it would
> be helpful to start the commit message with an introductory sentence
> explaining that the oidmap is currently tested via `test-tool` and this
> commit converts those tests to unit tests.

Got it. Will improve. I just wanted to explain the basics of oidmap to
help ease the review process.

> These entries can be accessed by querying their
> >> associated object_id.
> >>
> >> Migrate them to the unit testing framework for better performance,
> >> concise code and better debugging. Along with the migration also plug
> >> memory leaks and make the test logic independent for all the tests.
>
> >> The migration removes 'put' tests from t0016, because it is used as
> >> setup to all the other tests, so testing it separately does not yield
> >> any benefit.
>
> Thanks sounds sensible, thanks for explaining it in the commit message.
>
> Overall the patch looks pretty good, I've left a couple of comments
> below.
>
> >> Mentored-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
> >> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@xxxxxxxxx>
> >> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@xxxxxxxxx>
> >> ---
>
> >> diff --git a/t/unit-tests/t-oidmap.c b/t/unit-tests/t-oidmap.c
> >> new file mode 100644
> >> index 0000000000..9b98a3ed09
> >> --- /dev/null
> >> +++ b/t/unit-tests/t-oidmap.c
> >> @@ -0,0 +1,165 @@
> >> +#include "test-lib.h"
> >> +#include "lib-oid.h"
> >> +#include "oidmap.h"
> >> +#include "hash.h"
> >> +#include "hex.h"
> >> +
> >> +/*
> >> + * elements we will put in oidmap structs are made of a key: the entry.oid
> >> + * field, which is of type struct object_id, and a value: the name field (could
> >> + * be a refname for example)
> >> + */
> >> +struct test_entry {
> >> +	struct oidmap_entry entry;
> >> +	char name[FLEX_ARRAY];
> >> +};
> >> +
> >> +static const char *key_val[][2] = { { "11", "one" },
> >> +				    { "22", "two" },
> >> +				    { "33", "three" } };
> >> +
> >> +static int put_and_check_null(struct oidmap *map, const char *hex,
> >> +			      const char *entry_name)
> >> +{
> >> +	struct test_entry *entry;
> >> +
> >> +	FLEX_ALLOC_STR(entry, name, entry_name);
> >> +	if (get_oid_arbitrary_hex(hex, &entry->entry.oid))
> >> +		return -1;
>
> When writing unit tests it is important to make sure that they fail,
> rather than just return early if there is an error. There are a number
> of places like this that return early without calling one of the check()
> macros to make the test fail.

They do fail. `get_oid_arbitrary_hex()` from 'unit-tests/lib-oid.h' is
a function specifically built for the use in unit tests. And it
contains in built `check_*` to ensure that the tests fails if something
goes wrong and also prints diagnostic info. Maybe we can add a check here
as well to know the line number at which the call failed, but since we
already print queried hex value and other diagnostic info from
`get_oid_arbitrary_hex()`, I thought it would be enough.

> >> +	if (!check(oidmap_put(map, entry) == NULL))
> >> +		return -1;
> >> +	return 0;
> >> +}
> >> +
> >> +static void setup(void (*f)(struct oidmap *map))
> >> +{
> >> +	struct oidmap map = OIDMAP_INIT;
> >> +	int ret = 0;
> >> +
> >> +	for (size_t i = 0; i < ARRAY_SIZE(key_val); i++)
> >> +		if ((ret = put_and_check_null(&map, key_val[i][0],
> >> +					      key_val[i][1])))
>
> Given there is only one caller I think it would be easier to see what is
> going on if the function body was just inlined into the loop here.

Yeah, will do.

> >> +			break;
> >> +
> >> +	if (!ret)
> >> +		f(&map);
> >> +	oidmap_free(&map, 1);
> >> +}
>
> The tests for replace, get, remove all look like faithful translations
> of the old script and are fine apart from some missing check() calls
> when get_oid_arbitrary_hex() fails.
>
> >> +static int key_val_contains(struct test_entry *entry)
> >> +{
> >> +	/* the test is small enough to be able to bear O(n) */
>
> It is good to think about that but I'm not sure we need a comment about
> it in a small test like this.

Got it. Will remove.

> >> +	for (size_t i = 0; i < ARRAY_SIZE(key_val); i++) {
> >> +		if (!strcmp(key_val[i][1], entry->name)) {
> >> +			struct object_id oid;
> >> +			if (get_oid_arbitrary_hex(key_val[i][0], &oid))
> >> +				return -1;
> >> +			if (oideq(&entry->entry.oid, &oid))
> >> +				return 0;
> >> +		}
> >> +	}
> >> +	return 1;
> >> +}
>
> So if we cannot construct the oid we return -1, if the oid matches we
> return 0 and if the oid does not match we return 1
>
> >> +static void t_iterate(struct oidmap *map)
> >> +{
> >> +	struct oidmap_iter iter;
> >> +	struct test_entry *entry;
> >> +	int ret;
> >> +
> >> +	oidmap_iter_init(map, &iter);
> >> +	while ((entry = oidmap_iter_next(&iter))) {
> >> +		if (!check_int((ret = key_val_contains(entry)), ==, 0)) {
> >> +			if (ret == -1)
> >> +				return;
> >> +			test_msg("obtained entry was not given in the input\n"
> >> +				 "  name: %s\n   oid: %s\n",
> >> +				 entry->name, oid_to_hex(&entry->entry.oid));
>
> This checks that all of the expect objects are present, but does not
> check for duplicate objects. An alternative would be to build an array
> of all the entries, then sort it by oid and compare that to a sorted
> version of `key_val`. That is what the scripted version does. We don't
> have any helpers for comparing arrays so you'd need to do that by
> comparing each element in a loop.
>
> >> +		}
> >> +	}
> >> +	check_int(hashmap_get_size(&map->map), ==, ARRAY_SIZE(key_val));
>
> One could argue that this helps guard against duplicate entries but
> that's only true if we trust hashmap_get_size() so I think keeping this
> to check that hashmap_get_size() gives the correct size and changing the
> loop above would be better.

Yeah, since I was not sure if hashmap's order is predictable, I first
checked if the entry exists and later checked if the size matches. I'll
try to do the array approach you mentioned.

Thank you for the review.