Re: [GSoC][PATCH] unit-tests: add tests for oidset.h

Patrick Steinhardt <ps@xxxxxx> · Mon, 26 Aug 2024 09:02:49 +0200

On Sat, Aug 24, 2024 at 10:50:23PM +0530, Ghanshyam Thakkar wrote:
> Add tests for oidset.h library, which were not previously present using
> the unit testing framework.
> 
> This imposes a new restriction of running the test from the 't/' and
> 't/unit-tests/bin' for constructing the path to the test files which
> are used by t_parse_file(), which tests the parsing of object_ids from
> a file. This restriction is similar to the one we already have for
> end-to-end tests, wherein, we can only run those tests from 't/'. The
> addition of allowing 't/unit-tests/bin' for allowing to run tests from
> is for running individual unit tests, which is not currently possible
> via any 'make' target. And 'make unit-tests-test-tool' also runs from
> 't/unit-tests/bin'
> 
> Mentored-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@xxxxxxxxx>
> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@xxxxxxxxx>
> ---
> I know there is some hesitance from the community in imposing the
> restriction of running the unit tests from certain directories, so
> if this case does not justify imposing such a restriction, I am fine
> with removing t_parse_file() in the next version.

Another option would be to set up a preprocessor define that gives us
the path to the unit test root directory. But I'm also okayish with the
current version. If it turns out to be annoying we can still iterate.

> diff --git a/t/unit-tests/lib-oid.c b/t/unit-tests/lib-oid.c
> index 37105f0a8f..8f0ccac532 100644
> --- a/t/unit-tests/lib-oid.c
> +++ b/t/unit-tests/lib-oid.c
> @@ -3,7 +3,7 @@
>  #include "strbuf.h"
>  #include "hex.h"
>  
> -static int init_hash_algo(void)
> +int init_hash_algo(void)
>  {
>  	static int algo = -1;
>  
> diff --git a/t/unit-tests/lib-oid.h b/t/unit-tests/lib-oid.h
> index 8d2acca768..fc3e7aa376 100644
> --- a/t/unit-tests/lib-oid.h
> +++ b/t/unit-tests/lib-oid.h
> @@ -14,4 +14,5 @@
>   */
>  int get_oid_arbitrary_hex(const char *s, struct object_id *oid);
>  
> +int init_hash_algo(void);
>  #endif /* LIB_OID_H */

Let's add a comment to explain what this does. Also, do we maybe want to
give it a name that ties it to the unit tests? `init_hash_algo()` is
quite generic and may easily lead to conflicting symbol names.

Maybe something like `t_oid_init_hash_algo()`. `t_` to indicate that it
is testing-related, `oid_` indicates that it's part of "lib-oid.h", and
the remainder describes what it does.

> diff --git a/t/unit-tests/t-oidset.c b/t/unit-tests/t-oidset.c
> new file mode 100644
> index 0000000000..4a63f9ea94
> --- /dev/null
> +++ b/t/unit-tests/t-oidset.c
> @@ -0,0 +1,222 @@
> +#include "test-lib.h"
> +#include "oidset.h"
> +#include "lib-oid.h"
> +#include "hex.h"
> +#include "strbuf.h"
> +
> +static const char *const hex_input[] = { "00", "11", "22", "33", "aa", "cc" };

I think we typically write this `const char * const`, with another space
between `*` and `const`.

> +static void strbuf_test_data_path(struct strbuf *buf, int hash_algo)
> +{
> +	strbuf_getcwd(buf);
> +	strbuf_strip_suffix(buf, "/unit-tests/bin");
> +	strbuf_addf(buf, "/unit-tests/t-oidset/%s",
> +		    hash_algo == GIT_HASH_SHA1 ? "sha1-oids" : "sha256-oids");
> +}

I wouldn't prefix this with `strbuf_`, as it is not part of the strbuf
subsystem. The function just happens to use a strbuf.

> +static void setup(void (*f)(struct oidset *st))

I was wondering what `st` stands for. I'd either call it just `s` or
`set`.

> +{
> +	struct oidset st = OIDSET_INIT;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	if (!check_int(oidset_size(&st), ==, 0)) {
> +		test_skip_all("OIDSET_INIT is broken");
> +		return;
> +	}
> +
> +	for (size_t i = 0; i < ARRAY_SIZE(hex_input); i++) {
> +		if ((ret = get_oid_arbitrary_hex(hex_input[i], &oid)))
> +			break;
> +		if (!check_int((ret = oidset_insert(&st, &oid)), ==, 0))
> +			break;
> +	}

In both of these cases I'd split out the assignment into a separate
line. While the first instance is likely fine, the second instance makes
me a bit uneasy as it is a macro. I generally do not trust macros to do
the correct thing when being passed a statement with side effects.

[snip]
> +static int input_contains(struct object_id *oid, char *seen)
> +{
> +	for (size_t i = 0; i < ARRAY_SIZE(hex_input); i++) {
> +		struct object_id oid_input;
> +		if (get_oid_arbitrary_hex(hex_input[i], &oid_input))
> +			return -1;
> +		if (oideq(&oid_input, oid)) {
> +			if (seen[i])
> +				return 2;
> +			seen[i] = 1;
> +			return 0;
> +		}
> +	}
> +	return 1;
> +}

This function is somewhat confusing. Contains what? What are the
parameters? Is one of them the expectation, the other one the actual
state? What do we compare against?

I think it would help to get it a more descriptive name that says what
we check for and remove the dependence on global state. Also, the `seen`
array does not seem to be used by any caller. So maybe we should
allocate it ourselves in this function such that it is self-contained.
It requires more allocations, sure, but I highly doubt that this is
going to be important in this test.

Patrick