Re: [GSoC][PATCH v3] t: migrate t0110-urlmatch-normalization to the new framework

Christian Couder <christian.couder@xxxxxxxxx> · Mon, 19 Aug 2024 14:46:11 +0200

On Wed, Aug 14, 2024 at 4:21 PM Ghanshyam Thakkar
<shyamthakkar001@xxxxxxxxx> wrote:
>
> helper/test-urlmatch-normalization along with
> t0110-urlmatch-normalization test the `url_normalize()` function from
> 'urlmatch.h'. Migrate them to the unit testing framework for better
> performance. And also add different test_msg()s for better debugging.
>
> In the migration, last two of the checks from `t_url_general_escape()`
> were slightly changed compared to the shellscript. This involves changing

Nit: s/shellscript/shell script/

>
> '\'' -> '
> '\!' -> !
>
> in the urls of those checks. This is because in C strings, we don't
> need to escape "'" and "!". Other than these two, all the urls were
> pasted verbatim from the shellscript.

Nit: s/shellscript/shell script/

> Another change is the removal of MINGW prerequisite from one of the

Nit: s/of MINGW prerequisite/of a MINGW prerequisite/

> test. It was there because[1] on Windows, the command line is a
> Unicode string, it is not possible to pass arbitrary bytes to a
> program. But in unit tests we don't have this limitation.
>
> And since we can construct strings with arbitrary bytes in C, let's
> also remove the test files which contain URLs with arbitrary bytes in
> the 't/t0110' directory and instead embed those URLs in the unit test
> code itself.
>
> [1]: https://lore.kernel.org/git/53CAC8EF.6020707@xxxxxxxxx/
>
> Mentored-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@xxxxxxxxx>
> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@xxxxxxxxx>
> ---
> This version addresses Junio's review and removes the restriction
> of running the unit tests in the 't/' and 't/unit-tests/bin'
> introduced in v2 by embedding the URLs in the code itself.

Nice change.

[...]

> +static void compare_normalized_urls(const char *url1, const char *url2,
> +                                   size_t equal)

Nit: it's better to use 'unsigned int' or just 'int' for bool flags
like "equal". Or is there a reason to use 'size_t' instead?

> +{
> +       char *url1_norm = url_normalize(url1, NULL);
> +       char *url2_norm = url_normalize(url2, NULL);
> +
> +       if (equal) {
> +               if (!check_str(url1_norm, url2_norm))
> +                       test_msg("input url1: %s\n  input url2: %s", url1,
> +                                url2);
> +       } else if (!check_int(strcmp(url1_norm, url2_norm), !=, 0)) {
> +               test_msg(" url1_norm: %s\n   url2_norm: %s\n"
> +                        "  input url1: %s\n  input url2: %s",
> +                        url1_norm, url2_norm, url1, url2);

Nit: something like "normalized url1" might be a bit better than
"url1_norm" as for the 'url1' variable we use "input url1" instead of
just "url1".

> +       }
> +       free(url1_norm);
> +       free(url2_norm);
> +}
> +
> +static void check_normalized_url_length(const char *url, size_t len)
> +{
> +       struct url_info info;
> +       char *url_norm = url_normalize(url, &info);
> +
> +       if (!check_int(info.url_len, ==, len))
> +               test_msg("     input url: %s\n  normalized url: %s", url,
> +                        url_norm);

Above "normalized url" is used for "url_norm" which is good.

> +       free(url_norm);
> +}
> +
> +/* Note that only file: URLs should be allowed without a host */

Nit: maybe s/file:/"file:"/ would make things a bit clearer.

[...]

> +/*
> + * http://@foo specifies an empty user name but does not specify a password
> + * http://foo  specifies neither a user name nor a password
> + * So they should not be equivalent
> + */

Nit: the above comment would be a bit better with URLs inside double
quotes, with a full stop (period) at the end of each sentence and with
only one space character between "http://foo"; and "specifies".

Except for the above nits, I am happy with this version. Thanks.