On Wed, Aug 14, 2024 at 4:21 PM Ghanshyam Thakkar <shyamthakkar001@xxxxxxxxx> wrote: > > helper/test-urlmatch-normalization along with > t0110-urlmatch-normalization test the `url_normalize()` function from > 'urlmatch.h'. Migrate them to the unit testing framework for better > performance. And also add different test_msg()s for better debugging. > > In the migration, last two of the checks from `t_url_general_escape()` > were slightly changed compared to the shellscript. This involves changing Nit: s/shellscript/shell script/ > > '\'' -> ' > '\!' -> ! > > in the urls of those checks. This is because in C strings, we don't > need to escape "'" and "!". Other than these two, all the urls were > pasted verbatim from the shellscript. Nit: s/shellscript/shell script/ > Another change is the removal of MINGW prerequisite from one of the Nit: s/of MINGW prerequisite/of a MINGW prerequisite/ > test. It was there because[1] on Windows, the command line is a > Unicode string, it is not possible to pass arbitrary bytes to a > program. But in unit tests we don't have this limitation. > > And since we can construct strings with arbitrary bytes in C, let's > also remove the test files which contain URLs with arbitrary bytes in > the 't/t0110' directory and instead embed those URLs in the unit test > code itself. > > [1]: https://lore.kernel.org/git/53CAC8EF.6020707@xxxxxxxxx/ > > Mentored-by: Christian Couder <chriscool@xxxxxxxxxxxxx> > Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@xxxxxxxxx> > Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@xxxxxxxxx> > --- > This version addresses Junio's review and removes the restriction > of running the unit tests in the 't/' and 't/unit-tests/bin' > introduced in v2 by embedding the URLs in the code itself. Nice change. [...] > +static void compare_normalized_urls(const char *url1, const char *url2, > + size_t equal) Nit: it's better to use 'unsigned int' or just 'int' for bool flags like "equal". Or is there a reason to use 'size_t' instead? > +{ > + char *url1_norm = url_normalize(url1, NULL); > + char *url2_norm = url_normalize(url2, NULL); > + > + if (equal) { > + if (!check_str(url1_norm, url2_norm)) > + test_msg("input url1: %s\n input url2: %s", url1, > + url2); > + } else if (!check_int(strcmp(url1_norm, url2_norm), !=, 0)) { > + test_msg(" url1_norm: %s\n url2_norm: %s\n" > + " input url1: %s\n input url2: %s", > + url1_norm, url2_norm, url1, url2); Nit: something like "normalized url1" might be a bit better than "url1_norm" as for the 'url1' variable we use "input url1" instead of just "url1". > + } > + free(url1_norm); > + free(url2_norm); > +} > + > +static void check_normalized_url_length(const char *url, size_t len) > +{ > + struct url_info info; > + char *url_norm = url_normalize(url, &info); > + > + if (!check_int(info.url_len, ==, len)) > + test_msg(" input url: %s\n normalized url: %s", url, > + url_norm); Above "normalized url" is used for "url_norm" which is good. > + free(url_norm); > +} > + > +/* Note that only file: URLs should be allowed without a host */ Nit: maybe s/file:/"file:"/ would make things a bit clearer. [...] > +/* > + * http://@foo specifies an empty user name but does not specify a password > + * http://foo specifies neither a user name nor a password > + * So they should not be equivalent > + */ Nit: the above comment would be a bit better with URLs inside double quotes, with a full stop (period) at the end of each sentence and with only one space character between "http://foo" and "specifies". Except for the above nits, I am happy with this version. Thanks.