While testing 'git fsck' checks on .gitmodules URLs, I noticed that some invalid URLs were passing the checks. Digging into it a bit more, the issue turned out to be that 'credential_from_url_gently()' parses certain URLs (like "http://example.com:something/deeper/path") incorrectly, in a way that appeared to return a valid result. Fortunately, these URLs are rejected in fetches/clones/pushes anyway because 'url_normalize()' (called in 'validate_remote_url()') correctly identifies them as invalid. So, to bring 'git fsck' in line with other (stronger) validation done on remote URLs, this series replaces the 'credential_from_url_gently()' check with one that uses 'url_normalize()'. * Patch 1 moves 'check_submodule_url()' to a public location so that it can be used outside of 'fsck.c'. * Patch 2 removes the obsolete/never-used code in 'test-tool submodule check-name' handling names provided on the command line. * Patch 3 adds a 'check-url' mode to 'test-tool submodule', calling the now-public 'check_submodule_url()' method on a given URL, and adds new tests checking valid and invalid submodule URLs. * Patch 4 replaces the 'credential_from_url_gently()' check with 'url_normalize()' followed by 'url_decode()' and an explicit check for newlines (to preserve the newline handling added in 07259e74ec1 (fsck: detect gitmodules URLs with embedded newlines, 2020-03-11)). Changes since V1 ================ * Added 'TEST_TOOL_CHECK_URL_USAGE' to 'submodule_usage'. * Removed unused/unreachable code related to command line inputs in 'test-tool submodule check-name' and 'test-tool submodule check-url'. * Split the new t7450 test case into two tests (the first contains URLs that are validated successfully, the second demonstrates a URL incorrectly marked valid) to clearly show which pattern is handled improperly. The tests are merged in the final patch once the validation is corrected. Thanks! * Victoria Victoria Dye (4): submodule-config.h: move check_submodule_url test-submodule: remove command line handling for check-name t7450: test submodule urls submodule-config.c: strengthen URL fsck check fsck.c | 133 ---------------------------------- submodule-config.c | 140 ++++++++++++++++++++++++++++++++++++ submodule-config.h | 3 + t/helper/test-submodule.c | 52 +++++++++----- t/t7450-bad-git-dotfiles.sh | 26 +++++++ 5 files changed, 203 insertions(+), 151 deletions(-) base-commit: a54a84b333adbecf7bc4483c0e36ed5878cac17b Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1635%2Fvdye%2Fvdye%2Fstrengthen-fsck-url-checks-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1635/vdye/vdye/strengthen-fsck-url-checks-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/1635 Range-diff vs v1: 1: 588de3022d7 = 1: ce1de0406ef submodule-config.h: move check_submodule_url -: ----------- > 2: 14e8834c38b test-submodule: remove command line handling for check-name 2: cf7848edffc ! 3: b6843a58389 t7450: test submodule urls @@ Metadata ## Commit message ## t7450: test submodule urls - Add a test to 't7450-bad-git-dotfiles.sh' to check the validity of different - submodule URLs. To test this directly (without setting up test repositories - & submodules), add a 'check-url' subcommand to 'test-tool submodule' that - calls 'check_submodule_url' in the same way that 'check-name' calls - 'check_submodule_name'. + Add tests to 't7450-bad-git-dotfiles.sh' to check the validity of different + submodule URLs. To verify this directly (without setting up test + repositories & submodules), add a 'check-url' subcommand to 'test-tool + submodule' that calls 'check_submodule_url' in the same way that + 'check-name' calls 'check_submodule_name'. - Mark the test with 'test_expect_failure' because, as it stands, - 'check_submodule_url' marks certain invalid URLs valid. Specifically, the - invalid URL "http://example.com:test/foo.git" is incorrectly marked valid in - the test. + Add two tests to separately address cases where the URL check correctly + filters out invalid URLs and cases where the check misses invalid URLs. Mark + the latter ("url check misses invalid cases") with 'test_expect_failure' to + indicate that this not the undesired behavior. Signed-off-by: Victoria Dye <vdye@xxxxxxxxxx> @@ t/helper/test-submodule.c: static const char *submodule_check_name_usage[] = { }; +#define TEST_TOOL_CHECK_URL_USAGE \ -+ "test-tool submodule check-url <url>" ++ "test-tool submodule check-url" +static const char *submodule_check_url_usage[] = { + TEST_TOOL_CHECK_URL_USAGE, + NULL @@ t/helper/test-submodule.c: static const char *submodule_check_name_usage[] = { #define TEST_TOOL_IS_ACTIVE_USAGE \ "test-tool submodule is-active <name>" static const char *submodule_is_active_usage[] = { -@@ t/helper/test-submodule.c: static const char *submodule_usage[] = { +@@ t/helper/test-submodule.c: static const char *submodule_resolve_relative_url_usage[] = { + + static const char *submodule_usage[] = { + TEST_TOOL_CHECK_NAME_USAGE, ++ TEST_TOOL_CHECK_URL_USAGE, + TEST_TOOL_IS_ACTIVE_USAGE, + TEST_TOOL_RESOLVE_RELATIVE_URL_USAGE, NULL }; +-/* Filter stdin to print only valid names. */ +-static int check_name(void) +typedef int (*check_fn_t)(const char *); + - /* - * Exit non-zero if any of the submodule names given on the command line is - * invalid. If no names are given, filter stdin to print only valid names - * (which is primarily intended for testing). - */ --static int check_name(int argc, const char **argv) -+static int check_submodule(int argc, const char **argv, check_fn_t check_fn) ++/* ++ * Apply 'check_fn' to each line of stdin, printing values that pass the check ++ * to stdout. ++ */ ++static int check_submodule(check_fn_t check_fn) { - if (argc > 1) { - while (*++argv) { -- if (check_submodule_name(*argv) < 0) -+ if (check_fn(*argv) < 0) - return 1; - } - } else { - struct strbuf buf = STRBUF_INIT; - while (strbuf_getline(&buf, stdin) != EOF) { -- if (!check_submodule_name(buf.buf)) -+ if (!check_fn(buf.buf)) - printf("%s\n", buf.buf); - } - strbuf_release(&buf); + struct strbuf buf = STRBUF_INIT; + while (strbuf_getline(&buf, stdin) != EOF) { +- if (!check_submodule_name(buf.buf)) ++ if (!check_fn(buf.buf)) + printf("%s\n", buf.buf); + } + strbuf_release(&buf); @@ t/helper/test-submodule.c: static int cmd__submodule_check_name(int argc, const char **argv) if (argc) usage_with_options(submodule_check_name_usage, options); -- return check_name(argc, argv); -+ return check_submodule(argc, argv, check_submodule_name); +- return check_name(); ++ return check_submodule(check_submodule_name); +} + +static int cmd__submodule_check_url(int argc, const char **argv) @@ t/helper/test-submodule.c: static int cmd__submodule_check_name(int argc, const + if (argc) + usage_with_options(submodule_check_url_usage, options); + -+ return check_submodule(argc, argv, check_submodule_url); ++ return check_submodule(check_submodule_url); } static int cmd__submodule_is_active(int argc, const char **argv) @@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'check names' ' test_cmp expect actual ' -+test_expect_failure 'check urls' ' ++test_expect_success 'check urls' ' + cat >expect <<-\EOF && + ./bar/baz/foo.git + https://example.com/foo.git @@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'check names' ' + ./%0ahost=example.com/foo.git + https://one.example.com/evil?%0ahost=two.example.com + https:///example.com/foo.git -+ http://example.com:test/foo.git + https::example.com/foo.git + http:::example.com/foo.git + EOF + + test_cmp expect actual +' ++ ++# NEEDSWORK: the URL checked here is not valid (and will not work as a remote if ++# a user attempts to clone it), but the fsck check passes. ++test_expect_failure 'url check misses invalid cases' ' ++ test-tool submodule check-url >actual <<-\EOF && ++ http://example.com:test/foo.git ++ EOF ++ ++ test_must_be_empty actual ++' + test_expect_success 'create innocent subrepo' ' git init innocent && 3: 893071530d3 ! 4: b79b1a71780 submodule-config.c: strengthen URL fsck check @@ submodule-config.c: int check_submodule_url(const char *url) ## t/t7450-bad-git-dotfiles.sh ## -@@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'check names' ' +@@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'check urls' ' + ./%0ahost=example.com/foo.git + https://one.example.com/evil?%0ahost=two.example.com + https:///example.com/foo.git ++ http://example.com:test/foo.git + https::example.com/foo.git + http:::example.com/foo.git + EOF +@@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'check urls' ' test_cmp expect actual ' --test_expect_failure 'check urls' ' -+test_expect_success 'check urls' ' - cat >expect <<-\EOF && - ./bar/baz/foo.git - https://example.com/foo.git +-# NEEDSWORK: the URL checked here is not valid (and will not work as a remote if +-# a user attempts to clone it), but the fsck check passes. +-test_expect_failure 'url check misses invalid cases' ' +- test-tool submodule check-url >actual <<-\EOF && +- http://example.com:test/foo.git +- EOF +- +- test_must_be_empty actual +-' +- + test_expect_success 'create innocent subrepo' ' + git init innocent && + git -C innocent commit --allow-empty -m foo -- gitgitgadget