Patrick Steinhardt wrote: > On Tue, Jan 09, 2024 at 05:53:37PM +0000, Victoria Dye via GitGitGadget wrote: >> From: Victoria Dye <vdye@xxxxxxxxxx> >> >> Update the validation of "curl URL" submodule URLs (i.e. those that specify >> an "http[s]" or "ftp[s]" protocol) in 'check_submodule_url()' to catch more >> invalid URLs. The existing validation using 'credential_from_url_gently()' >> parses certain URLs incorrectly, leading to invalid submodule URLs passing >> 'git fsck' checks. Conversely, 'url_normalize()' - used to validate remote >> URLs in 'remote_get()' - correctly identifies the invalid URLs missed by >> 'credential_from_url_gently()'. > > Okay, so we retain the wrong behavior of `credential_from_url_gently()`, > right? I wonder whether this can be abused in any way, doubly so because > the function gets invoked with untrusted input from the remote server > when we handle redirects in `http_request_reauth()`. But the redirect > URL we end up passing to `credential_from_url_gently()` would have to > contain a non-numeric port, and curl seemingly does not know to handle > those either. Correct, nothing about 'credential_from_url_gently()' changes here. As for whether it could be abused - I don't *think* so, but I'm definitely not a security expert. If it helps, here's a more detailed breakdown of the issue: In 'credential_from_url_1()', suppose we have URL "http://example.com:test/repo.git". Stepping through the variables: - 'cp' is "example.com:test/repo.git" - 'at' is NULL - 'colon' is ":test/repo.git" - 'slash' is "/repo.git" Because 'at' is NULL, we set 'host = cp'. Later, because 'slash - host > 0', we call 'url_decode_mem()' on "example.com:test" (which, in this case, doesn't change anything) and the result 'host' to "example.com:test". The issue for the fsck check is that 'credential_from_url_gently()' doesn't validate the hostname it extracts (e.g. whether ':' precedes a valid port, or if the hostname contains a '%'-escaped sequence). I don't *think* that could be abused since, like you said, cURL should just reject the invalid URL altogether. > > Other callsites include fsck (which you're fixing) and the credential > store (which is entirely user-controlled). It would be great regardless > to fix the underlying bug in `credential_from_url_gently()` eventually > though. But I do not think that this has to be part this patch series > here, which is a strict improvement. Agreed! I think normalizing the URL before trying to extract the credentials may be all that's needed to avoid surprise URL errors, but that probably warrants a separate patch submission (with appropriately thorough testing). > > Thanks! > > Patrick