Re: [PATCH 3/3] submodule-config.c: strengthen URL fsck check

Victoria Dye <vdye@xxxxxxxxxx> · Wed, 17 Jan 2024 13:19:30 -0800

Patrick Steinhardt wrote:
> On Tue, Jan 09, 2024 at 05:53:37PM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@xxxxxxxxxx>
>>
>> Update the validation of "curl URL" submodule URLs (i.e. those that specify
>> an "http[s]" or "ftp[s]" protocol) in 'check_submodule_url()' to catch more
>> invalid URLs. The existing validation using 'credential_from_url_gently()'
>> parses certain URLs incorrectly, leading to invalid submodule URLs passing
>> 'git fsck' checks. Conversely, 'url_normalize()' - used to validate remote
>> URLs in 'remote_get()' - correctly identifies the invalid URLs missed by
>> 'credential_from_url_gently()'.
> 
> Okay, so we retain the wrong behavior of `credential_from_url_gently()`,
> right? I wonder whether this can be abused in any way, doubly so because
> the function gets invoked with untrusted input from the remote server
> when we handle redirects in `http_request_reauth()`. But the redirect
> URL we end up passing to `credential_from_url_gently()` would have to
> contain a non-numeric port, and curl seemingly does not know to handle
> those either.

Correct, nothing about 'credential_from_url_gently()' changes here. As for
whether it could be abused - I don't *think* so, but I'm definitely not a
security expert. If it helps, here's a more detailed breakdown of the issue:

In 'credential_from_url_1()', suppose we have URL
"http://example.com:test/repo.git";. Stepping through the variables:

- 'cp' is "example.com:test/repo.git"
- 'at' is NULL
- 'colon' is ":test/repo.git"
- 'slash' is "/repo.git"

Because 'at' is NULL, we set 'host = cp'. Later, because 'slash - host > 0',
we call 'url_decode_mem()' on "example.com:test" (which, in this case,
doesn't change anything) and the result 'host' to "example.com:test".

The issue for the fsck check is that 'credential_from_url_gently()' doesn't
validate the hostname it extracts (e.g. whether ':' precedes a valid port,
or if the hostname contains a '%'-escaped sequence). I don't *think* that
could be abused since, like you said, cURL should just reject the invalid
URL altogether.

> 
> Other callsites include fsck (which you're fixing) and the credential
> store (which is entirely user-controlled). It would be great regardless
> to fix the underlying bug in `credential_from_url_gently()` eventually
> though. But I do not think that this has to be part this patch series
> here, which is a strict improvement.

Agreed! I think normalizing the URL before trying to extract the credentials
may be all that's needed to avoid surprise URL errors, but that probably
warrants a separate patch submission (with appropriately thorough testing).

> 
> Thanks!
> 
> Patrick