Re: [PATCH v2] urlmatch: create fetch.credentialsInUrl config

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 27 May 2022 11:09:25 -0700

"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
> index cd65d236b43..7fd3ea89f5d 100644
> --- a/Documentation/config/fetch.txt
> +++ b/Documentation/config/fetch.txt
> @@ -96,3 +96,17 @@ fetch.writeCommitGraph::
>  	merge and the write may take longer. Having an updated commit-graph
>  	file helps performance of many Git commands, including `git merge-base`,
>  	`git push -f`, and `git log --graph`. Defaults to false.
> +
> +fetch.credentialsInUrl::
> +	A URL can contain plaintext credentials in the form
> +	`protocol://<user>:<password>@domain/path`. Using such URLs is not

In the above, 'protocol', 'domain' and 'path' are all placeholders,
just like <user> and <password> are, and it made me wonder if we
should be consistent.  Enclosing all placeholders in <angle-braket>
pairs would make the resulting text too loud, so it may make sense
to drop these highlights around user and password.

That makes it consistent with what git-credential-store.txt,
git-svn.txt, gitfaq.txt and urls.txt do

    git-credential-store.txt:https://user:pass@xxxxxxxxxxx
    git-svn.txt:	the URL, e.g. `svn+ssh://foo@xxxxxxxxxxx/project`
    gitfaq.txt:$ echo url=https://author@xxxxxxxxxxxxxxx | git credential reject
    gitfaq.txt:	https://author@xxxxxxxxxxxxxxx/org1/project1.git and
    gitfaq.txt:	https://committer@xxxxxxxxxxxxxxx/org2/project2.git.  This way, when you
    gitfaq.txt:	origin https://author@xxxxxxxxxxxxxxx/org1/project1.git` (see

> +	recommended as it exposes the password in multiple ways, including
> +	Git storing the URL as plaintext in the repository config. The
> +	`fetch.credentialsInUrl` option provides instruction for how Git
> +	should react to seeing such a URL, with these values:
> ++
> +* `allow` (default): Git will proceed with its activity without warning.
> +* `warn`: Git will write a warning message to `stderr` when parsing a URL
> +  with a plaintext credential.
> +* `die`: Git will write a failure message to `stderr` when parsing a URL
> +  with a plaintext credential.

Good.

> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index 4a61f2c901e..387da74d175 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -71,6 +71,18 @@ test_expect_success 'clone respects GIT_WORK_TREE' '
>  
>  '
>  
> +test_expect_success 'clone warns or fails when using username:password' '
> +	test_must_fail git -c fetch.credentialsInUrl=warn clone https://username:password@localhost attempt1 2>err &&
> +	grep "warning: URL '\''https://username:<redacted>@localhost/'\'' uses plaintext credentials" err &&
> +	test_must_fail git -c fetch.credentialsInUrl=die clone https://username:password@localhost attempt2 2>err &&
> +	grep "fatal: URL '\''https://username:<redacted>@localhost/'\'' uses plaintext credentials" err
> +'
> +
> +test_expect_success 'clone does not detect username:password when it is https://username@domain:port/' '
> +	test_must_fail git -c fetch.credentialsInUrl=warn clone https://username@localhost:8080 attempt3 2>err &&
> +	! grep "uses plaintext credentials" err
> +'

Could we have one negative test here, too?  I.e. with an explicit -c
fetch.credentialsInUrl=allow given, no credential-in-URL errors and
warnings should be issued.

> diff --git a/urlmatch.c b/urlmatch.c
> index b615adc923a..16beda37a3a 100644
> --- a/urlmatch.c
> +++ b/urlmatch.c
> @@ -1,5 +1,6 @@
>  #include "cache.h"
>  #include "urlmatch.h"
> +#include "config.h"
>  
>  #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
>  #define URL_DIGIT "0123456789"
> @@ -106,6 +107,59 @@ static int match_host(const struct url_info *url_info,
>  	return (!url_len && !pat_len);
>  }
>  
> +/*
> + * Call this method when we have detected credentials within the 'url' in
> + * the form
> + *
> + *     scheme://username:password@domain[:port][/path]
> + *
> + * The 'scheme_len' value should be equal to the string length of the
> + * "scheme://" portion of the URL.

"scheme" is probably more technically correct here, even though in
the documentation that faces the end-users, "protocol" would be more
widely understood, so I think this inconsistency in a single patch
across paths is a good thing ;-)

> + * The fetch.credentialsInUrl config indicates what to do on such a URL,
> + * either ignoring, warning, or die()ing. The latter two modes write a
> + * redacted URL to stderr.
> + */
> +static void detected_credentials_in_url(const char *url, size_t scheme_len)
> +{
> +	const char *value;
> +	const char *at_ptr;
> +	const char *colon_ptr;
> +	struct strbuf redacted = STRBUF_INIT;
> +
> +	/* "allow" is the default behavior. */
> +	if (git_config_get_string_tmp("fetch.credentialsinurl", &value) ||
> +	    !strcmp("allow", value))
> +		return;

OK.

> +	at_ptr = strchr(url, '@');
> +	colon_ptr = strchr(url + scheme_len + 3, ':');

I am debating myself if we should explain "+ 3" that counts "://" in
a comment, but it probably is trivial to see.

> +	/*
> +	 * Let's do some defensive programming to ensure the given
> +	 * URL is of the proper format.
> +	 */
> +	if (!colon_ptr)
> +		BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX,
> +		    url, (uintmax_t) scheme_len);
> +	if (colon_ptr > at_ptr)
> +		BUG("input url '%s' does not include credentials",
> +		    url);

OK.  The caller shouldn't have called us in these two cases.

> +	/* Include the colon when creating the redacted URL. */
> +	colon_ptr++;
> +	strbuf_addstr(&redacted, url);
> +	strbuf_splice(&redacted, colon_ptr - url, at_ptr - colon_ptr,
> +		      "<redacted>", 10);
> +
> +	if (!strcmp("warn", value))
> +		warning(_("URL '%s' uses plaintext credentials"), redacted.buf);
> +	if (!strcmp("die", value))
> +		die(_("URL '%s' uses plaintext credentials"), redacted.buf);
> +
> +	strbuf_release(&redacted);

OK.  Let's look at the caller.

>  			if (colon_ptr) {
> +				detected_credentials_in_url(orig_url, scheme_len);
>  				passwd_off = (colon_ptr + 1) - norm.buf;
>  				passwd_len = norm.len - passwd_off;
>  				user_len = (passwd_off - 1) - (scheme_len + 3);

The caller apparently knows where the password and username are at
this point.  It is unclear which string these offsets are pointing
into at this point, but at the end of the function, we have the
offset and length of user and passwd that point into result, all in
our local variables, even when out_info is NULL.

	result = strbuf_detach(&norm, &result_len);
	if (out_info) {
		out_info->url = result;
		out_info->err = NULL;
		out_info->url_len = result_len;
		out_info->scheme_len = scheme_len;
		out_info->user_off = user_off;
		out_info->user_len = user_len;
		out_info->passwd_off = passwd_off;
		out_info->passwd_len = passwd_len;
		out_info->host_off = host_off;
		out_info->host_len = host_len;
		out_info->port_off = port_off;
		out_info->port_len = port_len;
		out_info->path_off = path_off;
		out_info->path_len = path_len;
	}
	return result;

It does make me wonder if we want another parser in the new
function.  Wouldn't it be easier to manage if we inserted a
call to 

	if (passwd_off)
		apply_fetch_credentials_in_url(result,
        			user_off, user_len, passwd_off, passwd_len);

just after the strbuf_detach() we see near the end of the function, where
apply_fetch_credentials_in_url() only reads the configuration and
calls warning or die as appropriate?