Re: [PATCH v3] fetch-pack: redact packfile urls in traces

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 19 2021, Ivan Frade via GitGitGadget wrote:

> From: Ivan Frade <ifrade@xxxxxxxxxx>
>
> In some setups, packfile uris act as bearer token. It is not
> recommended to expose them plainly in logs, although in special
> circunstances (e.g. debug) it makes sense to write them.
>
> Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT
> variable is set to false. This mimics the redacting of the Authorization
> header in HTTP.
>
> Signed-off-by: Ivan Frade <ifrade@xxxxxxxxxx>
> ---
>     fetch-pack: redact packfile urls in traces
>     
>     Changes since v1:

Just context for other reviewers:

s/Changes since v1/Changes since v2/ I see, from the context of
https://lore.kernel.org/git/pull.1052.v2.git.1633746024175.gitgitgadget@xxxxxxxxx/

>      * Redact only the path of the URL
>      * Test are now strict, validating the exact line expected in the log

And this was changed in v2...

>     Changes since v1:
>     
>      * Removed non-POSIX flags in tests
>      * More accurate regex for the non-encrypted packfile line

[...]

>      * Dropped documentation change
>      * Dropped redacting the die message in http-fetch

Since both of those were done I think in response to my feedback I just
want to clarify (if needed):

 * That the documentation change is still good to have, although I had
   feedback on fixing that more generally in the protocol v2 docs. It
   would be great if you could still pursue it (and that I didn't
   discourage you from doing so).

 * I think having this redaction in die() could still be valuable,
   e.g. your packfile-uri's start failing, and now users are
   copy/pasting "private" URLs that contain their passwords or whatever
   to try to get help, that would be bad.   

   But perhaps if you don't have private URLs redacting them
   unconditionally would slow down debugging for some, i.e. you have 10x
   pasted URLs, and all the errors are from one set of servers (although
   your current redaction includes the hostname, which I think would
   address most cases of say one CDN node failing).

   It was really just a comment that your v1's commit message didn't
   mention or justify it, but just having it make a mention of it would
   also be an OK solution, or fold that into another patch or
   whatever...

>  {
> +	int saved_options;
>  	process_section_header(reader, "packfile-uris", 0);
> +	/*
> +	 * In some setups, packfile-uris act as bearer tokens,
> +	 * redact them by default.
> +	 */
> +	saved_options = reader->options;

nit: no need to pre-declare "int saved_options" here, just move this &
the comment above "process_section_header" (in this case I'd say a
comment isn't even needed, obvious from context...), or it should be at
the definition of PACKET_READ_REDACT_URL_PATH... (more below)

> +	if (git_env_bool("GIT_TRACE_REDACT", 1))

If we're going to use GIT_TRACE_REDACT for this the documentation needs updating:

Documentation/git.txt:`GIT_TRACE_REDACT`::
Documentation/git.txt-  By default, when tracing is activated, Git redacts the values of
Documentation/git.txt-  cookies, the "Authorization:" header, and the "Proxy-Authorization:"
Documentation/git.txt-  header. Set this variable to `0` to prevent this redaction.

> +		reader->options |= PACKET_READ_REDACT_URL_PATH;

(continued)... but that was from a really narrow reading of the code, I
think this whole flip-flopping of options back and forth isn't needed at
all, and you should just assign this flag at the top of
do_fetch_pack_v2(), no?  The setting of it also looks like it belongs
with the reading of "GIT_TEST_SIDEBAND_ALL".

I.e. nothing else uses PACKET_READ_REDACT_URL_PATH, why do we need to be
flipping it back & forth? Keeping these flags in the "reader" is what
that member is for, isn't it? Maybe I'm missing something.

> +static int find_url_path_start(const char* buffer)
> +{
> +	const char *URL_MARK = "://";
> +	char *p = strstr(buffer, URL_MARK);
> +	if (!p) {
> +		return -1;
> +	}
> +
> +	p += strlen(URL_MARK);
> +	while (*p && *p != '/')
> +		p++;
> +
> +	// Position after '/'
> +	if (*p && *(p + 1))
> +		return (p + 1) - buffer;
> +
> +	return -1;
> +}

I think that packfile URI only supports http:// and https://, not
file:// or whatever, so I wonder if either curl or we have a helper
function for this that we can use....(more below)

>  enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  						size_t *src_len, char *buffer,
>  						unsigned size, int *pktlen,
> @@ -393,6 +412,7 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  {
>  	int len;
>  	char linelen[4];
> +	int url_path_start;
>  
>  	if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) {
>  		*pktlen = -1;
> @@ -443,7 +463,18 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  		len--;
>  
>  	buffer[len] = 0;
> -	packet_trace(buffer, len, 0);
> +	if (options & PACKET_READ_REDACT_URL_PATH &&
> +	    (url_path_start = find_url_path_start(buffer)) != -1) {
> +		const char *redacted = "<redacted>";
> +		struct strbuf tracebuf = STRBUF_INIT;
> +		strbuf_insert(&tracebuf, 0, buffer, len);
> +		strbuf_splice(&tracebuf, url_path_start,
> +			      len - url_path_start, redacted, strlen(redacted));
> +		packet_trace(tracebuf.buf, tracebuf.len, 0);
> +		strbuf_release(&tracebuf);
> +	} else {
> +		packet_trace(buffer, len, 0);
> +	}

...If we're redacting the URL isn't (and this might be less code with a
helper function) saying:

    failed to get 'https' url from 'somehost.example.com' (full URL redacted due to XYZ setting)

Friendlier than something like (which this function sets up):

    failed to get 'https://somehost.example.com/<redacted>' url

I.e. it allows us to use a get_schema_from_url() and get_host_from_url()
functions (I don't know if/where we have those, but seems likelier than
"find path url boundary" (or maybe I'm wrong and we always feed those
as-is to curl et al).



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux