On Tue, Oct 19 2021, Ivan Frade via GitGitGadget wrote:

> From: Ivan Frade <ifrade@xxxxxxxxxx>
>
> In some setups, packfile uris act as bearer token. It is not
> recommended to expose them plainly in logs, although in special
> circunstances (e.g. debug) it makes sense to write them.
>
> Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT
> variable is set to false. This mimics the redacting of the Authorization
> header in HTTP.
>
> Signed-off-by: Ivan Frade <ifrade@xxxxxxxxxx>
> ---
> fetch-pack: redact packfile urls in traces
>
> Changes since v1:

Just context for other reviewers: s/Changes since v1/Changes since v2/,
I see, from the context of
https://lore.kernel.org/git/pull.1052.v2.git.1633746024175.gitgitgadget@xxxxxxxxx/

> * Redact only the path of the URL
> * Test are now strict, validating the exact line expected in the log

And this was changed in v2...

> Changes since v1:
>
> * Removed non-POSIX flags in tests
> * More accurate regex for the non-encrypted packfile line

[...]

> * Dropped documentation change
> * Dropped redacting the die message in http-fetch

Since both of those were done, I think, in response to my feedback, I
just want to clarify (if needed):

* The documentation change is still good to have, although I had
  feedback on fixing that more generally in the protocol v2 docs. It
  would be great if you could still pursue it (and that I didn't
  discourage you from doing so).

* I think having this redaction in die() could still be valuable,
  e.g. your packfile-uris start failing, and now users are copy/pasting
  "private" URLs that contain their passwords or whatever to try to get
  help; that would be bad.

  But perhaps if you don't have private URLs, redacting them
  unconditionally would slow down debugging for some, i.e. you have 10x
  pasted URLs, and all the errors are from one set of servers (although
  your current redaction includes the hostname, which I think would
  address most cases of, say, one CDN node failing).
It was really just a comment that your v1's commit message didn't
mention or justify it, but just having it make a mention of it would
also be an OK solution, or fold that into another patch or whatever...

>  {
> +	int saved_options;
>  	process_section_header(reader, "packfile-uris", 0);
> +	/*
> +	 * In some setups, packfile-uris act as bearer tokens,
> +	 * redact them by default.
> +	 */
> +	saved_options = reader->options;

nit: no need to pre-declare "int saved_options" here, just move this &
the comment above "process_section_header" (in this case I'd say a
comment isn't even needed, obvious from context...), or it should be at
the definition of PACKET_READ_REDACT_URL_PATH... (more below)

> +	if (git_env_bool("GIT_TRACE_REDACT", 1))

If we're going to use GIT_TRACE_REDACT for this the documentation needs
updating:

Documentation/git.txt:`GIT_TRACE_REDACT`::
Documentation/git.txt-	By default, when tracing is activated, Git redacts the values of
Documentation/git.txt-	cookies, the "Authorization:" header, and the "Proxy-Authorization:"
Documentation/git.txt-	header. Set this variable to `0` to prevent this redaction.

> +		reader->options |= PACKET_READ_REDACT_URL_PATH;

(continued)... but that was from a really narrow reading of the code, I
think this whole flip-flopping of options back and forth isn't needed
at all, and you should just assign this flag at the top of
do_fetch_pack_v2(), no?

The setting of it also looks like it belongs with the reading of
"GIT_TEST_SIDEBAND_ALL".

I.e. nothing else uses PACKET_READ_REDACT_URL_PATH, why do we need to
be flipping it back & forth? Keeping these flags in the "reader" is
what that member is for, isn't it? Maybe I'm missing something.
> +static int find_url_path_start(const char* buffer)
> +{
> +	const char *URL_MARK = "://";
> +	char *p = strstr(buffer, URL_MARK);
> +	if (!p) {
> +		return -1;
> +	}
> +
> +	p += strlen(URL_MARK);
> +	while (*p && *p != '/')
> +		p++;
> +
> +	// Position after '/'
> +	if (*p && *(p + 1))
> +		return (p + 1) - buffer;
> +
> +	return -1;
> +}

I think that packfile URI only supports http:// and https://, not
file:// or whatever, so I wonder if either curl or we have a helper
function for this that we can use....(more below)

>  enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  						size_t *src_len, char *buffer,
>  						unsigned size, int *pktlen,
> @@ -393,6 +412,7 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  {
>  	int len;
>  	char linelen[4];
> +	int url_path_start;
>
>  	if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) {
>  		*pktlen = -1;
> @@ -443,7 +463,18 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  		len--;
>
>  	buffer[len] = 0;
> -	packet_trace(buffer, len, 0);
> +	if (options & PACKET_READ_REDACT_URL_PATH &&
> +	    (url_path_start = find_url_path_start(buffer)) != -1) {
> +		const char *redacted = "<redacted>";
> +		struct strbuf tracebuf = STRBUF_INIT;
> +		strbuf_insert(&tracebuf, 0, buffer, len);
> +		strbuf_splice(&tracebuf, url_path_start,
> +			      len - url_path_start, redacted, strlen(redacted));
> +		packet_trace(tracebuf.buf, tracebuf.len, 0);
> +		strbuf_release(&tracebuf);
> +	} else {
> +		packet_trace(buffer, len, 0);
> +	}

...If we're redacting the URL isn't (and this might be less code with a
helper function) saying:

    failed to get 'https' url from 'somehost.example.com' (full URL
    redacted due to XYZ setting)

Friendlier than something like (which this function sets up):

    failed to get 'https://somehost.example.com/<redacted>' url

I.e.
it allows us to use get_schema_from_url() and get_host_from_url()
functions (I don't know if/where we have those, but that seems likelier
than a "find url path boundary" helper), or maybe I'm wrong and we
always feed those as-is to curl et al.