On Wed, Oct 27, 2021 at 6:01 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > "Ivan Frade via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > > > From: Ivan Frade <ifrade@xxxxxxxxxx> > > > Just a curiosity. Do we call these packfile URI, or packfile URL? The feature is "packfile URI" (and the section is called so in the protocol). I changed all "url" to "uri". > > diff --git a/pkt-line.c b/pkt-line.c > > index 2dc8ac274bd..ba0a2d65f0c 100644 > > --- a/pkt-line.c > > +++ b/pkt-line.c > > @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4]) > > return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); > > } > > > > +static char *find_url_path(const char* buffer, int *path_len) > > +{ > > + const char *URL_MARK = "://"; > > + char *path = strstr(buffer, URL_MARK); > > + if (!path) > > + return NULL; > > Hmph, the format we expect is "<hash> <uri>"; don't we need to > validate the leading <hash> followed by SP? I was trying to find a uri in a packet in general, not counting on the packfile-uri line format. That is probably an overgeneralization. Next patch version follows these suggestions to look for a packfile-uri line. > > + if (path_len) { > > + char *url_end = strchrnul(path, ' '); > > Is this because SP is not a valid character in packfile URI, or at > this point in the callchain it would be encoded or something? The > format we expect is "<hash> <uri>", so we shouldn't even have to > look for SP but just redact everything to the end, no? Yes, now that we count on the packfile-uri line format, we can redact everything to the end and there is no need to return the length. > > - packet_trace(buffer, len, 0); > > + if (options & PACKET_READ_REDACT_URL_PATH && > > + (url_path_start = find_url_path(buffer, &url_path_len))) { > > + const char *redacted = "<redacted>"; > > + struct strbuf tracebuf = STRBUF_INIT; > > + strbuf_insert(&tracebuf, 0, buffer, len); > > + strbuf_splice(&tracebuf, url_path_start - buffer, > > + url_path_len, redacted, strlen(redacted)); > > + packet_trace(tracebuf.buf, tracebuf.len, 0); > > + strbuf_release(&tracebuf); > > I briefly wondered if the repeated allocation (and more > fundamentally, preparing the redacted copy of packet whether we are > actually tracing the packet in the first place) is blindly wasting > the resources too much, but this only happens in the protocol header > part, so it might be OK. We only allocate and redact if it looks like a packfile-uri line, so it shouldn't happen too frequently. > Even if that is not the case, we should be able to update > fetch_pack.c::do_fetch_pack_v2() so that the REDACT_URL_PATH bit is > turned on in a much narrower region of code, right? Enable when we > enter the GET_PACK state and drop the bit when we are done with the > packfile URI packets, or something? I move the set/unset of the redacting flag to the FETCH_GET_PACK around the "packfile-uris" section. There is no need to check every incoming packet for a packfile-uri line, we know when they should come. Thanks, Ivan