On Mon, Nov 08 2021, Jonathan Tan wrote: >> diff --git a/fetch-pack.c b/fetch-pack.c >> index a9604f35a3e..62ea90541c5 100644 >> --- a/fetch-pack.c >> +++ b/fetch-pack.c >> @@ -1653,8 +1653,12 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, >> receive_wanted_refs(&reader, sought, nr_sought); >> >> /* get the pack(s) */ >> + if (git_env_bool("GIT_TRACE_REDACT", 1)) >> + reader.options |= PACKET_READ_REDACT_URI_PATH; >> if (process_section_header(&reader, "packfile-uris", 1)) >> receive_packfile_uris(&reader, &packfile_uris); >> + reader.options &= ~PACKET_READ_REDACT_URI_PATH; > > Probably worth commenting why you're resetting the flag (avoid the > relatively expensive URI check when we don't need it). ...yeah... >> diff --git a/pkt-line.c b/pkt-line.c >> index 2dc8ac274bd..5a69ddc2e77 100644 >> --- a/pkt-line.c >> +++ b/pkt-line.c >> @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4]) >> return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2); >> } >> >> +static char *find_packfile_uri_path(const char *buffer) >> +{ >> + const char *URI_MARK = "://"; >> + char *path; >> + int len; >> + >> + /* First char is sideband mark */ >> + buffer += 1; >> + >> + len = strspn(buffer, "0123456789abcdefABCDEF"); >> + if (len != (int)the_hash_algo->hexsz || buffer[len] != ' ') >> + return NULL; /* required "<hash>SP" not seen */ > > Optional: As I said in my reply (just sent out), checking for both SHA-1 > and SHA-256 lengths is reasonable too. > > [1] https://lore.kernel.org/git/20211108224335.569596-1-jonathantanmy@xxxxxxxxxx/ Correct me if I'm wrong, but I find it really strange that we're trying to parse things in pkt-line.c like this. We'll only get these from a client in code that's invoked in fetch-pack.c, specifically the process_section_header() quoted above, no? >From there we'll call packet_reader_read(), which will call packet_read_with_status(), and from there we'll call packet_trace(). Then right after all this happens we've got a loop that parses out these packfile URIs, including this being-done-first-here parsing of the hex value just for logging, except in pkt-line.c we've lost the information about what hash algorithm length we should be using, which fetch-pack.c of course needs to know. Why can't that process_section_header() in fetch-pack.c just be made to call some pkt-line.c API saying "don't log yet", i.e. something like this pseudocode: diff --git a/fetch-pack.c b/fetch-pack.c index a9604f35a3e..31f5ee7fc6b 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1518,14 +1518,18 @@ static void receive_wanted_refs(struct packet_reader *reader, static void receive_packfile_uris(struct packet_reader *reader, struct string_list *uris) { + struct string_list log = STRING_LIST_INIT_DUP; + process_section_header(reader, "packfile-uris", 0); - while (packet_reader_read(reader) == PACKET_READ_NORMAL) { + while (packet_reader_read_log_to(reader, &log) == PACKET_READ_NORMAL) { if (reader->pktlen < the_hash_algo->hexsz || reader->line[the_hash_algo->hexsz] != ' ') die("expected '<hash> <uri>', got: %s\n", reader->line); + /* move the parsing of the URLs here */ string_list_append(uris, reader->line); } + log_stuff(&log); if (reader->status != PACKET_READ_DELIM) die("expected DELIM"); } I.e. we'll eventually call trace_strbuf() in pkt-line.c, and we know that we're doing these packfile-uris, and we know that we're just about to parse them. Let's just: 1. Start reading the section 2. Turn off tracing 3. Parse the URIs as we go 3. When done (or on the fly), scrub URIs, log any backlog suppressed trace, and turn on tracing again Instead of: 1. Set a flag to scrub stuff 2. Because of the disconnect between fetch-pack.c and pkt-line.c, effectively implement a new parser for data we're already going to be parsing some microseconds later during the course of the request. That "turn off the trace" could be passing down a string_list/strbuf, or even doing the same via a nev member in "struct packet_reader", both would be simpler than needing to re-do the parse. Probably simplest is just: struct string_list log = STRING_LIST_INIT_DUP; reader.deferred_trace = &log; /* packet_reader_read() etc. code, unchanged from now */ /* parse URIs (just move the existing code around a bit) */ packet_reader.deferred_trace = NULL; for_each...(item, &log) trace_strbuf(...);