Re: [PATCH v6 1/2] fetch-pack: redact packfile urls in traces

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 08 2021, Jonathan Tan wrote:

>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index a9604f35a3e..62ea90541c5 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -1653,8 +1653,12 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>>  				receive_wanted_refs(&reader, sought, nr_sought);
>>  
>>  			/* get the pack(s) */
>> +			if (git_env_bool("GIT_TRACE_REDACT", 1))
>> +				reader.options |= PACKET_READ_REDACT_URI_PATH;
>>  			if (process_section_header(&reader, "packfile-uris", 1))
>>  				receive_packfile_uris(&reader, &packfile_uris);
>> +			reader.options &= ~PACKET_READ_REDACT_URI_PATH;
>
> Probably worth commenting why you're resetting the flag (avoid the
> relatively expensive URI check when we don't need it).

...yeah...

>> diff --git a/pkt-line.c b/pkt-line.c
>> index 2dc8ac274bd..5a69ddc2e77 100644
>> --- a/pkt-line.c
>> +++ b/pkt-line.c
>> @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4])
>>  	return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2);
>>  }
>>  
>> +static char *find_packfile_uri_path(const char *buffer)
>> +{
>> +	const char *URI_MARK = "://";
>> +	char *path;
>> +	int len;
>> +
>> +	/* First char is sideband mark */
>> +	buffer += 1;
>> +
>> +	len = strspn(buffer, "0123456789abcdefABCDEF");
>> +	if (len != (int)the_hash_algo->hexsz || buffer[len] != ' ')
>> +		return NULL; /* required "<hash>SP" not seen */
>
> Optional: As I said in my reply (just sent out), checking for both SHA-1
> and SHA-256 lengths is reasonable too.
>
> [1] https://lore.kernel.org/git/20211108224335.569596-1-jonathantanmy@xxxxxxxxxx/

Correct me if I'm wrong, but I find it really strange that we're trying
to parse things in pkt-line.c like this.

We'll only get these from a client in code that's invoked in
fetch-pack.c, specifically the process_section_header() quoted above,
no?

>From there we'll call packet_reader_read(), which will call
packet_read_with_status(), and from there we'll call packet_trace().

Then right after all this happens we've got a loop that parses out these
packfile URIs, including this being-done-first-here parsing of the hex
value just for logging, except in pkt-line.c we've lost the information
about what hash algorithm length we should be using, which fetch-pack.c
of course needs to know.

Why can't that process_section_header() in fetch-pack.c just be made to
call some pkt-line.c API saying "don't log yet", i.e. something like
this pseudocode:

diff --git a/fetch-pack.c b/fetch-pack.c
index a9604f35a3e..31f5ee7fc6b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1518,14 +1518,18 @@ static void receive_wanted_refs(struct packet_reader *reader,
 static void receive_packfile_uris(struct packet_reader *reader,
                                  struct string_list *uris)
 {
+       struct string_list log = STRING_LIST_INIT_DUP;
+
        process_section_header(reader, "packfile-uris", 0);
-       while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
+       while (packet_reader_read_log_to(reader, &log) == PACKET_READ_NORMAL) {
                if (reader->pktlen < the_hash_algo->hexsz ||
                    reader->line[the_hash_algo->hexsz] != ' ')
                        die("expected '<hash> <uri>', got: %s\n", reader->line);
 
+               /* move the parsing of the URLs here */
                string_list_append(uris, reader->line);
        }
+       log_stuff(&log);
        if (reader->status != PACKET_READ_DELIM)
                die("expected DELIM");
 }

I.e. we'll eventually call trace_strbuf() in pkt-line.c, and we know
that we're doing these packfile-uris, and we know that we're just about
to parse them. Let's just:

 1. Start reading the section
 2. Turn off tracing
 3. Parse the URIs as we go
 3. When done (or on the fly), scrub URIs, log any backlog suppressed trace, and turn on tracing again

Instead of:

 1. Set a flag to scrub stuff
 2. Because of the disconnect between fetch-pack.c and pkt-line.c,
    effectively implement a new parser for data we're already going to be
    parsing some microseconds later during the course of the request.

That "turn off the trace" could be passing down a string_list/strbuf, or
even doing the same via a nev member in "struct packet_reader", both
would be simpler than needing to re-do the parse. Probably simplest is just:

    struct string_list log = STRING_LIST_INIT_DUP;

    reader.deferred_trace = &log;
    /* packet_reader_read() etc. code, unchanged from now */
    /* parse URIs (just move the existing code around a bit) */
    packet_reader.deferred_trace = NULL;
    for_each...(item, &log)
        trace_strbuf(...);



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux