Re: [PATCH v4 1/2] fetch-pack: redact packfile urls in traces

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Oct 27, 2021 at 6:01 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Ivan Frade via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > From: Ivan Frade <ifrade@xxxxxxxxxx>
> >

> Just a curiosity.  Do we call these packfile URI, or packfile URL?

The feature is "packfile URI" (and the section is called so in the
protocol). I changed all "url" to "uri".

> > diff --git a/pkt-line.c b/pkt-line.c
> > index 2dc8ac274bd..ba0a2d65f0c 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -370,6 +370,31 @@ int packet_length(const char lenbuf_hex[4])
> >       return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2);
> >  }
> >
> > +static char *find_url_path(const char* buffer, int *path_len)
> > +{
> > +     const char *URL_MARK = "://";
> > +     char *path = strstr(buffer, URL_MARK);
> > +     if (!path)
> > +             return NULL;
>
> Hmph, the format we expect is "<hash> <uri>"; don't we need to
> validate the leading <hash> followed by SP?

I was trying to find a uri in a packet in general, not counting on the
packfile-uri line format. That is probably an overgeneralization.

Next patch version follows these suggestions to look for a packfile-uri line.

> > +     if (path_len) {
> > +             char *url_end = strchrnul(path, ' ');
>
> Is this because SP is not a valid character in packfile URI, or at
> this point in the callchain it would be encoded or something?  The
> format we expect is "<hash> <uri>", so we shouldn't even have to
> look for SP but just redact everything to the end, no?

Yes, now that we count on the packfile-uri line format, we can redact
everything to the end and there is no need to return the length.

> > -     packet_trace(buffer, len, 0);
> > +     if (options & PACKET_READ_REDACT_URL_PATH &&
> > +         (url_path_start = find_url_path(buffer, &url_path_len))) {
> > +             const char *redacted = "<redacted>";
> > +             struct strbuf tracebuf = STRBUF_INIT;
> > +             strbuf_insert(&tracebuf, 0, buffer, len);
> > +             strbuf_splice(&tracebuf, url_path_start - buffer,
> > +                           url_path_len, redacted, strlen(redacted));
> > +             packet_trace(tracebuf.buf, tracebuf.len, 0);
> > +             strbuf_release(&tracebuf);
>
> I briefly wondered if the repeated allocation (and more
> fundamentally, preparing the redacted copy of packet whether we are
> actually tracing the packet in the first place) is blindly wasting
> the resources too much, but this only happens in the protocol header
> part, so it might be OK.

We only allocate and redact if it looks like a packfile-uri line, so
it shouldn't happen too frequently.

> Even if that is not the case, we should be able to update
> fetch_pack.c::do_fetch_pack_v2() so that the REDACT_URL_PATH bit is
> turned on in a much narrower region of code, right?  Enable when we
> enter the GET_PACK state and drop the bit when we are done with the
> packfile URI packets, or something?

I move the set/unset of the redacting flag to the FETCH_GET_PACK
around the "packfile-uris" section.
There is no need to check every incoming packet for a packfile-uri
line, we know when they should come.

Thanks,

Ivan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux