Re: [PATCH v3] config: add support for http.<url>.* settings

Jeff King <peff@xxxxxxxx> · Mon, 15 Jul 2013 01:06:54 -0400

On Sat, Jul 13, 2013 at 12:46:17PM -0700, Kyle J. McKay wrote:

> I expect it will be easier just to normalize the URL without
> splitting.  That is, lowercase the parts that are case-insensitive
> (scheme and host name) and adjust the URL-escaping to remove URL
> escaping (%xx) from characters that don't need it but add it to any
> for which it is required that are not escaped (according to RFC
> 1738).

I think you are suggesting doing better than this, but just to be clear,
we cannot treat the URL as a simple string and just decode and
re-encode.

One of the things that gets encoded are the delimiting characters. So if
I have the URL:

  https://foo%3abar@xxxxxxxxxxx

you would "canonicalize" it into:

  https://foo:bar@xxxxxxxxxxx

But those are two different URLs entirely; the first has the username
"foo:bar", and the second has the username "foo" and the password "bar".

I admit that these are unlikely to come up in practice, but I am worried
that there is some room for mischief here. For example:

  https://example.com%2ftricky.host/repo.git

If we canonicalize that into:

  https://example.com/tricky.host/repo.git

and do a lookup, we think we are hitting example.com, but we are
actually hitting example.comtricky.host (i.e., that is how curl will
interpret it).  If we were deciding to use a stored credential based on
that information, it would be quite bad (we would leak credentials to
the owner of comtricky.host). I know your patch does not impact the
credential lookup behavior, but it would be nice in the long run if the
two lookups followed the same rules.

So I think the three options are basically:

  1. No decoding, require the user to use a consistent prefix between
     config and other uses of the URL. I.e., your current patch. The
     downside is that it doesn't handle any variation of input.

  2. Full decoding into constituent parts. This handles canonicalization
     of encoding, and also allows "wildcard" components (e.g., a URL
     with username can match the generic "https://example.com"; in the
     config). The downside is that you cannot do a "longest prefix wins"
     rule for overriding.

  3. Full decoding as in (2), but then re-assemble into a canonicalized
     encoded URL. The upside is that you get to do "longest prefix
     wins", but you can no longer have wildcard components. I think this
     is what you are suggesting in your mail.

I'm still in favor of (2), because I think the wildcard components are
important (and while I agree that the "longest prefix wins" is nicer, we
already have "last one wins" for the rest of the config, including the
credential URL matcher). But I certainly think (3) is better than (1).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html