On Sat, Jul 13, 2013 at 12:46:17PM -0700, Kyle J. McKay wrote: > I expect it will be easier just to normalize the URL without > splitting. That is, lowercase the parts that are case-insensitive > (scheme and host name) and adjust the URL-escaping to remove URL > escaping (%xx) from characters that don't need it but add it to any > for which it is required that are not escaped (according to RFC > 1738). I think you are suggesting doing better than this, but just to be clear, we cannot treat the URL as a simple string and just decode and re-encode. One of the things that gets encoded are the delimiting characters. So if I have the URL: https://foo%3abar@xxxxxxxxxxx you would "canonicalize" it into: https://foo:bar@xxxxxxxxxxx But those are two different URLs entirely; the first has the username "foo:bar", and the second has the username "foo" and the password "bar". I admit that these are unlikely to come up in practice, but I am worried that there is some room for mischief here. For example: https://example.com%2ftricky.host/repo.git If we canonicalize that into: https://example.com/tricky.host/repo.git and do a lookup, we think we are hitting example.com, but we are actually hitting example.comtricky.host (i.e., that is how curl will interpret it). If we were deciding to use a stored credential based on that information, it would be quite bad (we would leak credentials to the owner of comtricky.host). I know your patch does not impact the credential lookup behavior, but it would be nice in the long run if the two lookups followed the same rules. So I think the three options are basically: 1. No decoding, require the user to use a consistent prefix between config and other uses of the URL. I.e., your current patch. The downside is that it doesn't handle any variation of input. 2. Full decoding into constituent parts. This handles canonicalization of encoding, and also allows "wildcard" components (e.g., a URL with username can match the generic "https://example.com" in the config). The downside is that you cannot do a "longest prefix wins" rule for overriding. 3. Full decoding as in (2), but then re-assemble into a canonicalized encoded URL. The upside is that you get to do "longest prefix wins", but you can no longer have wildcard components. I think this is what you are suggesting in your mail. I'm still in favor of (2), because I think the wildcard components are important (and while I agree that the "longest prefix wins" is nicer, we already have "last one wins" for the rest of the config, including the credential URL matcher). But I certainly think (3) is better than (1). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html