Hi Jonathan, On Wed, 22 Apr 2020, Jonathan Nieder wrote: > Johannes Schindelin wrote: > > > There was one call site, though, that really needed that leniency: when > > parsing config settings a la `credential.dev.azure.com.useHTTPPath`. > > Thanks for tackling this. > > Can the commit message say a little more about the semantics and when > someone would use this? > > Is it a shortcut for > > [credential "http://dev.azure.com"] > useHttpPath = true > > [credential "https://dev.azure.com"] > useHttpPath = true > > ? I don't really want to be too verbose about this in _this_ commit message. But I do see your point. This is my current version of the commit message: credential: optionally allow partial URLs in credential_from_url_gently() Prior to the fixes for CVE-2020-11008, we were _very_ lenient in what we required from a URL in order to parse it into a `struct credential`. That led to serious vulnerabilities. There was one call site, though, that really needed that leniency: when parsing config settings a la `credential.dev.azure.com.useHTTPPath`. Settings like this might be desired when users want to use, say, a given user name on a given host, regardless of the protocol to be used. In preparation for fixing that bug, let's add a parameter called `allow_partial_url` to the `credential_from_url_gently()` function and convert the existing callers to set that parameter to 0, i.e. do not change the existing behavior, just add the option to be more lenient. Please note that this patch does more than just reinstating a way to imitate the behavior before those CVE-2020-11008 fixes: Before that, we would simply ignore URLs without a protocol. In other words, misleadingly, the following setting would be applied to _all_ URLs: [credential "example.com"] username = that-me The obvious intention is to match the host name only. With this patch, we allow precisely that: when parsing the URL with non-zero `allow_partial_url`, we do not simply return success if there was no protocol, but we simply leave the protocol unset and continue parsing the URL. As you see, I added substantially more information there. > > In preparation for fixing that regression, let's add a parameter > > called `strict` to the `credential_from_url()` function and convert > > the existing callers to enforce that strict mode. > > I suspect this would be easier to read squashed with patch 3. That > would also mean that the functionality and test coverage come at the > same time. On the contrary, I _really_ want them to be separate, so that it is easier to review. I know that _I_ had an easier time looking over the patch that introduces the non-strict mode, to make sure that it does not change a thing in strict mode (and yes, v1 made that harder than necessary by _not_ using `strict ||` in the condition that guards the `c->host` assignment). In short: it is my intention to keep the two patches separate, with the main goal to make sure that I don't introduce any of those regressions Peff is worried about. > [...] > > diff --git a/credential.c b/credential.c > > index 64a841eddca..c73260ac40f 100644 > > --- a/credential.c > > +++ b/credential.c > > @@ -344,7 +344,7 @@ static int check_url_component(const char *url, int quiet, > > } > > > > int credential_from_url_gently(struct credential *c, const char *url, > > - int quiet) > > + int strict, int quiet) > > The collection of flags makes me wonder whether it's time to use a > single "flags" parameter with flags that are |ed together. That way, > call sites are easier to read without requiring cross-reference > assistance to see which option each boolean parameter represents. I resisted the temptation because I don't want this to be a big patch series, but a fast one. The reports keep coming in, and the current plan seems to be for GitHub Desktop to release another version on Monday, for which I need to release a MinGit backport probably tomorrow, otherwise it won't work timewise. Meaning: I don't really want to do "nice to have" work like this flag. > Alternatively, could the non-strict form be a separate public function > that uses the same static helper that takes two boolean args? That is, > something like > > int credential_from_url_gently(struct credential *c, const char *url, > int quiet) > { > return parse_credential_url(c, url, 1, quiet); > } > > int credential_from_url_nonstrict(struct credential *c, const char *url, > int quiet) > { > return parse_credential_url(c, url, 0, quiet); > } Looks good, but there are only three callers (and I don't expect more). And the only caller interested in the lenient mode (read: the only now-complicated call) lives in the very same file as the called function. So it strikes me as quite a bit of an overkill to introduce two new functions. > [...] > > @@ -357,12 +357,12 @@ int credential_from_url_gently(struct credential *c, const char *url, > > * (3) proto://<user>:<pass>@<host>/... > > */ > > proto_end = strstr(url, "://"); > > - if (!proto_end || proto_end == url) { > > + if (strict && (!proto_end || proto_end == url)) { > > if (!quiet) > > warning(_("url has no scheme: %s"), url); > > return -1; > > } > > When !strict, this means we are not requiring a protocol. No other > difference appears to be intended. Almost. My intention was to handle missing host names, too: `/repo.git` should also match `https://example.com/repo.git`. The user name is optional already, anyway. BTW I realized (while working on these patches) that it would probably be a good idea to warn about passwords in these `credential.<URL>.<key>` settings: the password will be ignored as far as `credential_match()` is concerned, and they should probably not live in the config. But for aforementioned reasons, I decided against including a patch that makes that happen. > [...] > > @@ -382,8 +382,10 @@ int credential_from_url_gently(struct credential *c, const char *url, > > host = at + 1; > > } > > > > - c->protocol = xmemdupz(url, proto_end - url); > > - c->host = url_decode_mem(host, slash - host); > > + if (proto_end && proto_end - url > 0) > > + c->protocol = xmemdupz(url, proto_end - url); > > What should happen when the protocol isn't present? Does this mean > callers will need to be audited to make sure they handle NULL? Again, the sole caller of this lenient mode is the config parser which then wants to match the (partial) URL against the actual URL, using this function: int credential_match(const struct credential *want, const struct credential *have) { #define CHECK(x) (!want->x || (have->x && !strcmp(want->x, have->x))) return CHECK(protocol) && CHECK(host) && CHECK(path) && CHECK(username); #undef CHECK } The lenient parser is supposed to populate the `want` of this function. In other words, the code _expects_ `protocol` (or for that matter, _any_ attribute of `want`) to be optional. > > + if (slash - url > 0) > > + c->host = url_decode_mem(host, slash - host); > > What should happen the URL starts with a slash? Oh... I thought it was obvious that a partial URL starting with a slash would refer to the path part only... Does that not make sense? Ciao, Dscho