Patrick Steinhardt <patrick.steinhardt@xxxxxxxx> writes: > The URL matching function computes for two URLs whether they match not. > The match is performed by splitting up the URL into different parts and > then doing an exact comparison with the to-be-matched URL. > > The main user of `urlmatch` is the configuration subsystem. It allows to > set certain configurations based on the URL which is being connected to > via keys like `http.<url>.*`. A common use case for this is to set > proxies for only some remotes which match the given URL. Unfortunately, > having exact matches for all parts of the URL can become quite tedious > in some setups. Imagine for example a corporate network where there are > dozens or even hundreds of subdomains, which would have to be configured > individually. > > This commit introduces the ability to use globbing in the host-part of > the URLs. A user can simply specify a `*` as part of the host name to > match all subdomains at this level. For example adding a configuration > key `http.https://*.example.com.proxy` will match all subdomains of > `https://example.com`. This is probably a useful improvement. Having said that, when I mentioned "glob", I meant to also support something like this: https://www[1-4].ibm.com/ And when people read "glob", that is what they expect. So calling this "the ability to use globbing" is misleading. The last paragraph in the log message above needs a bit of tweaking, perhaps like this: Allow users to write an asterisk '*' in place of any 'host' or 'subdomain' label as part of the host name. For example, "http.https://*.example.com.proxy" sets "http.proxy" for all direct subdomains of "https://example.com", e.g. "https://foo.example.com", but not "https://foo.bar.example.com". Fortunately, your update to config.txt, which is facing the end users, does not misuse the word and instead is explicit that the only thing the matcher does is to match '*' to a single hierarchy. It is clear that even http://www*.ibm.com/ is not supported from the description, which is good. > . Host/domain name (e.g., `example.com` in `https://example.com/`). > - This field must match exactly between the config key and the URL. > + This field must match between the config key and the URL. It is > + possible to specify a `*` as part of the host name to match all subdomains > + at this level. `https://*.example.com/` for example would match > + `https://foo.example.com/`, but not `https://foo.bar.example.com/`. This is good as-is. > . Port number (e.g., `8080` in `http://example.com:8080/`). > This field must match exactly between the config key and the URL. > diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh > index 923bfc5a2..ec545e092 100755 > --- a/t/t1300-repo-config.sh > +++ b/t/t1300-repo-config.sh > @@ -1177,6 +1177,42 @@ test_expect_success 'urlmatch' ' > test_cmp expect actual > ' > > +test_expect_success 'glob-based urlmatch' ' This is not "glob". A more generic term "wildcard" is OK. > + cat >.git/config <<-\EOF && > + [http] > + sslVerify > ... > +static int match_host(const struct url_info *url_info, > + const struct url_info *pattern_info) > +{ > + char *url = xmemdupz(url_info->url + url_info->host_off, url_info->host_len); > + char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, pattern_info->host_len); > + char *url_tok, *pat_tok, *url_save, *pat_save; > + int matching; > + > + url_tok = strtok_r(url, ".", &url_save); > + pat_tok = strtok_r(pat, ".", &pat_save); Hmph, this will be the first use of strtok_r() in our codebase. Does everybody have it? For a use like this where your delimiter set is a singleton, it may be simpler to do the usual strchrnul() or memchr() based loop. The attached is my attempt to do so on top of this patch. > + > + for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save), > + pat_tok = strtok_r(NULL, ".", &pat_save)) { > + if (!strcmp(pat_tok, "*")) > + continue; /* a simple glob matches everything */ s/glob/asterisk/ Other than that, the patch looks OK. diff --git a/urlmatch.c b/urlmatch.c index 53ff972a60..8dfc7fd28a 100644 --- a/urlmatch.c +++ b/urlmatch.c @@ -63,36 +63,47 @@ static int append_normalized_escapes(struct strbuf *buf, return 1; } +static const char *end_of_token(const char *s, int c, size_t n) +{ + const char *next = memchr(s, c, n); + if (!next) + next = s + n; + return next; +} + static int match_host(const struct url_info *url_info, const struct url_info *pattern_info) { - char *url = xmemdupz(url_info->url + url_info->host_off, url_info->host_len); - char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, pattern_info->host_len); - char *url_tok, *pat_tok, *url_save, *pat_save; - int matching; - - url_tok = strtok_r(url, ".", &url_save); - pat_tok = strtok_r(pat, ".", &pat_save); - - for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save), - pat_tok = strtok_r(NULL, ".", &pat_save)) { - if (!strcmp(pat_tok, "*")) - continue; /* a simple glob matches everything */ - - if (strcmp(url_tok, pat_tok)) { - /* subdomains do not match */ - matching = 0; - break; - } + const char *url = url_info->url + url_info->host_off; + const char *pat = pattern_info->url + pattern_info->host_off; + int url_len = url_info->host_len; + int pat_len = pattern_info->host_len; + + while (url_len && pat_len) { + const char *url_next = end_of_token(url, '.', url_len); + const char *pat_next = end_of_token(pat, '.', pat_len); + + if (pat_next == pat + 1 && pat[0] == '*') + /* wildcard matches anything */ + ; + else if ((pat_next - pat) == (url_next - url) && + !memcmp(url, pat, url_next - url)) + /* the components are the same */ + ; + else + return 0; /* found an unmatch */ + + if (url_next < url + url_len) + url_next++; + url_len -= url_next - url; + url = url_next; + if (pat_next < pat + pat_len) + pat_next++; + pat_len -= pat_next - pat; + pat = pat_next; } - /* matching if both URL and pattern are at their ends */ - matching = (url_tok == NULL && pat_tok == NULL); - - free(url); - free(pat); - - return matching; + return 1; } static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)