Re: [PATCH] http: add custom hostname to IP address resolves

Christian Couder <christian.couder@xxxxxxxxx> · Wed, 4 May 2022 12:07:59 +0200

On Mon, May 2, 2022 at 9:04 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
> > Subject: Re: [PATCH] http: add custom hostname to IP address resolves
>
> I can guess what you means, but I am not able to quite parse the
> above.  I guess the phrase that makes me hiccup when I read is
> "resolve" used as a noun.

Ok, I will use "http: add custom hostname to IP address resolutions"
in the next version then.

> > Libcurl has a CURLOPT_RESOLVE easy option that allows
> > hostname resolve information in the following form to
>
> The same here, "... allows the result of hostname resolution in the
> following format ...", perhaps?

Ok, I will use your suggestion.

> > be passed:
> >
> >       [+]HOST:PORT:ADDRESS[,ADDRESS]
> >
> > This way, redirects and everything operating against the
> > HOST+PORT will use the provided ADDRESS(s).
> >
> > The following form is also allowed to stop using these
> > resolves:
> >
> >       -HOST:PORT
>
> The above is a reasonable summary of CURLOPT_RESOLVE documentation
> that is appropriate to have here for those of us who are not
> familiar with it.  For those of us who may want to learn more, it
> would help to have an URL to the canonical documentation page, e.g.
> https://curl.se/libcurl/c/CURLOPT_RESOLVE.html but it is not
> required.  People should be able to find it easily.

Yeah, I also thought that it wasn't required, but I will add it
anyway, as I agree it could be useful and hopefully it doesn't change
very often.

> > Let's add a corresponding "http.hostResolve" config
> > option that takes advantage of CURLOPT_RESOLVE.
>
> CURLOPT_RESOLVE allows us to feed cURL a list of these <host,port>
> -> <address> mappings, so we use that mechansim to feed the values
> listed on the multi-valued configuration variable (spell it out as
> such, by the way, instead of saying "config option", which may give
> a false impression that it is a last-one-wins single string with
> many such mapping entries on it).
>
> OK.
>
> > Each value configured for the "http.hostResolve" key
> > is passed "as is" to curl through CURLOPT_RESOLVE, so it
> > should be in one of the above 2 forms. This keeps the
> > implementation simple and makes us consistent with
> > libcurl's CURLOPT_RESOLVE, and with curl's corresponding
> > `--resolve` command line option.
>
> OK.
>
> > I am not sure if some tests could/should be added. Ideas about how to
> > test this are welcome.
>
> It should.  Perhaps invent a totally bogus domain name, map that to
> localhost ::1, run a test server locally, and try to clone from that
> bogus domain?

Ok, I will add a simple test like this.

> > +http.hostResolve::
>
> Is "host" a good prefix for it?
>
> In the context of HTTP(s), if there is no other thing than host that
> we resolve, "http.resolve" may be sufficient.  For those who are
> looking for CURLOPT_RESOLVE equivalent, "http.curloptResolve" may
> make it easier to find.

I am Ok with just "http.resolve". I think using "curlopt" is perhaps
going into too many details about the implementation of the feature,
which could theoretically change if we ever decided to use something
other than curl.

> > +     Hostname resolve information that will be used first when sending
> > +     HTTP requests.  This information should be in one of the following
> > +     forms:
> > +
> > +     - [+]HOST:PORT:ADDRESS[,ADDRESS]
> > +     - -HOST:PORT
> > +
> > ++
> > +The first form redirects all requests to the given `HOST:PORT`
> > +to the provided `ADDRESS`(s). The second form clears all previous
> > +config values for that `HOST:PORT` combination.  To allow easy
> > +overriding of all the settings inherited from the system config,
> > +an empty value will reset all resolve information to the empty
> > +list.
>
> If I understand your use case correctly, this is not something you
> would want to hardcode in your configuration files for long term, is
> it?

Right, but I wonder if there are other use cases where people would
like to hardcode it, for example on a private network where IP
addresses rarely change. Also a config option makes it more consistent
with "http.extraHeaders" and other such options.

> I am wondering if we want to mention the expected use case here
> as well, something like
>
>     This is designed to be used primarily from the command line
>     configuration variable override, e.g.
>
>         $ git -c http.resolve=example.com:443:127.0.0.1 \
>             clone https://example.com/user/project.git
>
> perhaps?  Not a suggestion, but soliciting thoughts.

I am also interested in others' thoughts about this. If no one thinks
that a config option could be useful, I am Ok with making it a
"--resolve" command line option that can be passed to any Git command
similar to "-c <name>=<value>":

git --resolve=... <command> [<args>]

> > diff --git a/http.c b/http.c
> > index 229da4d148..e9cc46ee52 100644
> > --- a/http.c
> > +++ b/http.c
> > @@ -128,6 +128,9 @@ static struct curl_slist *pragma_header;
> >  static struct curl_slist *no_pragma_header;
> >  static struct string_list extra_http_headers = STRING_LIST_INIT_DUP;
> >
> > +static struct curl_slist *host_resolves;
> > +static struct string_list http_host_resolve = STRING_LIST_INIT_DUP;
> > +
> >  static struct active_request_slot *active_queue_head;
> >
> >  static char *cached_accept_language;
> > @@ -393,6 +396,17 @@ static int http_options(const char *var, const char *value, void *cb)
> >               return 0;
> >       }
> >
> > +     if (!strcmp("http.hostresolve", var)) {
> > +             if (!value) {
> > +                     return config_error_nonbool(var);
> > +             } else if (!*value) {
> > +                     string_list_clear(&http_host_resolve, 0);
>
> OK, this is a way to "clear" the list of entries accumulated on this
> multi-valued configuration variable so far.  And it is documented in
> the above, too.  Good.
>
> > +             } else {
> > +                     string_list_append(&http_host_resolve, value);
> > +             }
> > +             return 0;
> > +     }
> > +
> >       if (!strcmp("http.followredirects", var)) {
> >               if (value && !strcmp(value, "initial"))
> >                       http_follow_config = HTTP_FOLLOW_INITIAL;
> > @@ -985,6 +999,17 @@ static void set_from_env(const char **var, const char *envname)
> >               *var = val;
> >  }
> >
> > +static struct curl_slist *http_copy_host_resolve(void)
> > +{
> > +     struct curl_slist *hosts = NULL;
> > +     const struct string_list_item *item;
> > +
> > +     for_each_string_list_item(item, &http_host_resolve)
> > +             hosts = curl_slist_append(hosts, item->string);
> > +
> > +     return hosts;
> > +}
> > +
> >  void http_init(struct remote *remote, const char *url, int proactive_auth)
> >  {
> >       char *low_speed_limit;
> > @@ -1048,6 +1073,8 @@ void http_init(struct remote *remote, const char *url, int proactive_auth)
> >       no_pragma_header = curl_slist_append(http_copy_default_headers(),
> >               "Pragma:");
> >
> > +     host_resolves = http_copy_host_resolve();
>
> This is curious.  I imagined that the reason why you keep the
> original in a string_list and copy it to a curl_slist was perhaps
> because you'll lose the latter every time you make a curl request,
> but it does not appear to be the case.  You http_init() just once
> and then the same CURL *curl instance will be reused until you clear
> it with http_cleanup().  So I do not see offhand the need to have
> the string_list at all.

Ok, I will remove it.

> Does it work equally well if you used curl_slist_append() in
> http_options() and maintain ONLY the curl_slist version of the
> host_resolve list?  That would make it unnecessary to keep
> http_host_resolve and add http_copy_host_resolve() function, no?

Yeah, right. I will remove http_host_resolve and http_copy_host_resolve().

Thanks!