Re: [PATCH v7 06/12] test-http-server: add HTTP request parsing

Jeff King <peff@xxxxxxxx> · Thu, 26 Jan 2023 04:30:05 -0500

On Fri, Jan 20, 2023 at 10:08:44PM +0000, Matthew John Cheetham via GitGitGadget wrote:

> +#define REQ__INIT { \
> +     .start_line = STRBUF_INIT, \
> +     .uri_path = STRBUF_INIT, \
> +     .query_args = STRBUF_INIT, \
> +     .header_list = STRING_LIST_INIT_NODUP, \
> +     .content_type = NULL, \
> +     .content_length = 0, \
> +     .has_content_length = 0, \
> +}

We declare header_list as nodup, but later we put actual duplicated
strings in it:

> +             hp = strbuf_detach(&h, NULL);
> +             string_list_append(&req->header_list, hp);

So later when we free it:

> +static void req__release(struct req *req)
> +{
> +     strbuf_release(&req->start_line);
> +
> +     strbuf_release(&req->uri_path);
> +     strbuf_release(&req->query_args);
> +
> +     string_list_clear(&req->header_list, 0);
> +}

the strings will be leaked. There are a lot of solutions here, including
setting strdup_strings right before freeing. But it's probably
reasonable to just use INIT_DUP, and then when storing, just do:

  string_list_append(&req->header_list, h.buf);

Since "h" is filled by strbuf_getwholeline(), there's no need to erase
the contents. It should reset the buffer itself (and so you end up
re-using the same buffer, rather than freeing it for each loop).  You
will have to remember to strbuf_release() after the loop, though.

The leak isn't very big, and we hold onto it until the process ends
anyway, but it will probably cause the leak-detector to complain.

(Yet another solution would just be to dump the trace as we parse the
headers, rather than holding them, since that appears to be the only use
of header_list).

> +	/*
> +	 * Read the set of HTTP headers into a string-list.
> +	 */
> +	while (1) {
> +		if (strbuf_getwholeline_fd(&h, fd, '\n') == EOF)
> +			goto done;
> +		strbuf_trim_trailing_newline(&h);
> +
> +		if (!h.len)
> +			goto done; /* a blank line ends the header */
> +
> +		hp = strbuf_detach(&h, NULL);
> +		string_list_append(&req->header_list, hp);
> +
> +		/* also store common request headers as struct req members */
> +		if (skip_iprefix(hp, "Content-Type: ", &hv)) {
> +			req->content_type = hv;

I think this is stricter than necessary. The whitespace after the colon
is optional, but can also be longer than just one space (or could be a
tab). It's probably OK to be picky here since this is just for tests,
but we'd want to make sure we're not this picky on the client side.

> +		} else if (skip_iprefix(hp, "Content-Length: ", &hv)) {
> +			/*
> +			 * Content-Length is always non-negative, but has no
> +			 * upper bound according to RFC 7230 (§3.3.2).
> +			 */
> +			intmax_t len = 0;
> +			if (sscanf(hv, "%"PRIdMAX, &len) != 1 || len < 0 ||
> +			    len == INTMAX_MAX) {
> +				logerror("invalid content-length: '%s'", hv);
> +				result = WR_CLIENT_ERROR;
> +				goto done;
> +			}

We usually avoid sscanf because it's error-checking sucks. For example,
this will accept "123.garbage", but you can't tell because you have no
clue how far it got.  Something like strtoimax() is better. It probably
doesn't matter much since this is test code, though I do think in the
long run it would be nice to add scanf(), etc, to our list of banned
functions (there are one or two other uses currently, though, so that
isn't imminent).

-Peff