On 2023-02-15 15:26, Junio C Hamano wrote: > "Matthew John Cheetham via GitGitGadget" <gitgitgadget@xxxxxxxxx> > writes: > >> According to RFC2616 Section 4.2 [1], header field names are not >> case-sensitive meaning when collecting multiple values for the same >> field name, we can just use the case of the first observed instance of >> each field name and no normalisation is required. > > If the names are not case-sensitive, you can choose to first > downcase the names you see, and use that consistently, and the > result would still be valid. IOW, "not case-sensitive" does not at > all mean you have to use the first observed instance without > normalization. You are allowed to choose such an implementation, > but "not case-sensitive" is not a justification to choose such an > implementation among possible implementation that would be allowed > under the rule. Re-reading this paragraph, it doens't really need to even be here. This was an artefact of a time when I was storing all headers, including keys and values. Since we're only interested now in the WWW-Authenticate header _values_, there's no need to call out this out. Will drop this paragraph. >> The collection of all header values matching the WWW-Authenticate >> header is complicated by the fact that it is legal for header fields to >> be continued over multiple lines, but libcurl only gives us one line at >> a time. > > Saying "one physical line" at a time may make it clear what you are > pointing out as a weak point in the interface libcURL gives us (I > think you are getting at "if they handled header folding for us and > fed us one logical line at a time, it would have been nicer"). Logical header fields vs physical header lines is useful and clearer terminology - I will update the commit message to reflect in the next iteration. Thanks! >> @@ -22,6 +22,7 @@ void credential_clear(struct credential *c) >> free(c->username); >> free(c->password); >> string_list_clear(&c->helpers, 0); >> + strvec_clear(&c->wwwauth_headers); >> >> credential_init(c); >> } >> diff --git a/credential.h b/credential.h >> index f430e77fea4..6f2e5bc610b 100644 >> --- a/credential.h >> +++ b/credential.h >> @@ -2,6 +2,7 @@ >> #define CREDENTIAL_H >> >> #include "string-list.h" >> +#include "strvec.h" >> >> /** >> * The credentials API provides an abstracted way of gathering username and >> @@ -115,6 +116,19 @@ struct credential { >> */ >> struct string_list helpers; >> >> + /** >> + * A `strvec` of WWW-Authenticate header values. Each string >> + * is the value of a WWW-Authenticate header in an HTTP response, >> + * in the order they were received in the response. >> + */ >> + struct strvec wwwauth_headers; >> + >> + /** >> + * Internal use only. Used to keep track of split header fields > > The technical term for what you call "split header" here seems to be > "line folding" (RFC 7230, which deprecates it). > >> + * in order to fold multiple lines into one value. >> + */ >> + unsigned header_is_last_match:1; >> + >> unsigned approved:1, >> configured:1, >> quit:1, >> @@ -130,6 +144,7 @@ struct credential { >> >> #define CREDENTIAL_INIT { \ >> .helpers = STRING_LIST_INIT_DUP, \ >> + .wwwauth_headers = STRVEC_INIT, \ >> } >> >> /* Initialize a credential structure, setting all fields to empty. */ >> diff --git a/git-compat-util.h b/git-compat-util.h >> index a76d0526f79..a59230564e8 100644 >> --- a/git-compat-util.h >> +++ b/git-compat-util.h >> @@ -1266,6 +1266,29 @@ static inline int skip_iprefix(const char *str, const char *prefix, >> return 0; >> } >> >> +/* >> + * Like skip_prefix_mem, but compare case-insensitively. Note that the >> + * comparison is done via tolower(), so it is strictly ASCII (no multi-byte >> + * characters or locale-specific conversions). >> + */ >> +static inline int skip_iprefix_mem(const char *buf, size_t len, >> + const char *prefix, >> + const char **out, size_t *outlen) >> +{ >> + size_t prefix_len = strlen(prefix); >> + >> + if (len < prefix_len) >> + return 0; >> + >> + if (!strncasecmp(buf, prefix, prefix_len)) { >> + *out = buf + prefix_len; >> + *outlen = len - prefix_len; >> + return 1; >> + } >> + >> + return 0; >> +} > > OK. > >> diff --git a/http.c b/http.c >> index 8a5ba3f4776..7a56a3db5f7 100644 >> --- a/http.c >> +++ b/http.c >> @@ -183,6 +183,124 @@ size_t fwrite_buffer(char *ptr, size_t eltsize, size_t nmemb, void *buffer_) >> return nmemb; >> } >> >> +/* >> + * A folded header continuation line starts with at least one single whitespace >> + * character. It is not a continuation line if the line is *just* a newline. >> + * The RFC for HTTP states that CRLF is the header field line ending, but some >> + * servers may use LF only; we accept both. >> + */ > > Nice. > >> +static inline int is_hdr_continuation(const char *ptr, const size_t size) >> +{ >> + /* totally empty line or normal header */ >> + if (!size || !isspace(*ptr)) >> + return 0; > > obs-fold (RFC7230) begins the next line with SP or HTAB, but > isspace() allows not just SP and HT but also CR and LF. So > this is a bit pessimistic but rejects what is not a folded > continuation line reliably. > >> + /* empty line with LF line ending */ >> + if (size == 1 && ptr[0] == '\n') >> + return 0; > > And this is a blank line after the headers, with LF (not conforming > but is OK). > >> + /* empty line with CRLF line ending */ >> + if (size == 2 && ptr[0] == '\r' && ptr[1] == '\n') >> + return 0; > > And this is another form of a blank line after the headers, with > CRLF. > >> + return 1; >> +} > > After rejecting the above two "blank", it is a folded continuation > line. OK. > > I wonder if > > static inline int ... () { > return (size && (*ptr == ' ' || *ptr == '\t')); > } > > sufficient and easier to grok, though. You're correct. This implementation is 'more correct' and easier to grok. >> +static size_t fwrite_wwwauth(char *ptr, size_t eltsize, size_t nmemb, void *p) >> +{ >> + size_t size = eltsize * nmemb; >> + struct strvec *values = &http_auth.wwwauth_headers; >> + struct strbuf buf = STRBUF_INIT; >> + const char *val; >> + size_t val_len; >> + >> + /* >> + * Header lines may not come NULL-terminated from libcurl so we must >> + * limit all scans to the maximum length of the header line, or leverage >> + * strbufs for all operations. >> + * >> + * In addition, it is possible that header values can be split over >> + * multiple lines as per RFC 2616 (even though this has since been >> + * deprecated in RFC 7230). A continuation header field value is >> + * identified as starting with a space or horizontal tab. >> + * >> + * The formal definition of a header field as given in RFC 2616 is: >> + * >> + * message-header = field-name ":" [ field-value ] >> + * field-name = token >> + * field-value = *( field-content | LWS ) >> + * field-content = <the OCTETs making up the field-value >> + * and consisting of either *TEXT or combinations >> + * of token, separators, and quoted-string> >> + */ >> + >> + /* Start of a new WWW-Authenticate header */ >> + if (skip_iprefix_mem(ptr, size, "www-authenticate:", &val, &val_len)) { >> + strbuf_add(&buf, val, val_len); >> + >> + /* >> + * Strip the CRLF that should be present at the end of each >> + * field as well as any trailing or leading whitespace from the >> + * value. >> + */ >> + strbuf_trim(&buf); >> + >> + strvec_push(values, buf.buf); >> + http_auth.header_is_last_match = 1; >> + goto exit; > > OK. We remember that we have seen the beginning of a header we are > interested in (so that we can append if it is a continuation we see > next). Good. > >> + } >> + >> + /* >> + * This line could be a continuation of the previously matched header >> + * field. If this is the case then we should append this value to the >> + * end of the previously consumed value. >> + */ >> + if (http_auth.header_is_last_match && is_hdr_continuation(ptr, size)) { >> + /* >> + * Trim the CRLF and any leading or trailing from this line. >> + */ >> + strbuf_add(&buf, ptr, size); >> + strbuf_trim(&buf); >> + >> + /* >> + * At this point we should always have at least one existing >> + * value, even if it is empty. Do not bother appending the new >> + * value if this continuation header is itself empty. >> + */ >> + if (!values->nr) { >> + BUG("should have at least one existing header value"); > > OK, we should set _is_last_match to true only after we recorded the > header that might see a continuation, so it would be a bug if we > didn't have anything there. Good. > >> + } else if (buf.len) { >> + char *prev = xstrdup(values->v[values->nr - 1]); >> + >> + /* Join two non-empty values with a single space. */ >> + const char *const sp = *prev ? " " : ""; >> + >> + strvec_pop(values); >> + strvec_pushf(values, "%s%s%s", prev, sp, buf.buf); >> + free(prev); >> + } >> + >> + goto exit; > > Good that we are prepared to see a logical line split over more than > two lines (i.e. by not toggling _is_last_match off prematurely here). > >> + } >> + >> + /* This is the start of a new header we don't care about */ >> + http_auth.header_is_last_match = 0; > > Or what we just saw and ignored could be a continuation line of a > header we ignored. The comment is slightly misleading. I'll try and reword this to make it more accurate - we have determined this line is not a continuation of the previous WWW-Authenticate header. > Other than that, looking good. > > Thanks.