On Mon, Dec 22, 2014 at 11:44 AM, Yi EungJun <semtlenori@xxxxxxxxx> wrote: > From: Yi EungJun <eungjun.yi@xxxxxxxxxxxxx> > > Add an Accept-Language header which indicates the user's preferred > languages defined by $LANGUAGE, $LC_ALL, $LC_MESSAGES and $LANG. > > Examples: > LANGUAGE= -> "" > LANGUAGE=ko:en -> "Accept-Language: ko, en;q=0.9, *;q=0.1" > LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *;q=0.1" > LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *;q=0.1" > > This gives git servers a chance to display remote error messages in > the user's preferred language. > > Limit the number of languages to 1,000 because q-value must not be > smaller than 0.001, and limit the length of Accept-Language header to > 4,000 bytes for some HTTP servers which cannot accept such long header. Just a few comments and observations below. Alone, they are not necessarily worth a re-roll, but if you happen to re-roll for some other reason, perhaps take them into consideration. > Signed-off-by: Yi EungJun <eungjun.yi@xxxxxxxxxxxxx> > --- > diff --git a/http.c b/http.c > index 040f362..7a77708 100644 > --- a/http.c > +++ b/http.c > @@ -986,6 +993,166 @@ static void extract_content_type(struct strbuf *raw, struct strbuf *type, > strbuf_addstr(charset, "ISO-8859-1"); > } > > +static void write_accept_language(struct strbuf *buf) > +{ > + [...] > + /* > + * MAX_LANGS must not be larger than 1,000. If it is larger than that, > + * q-value will be smaller than 0.001, the minimum q-value the HTTP > + * specification allows [1]. > + * > + * [1]: http://tools.ietf.org/html/rfc7231#section-5.3.1 > + */ > + const int MAX_LANGS = 1000; > + const int MAX_SIZE_OF_HEADER = 4000; > + const int MAX_SIZE_OF_ASTERISK_ELEMENT = 11; /* for ", *;q=0.001" */ These two MAX_SIZE_* constants are never used individually, but rather only as (MAX_SIZE_OF_HEADER - MAX_SIZE_OF_ASTERISK_ELEMENT). It might be a bit more readable to compute the final value here, with a suitable comment, rather than at point-of-use. Perhaps something like: /* limit of some HTTP servers is 4000 - strlen(", *;q=0.001") */ const int MAX_HEADER_SIZE = 4000 - 11; More below. > + [...] > + /* > + * Convert a list of colon-separated locale values [1][2] to a list of > + * comma-separated language tags [3] which can be used as a value of > + * Accept-Language header. > + [...] > + */ > + for (pos = lang_begin; ; pos++) { > + if (!*pos || *pos == ':') { > + if (is_q_factor_required) { > + /* Put a q-factor only if it is less than 1.0. */ > + if (q < max_q) > + strbuf_addf(buf, q_format, q); > + > + if (q > 1) > + q--; > + > + last_size = buf->len; > + > + is_q_factor_required = 0; > + } > + parse_state = SEPARATOR; > + } else if (parse_state == CODESET_OR_MODIFIER) > + continue; > + else if (*pos == ' ') /* Ignore whitespace character */ > + continue; > + else if (*pos == '.' || *pos == '@') /* Remove .codeset and @modifier. */ > + parse_state = CODESET_OR_MODIFIER; > + else { > + if (parse_state != LANGUAGE_TAG && q < max_q) > + strbuf_addstr(buf, ", "); > + strbuf_addch(buf, *pos == '_' ? '-' : *pos); > + is_q_factor_required = 1; > + parse_state = LANGUAGE_TAG; > + } > + > + if (buf->len > MAX_SIZE_OF_HEADER - MAX_SIZE_OF_ASTERISK_ELEMENT) { > + strbuf_remove(buf, last_size, buf->len - last_size); > + break; > + } > + > + if (!*pos) > + break; > + } Although often suitable when parsing complex inputs, state machines demand high cognitive load. The input you're parsing, on the other hand, is straightforward and can easily be processed with a simple sequential parser, which is easier to reason about and review for correctness. For instance, something like this: while (*s) { /* collect language tag */ for (; *s && *s != '.' && *s != '@' && *s != ':'; s++) strbuf_addch(buf, *s == '_' ? '-' : *s); /* skip .codeset and @modifier */ while (*s && *s != ':') s++; strbuf_addf(buf, q_format, q); ... other bookkeeping ... if (*s == ':') s++; } This example is intentionally simplified but illustrates the general idea. It lacks comma insertion (left as an exercise for the reader) and empty language tag handling (":en", "en::ko"); and doesn't take whitespace into consideration since it wasn't clear why your v6 parser pays attention to embedded spaces, whereas your earlier versions did not. > + /* Don't add Accept-Language header if no language is preferred. */ > + if (q >= max_q) { > + strbuf_reset(buf); > + return; > + } > + > + /* Add '*' with minimum q-factor greater than 0.0. */ > + strbuf_addstr(buf, ", *"); > + strbuf_addf(buf, q_format, 1); > +} > + > +/* > + * Get an Accept-Language header which indicates user's preferred languages. > + * > + * This function always return non-NULL string as strbuf_detach() does. A couple comments: It's not necessary to explain the public API in terms of an implementation detail. Callers of this function don't care and don't need to know that the value was constructed via strbuf, nor that it is somehow dependent upon the behavior of the underlying implementation of strbuf_detach(). This is a somewhat unusual contract. It's much more common and idiomatic in C to return NULL as an indication of "no preference" (or "failure") than to return an empty string. > + * > + * Examples: > + * LANGUAGE= -> "" > + * LANGUAGE=ko:en -> "Accept-Language: ko, en; q=0.9, *; q=0.1" > + * LANGUAGE=ko_KR.UTF-8:sr@latin -> "Accept-Language: ko-KR, sr; q=0.9, *; q=0.1" > + * LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *; q=0.1" > + * LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *; q=0.1" > + * LANGUAGE= LANG=C -> "" > + */ > +static const char *get_accept_language(void) > +{ > + if (!cached_accept_language) { > + struct strbuf buf = STRBUF_INIT; > + write_accept_language(&buf); > + cached_accept_language = strbuf_detach(&buf, NULL); > + strbuf_release(&buf); Junio already mentioned that strbuf_release() is unnecessary following strbuf_detach(). > + } > + > + return cached_accept_language; > +} > + > /* http_request() targets */ > #define HTTP_REQUEST_STRBUF 0 > #define HTTP_REQUEST_FILE 1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html