Thanks very much for your detailed review and sorry for late reply. 2014-07-22 4:01 GMT+09:00 Junio C Hamano <gitster@xxxxxxxxx>: > Yi EungJun <semtlenori@xxxxxxxxx> writes: > >> From: Yi EungJun <eungjun.yi@xxxxxxxxxxxxx> >> >> Add an Accept-Language header which indicates the user's preferred >> languages defined by $LANGUAGE, $LC_ALL, $LC_MESSAGES and $LANG. >> >> Examples: >> LANGUAGE= -> "" >> LANGUAGE=ko:en -> "Accept-Language: ko, en; q=0.9, *; q=0.1" >> LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *; q=0.1" >> LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *; q=0.1" >> >> This gives git servers a chance to display remote error messages in >> the user's preferred language. >> >> Signed-off-by: Yi EungJun <eungjun.yi@xxxxxxxxxxxxx> >> --- >> http.c | 134 +++++++++++++++++++++++++++++++++++++++++++++ >> remote-curl.c | 2 + >> t/t5550-http-fetch-dumb.sh | 31 +++++++++++ >> 3 files changed, 167 insertions(+) >> >> diff --git a/http.c b/http.c >> index 3a28b21..ed4e8e1 100644 >> --- a/http.c >> +++ b/http.c >> @@ -67,6 +67,8 @@ static struct curl_slist *no_pragma_header; >> >> static struct active_request_slot *active_queue_head; >> >> +static struct strbuf *cached_accept_language = NULL; > > Please drop " = NULL" that is unnecessary for BSS. Thanks, I'll fix it. > >> @@ -512,6 +514,9 @@ void http_cleanup(void) >> cert_auth.password = NULL; >> } >> ssl_cert_password_required = 0; >> + >> + if (cached_accept_language) >> + strbuf_release(cached_accept_language); >> } > > > >> @@ -983,6 +988,129 @@ static void extract_content_type(struct strbuf *raw, struct strbuf *type, >> strbuf_addstr(charset, "ISO-8859-1"); >> } >> >> +/* >> + * Guess the user's preferred languages from the value in LANGUAGE environment >> + * variable and LC_MESSAGES locale category. >> + * >> + * The result can be a colon-separated list like "ko:ja:en". >> + */ >> +static const char *get_preferred_languages(void) >> +{ >> + const char *retval; >> + >> + retval = getenv("LANGUAGE"); >> + if (retval && *retval) >> + return retval; >> + >> + retval = setlocale(LC_MESSAGES, NULL); >> + if (retval && *retval && >> + strcmp(retval, "C") && >> + strcmp(retval, "POSIX")) >> + return retval; >> + >> + return NULL; >> +} >> + >> +/* >> + * Get an Accept-Language header which indicates user's preferred languages. >> + * >> + * Examples: >> + * LANGUAGE= -> "" >> + * LANGUAGE=ko:en -> "Accept-Language: ko, en; q=0.9, *; q=0.1" >> + * LANGUAGE=ko_KR.UTF-8:sr@latin -> "Accept-Language: ko-KR, sr; q=0.9, *; q=0.1" >> + * LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *; q=0.1" >> + * LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *; q=0.1" >> + * LANGUAGE= LANG=C -> "" >> + */ >> +static struct strbuf *get_accept_language(void) >> +{ >> + const char *lang_begin, *pos; >> + int q, max_q; >> + int num_langs; >> + int decimal_places; >> + int is_codeset_or_modifier = 0; >> + static struct strbuf buf = STRBUF_INIT; >> + struct strbuf q_format_buf = STRBUF_INIT; >> + char *q_format; >> + >> + if (cached_accept_language) >> + return cached_accept_language; >> + >> + lang_begin = get_preferred_languages(); >> + >> + /* Don't add Accept-Language header if no language is preferred. */ >> + if (!(lang_begin && *lang_begin)) { > > It is not wrong per-se, but given how hard get_preferred_languages() > tries not to return a pointer to an empty string, this seems a bit > overly defensive to me. Thanks, I'll fix it. > >> + cached_accept_language = &buf; >> + return cached_accept_language; > > It is somewhat unconventional to have a static pointer outside to > point at a singleton and then have a singleton actually as a static > structure. I would have done without "buf" in this function and > instead started this function like so: > > if (cached_accept_language) > return cached_accept_language; > > cached_accept_language = xmalloc(sizeof(struct strbuf)); > strbuf_init(cached_accept_language, 0); > lang_begin = get_preferred_languages(); > if (!lang_begin) > return cached_accept_language; > Thanks, I'll fix it as you suggest. >> + } >> + >> + /* Count number of preferred lang_begin to decide precision of q-factor */ >> + for (num_langs = 1, pos = lang_begin; *pos; pos++) >> + if (*pos == ':') >> + num_langs++; >> + >> + /* Decide the precision for q-factor on number of preferred lang_begin. */ >> + num_langs += 1; /* for '*' */ > > >> + decimal_places = 1 + (num_langs > 10) + (num_langs > 100); > > What if you got 60000 languages ;-)? I do not think we want to bend > backwards and make the code list all 60000 of them, assigning a > unique and decreasing q value to each of them, forming an overlong > Accept-Language header, but at the same time, I do not think we want > to show nonsense output because we compute the precision incorrectly > here. In that case, less-preferred 59,001 languages will have same q-value of 0.001. Unfortunately, there is no way to represent user's preferences of over 1,000 languages since the HTTP specification restricts the range of q-value from 0.001 to 1.000 [1]. I think this code reflects user's preferences better than Google chrome whose minimum q-value is 0.2 and Mozilla firefox whose minimum q-value is 0.01 [2]. > >> + strbuf_addf(&q_format_buf, "; q=0.%%0%dd", decimal_places); >> + q_format = strbuf_detach(&q_format_buf, NULL); > > q_format_buf is an overkill use of strbuf, isn't it? Just > > char q_format_buf[32]; > sprintf(q_format_buf, ";q=0.%%0%d", decimal_places); > > or something should be more than sufficient, no? Thanks, I'll fix it as you suggest. > >> + for (max_q = 1; decimal_places-- > 0;) max_q *= 10; > > As you have to do one loop like this that amounts to computing log10 > of num_langs, why not compute decimal_places the same way while at > it? It may also make sense to cap the number of languages to avoid > spitting out overly long Accept-Language header with practicaly > useless list of many languages. That is, something along the lines > of ... (note that I may very well have off-by-one or off-by-ten > errors here you may need to tweak to get right): > > if (MAX_LANGS < num_langs) > num_langs = MAX_LANGS; > for (max_q = 1, decimal_places = 1; > max_q < num_langs; > decimal_places++, max_q *= 10) > ; > > If you are to use the MAX_LANGS cap, the main loop would also need > to pay attention to it by breaking out of the loop early before you > reach the end of the string, of course. Thanks for good point. As you said, we should limit the length of the value of Accept-Language header because some HTTP servers respond 4xx Client Error if any header's value is very long (4KB or more) [3]. But MAX_LANGS may not be enough because the header can be too long even if the number of languages does not exceed MAX_LANGS if language tag is too long. Many of language tags are 2-3 characters but some are 11 characters; Even it is possible that a user has a very long custom language tag. I think this problem can be solved by one of these solutions: A. Set MAX_LANGS conservatively (100 or less). B. Limit the length of Accept-Language header directly (4KB or less). C. Negotiate with server; Resend a request with shorter Accept-Language header if the server responds 4xx error. > >> + q = max_q; >> + >> + strbuf_addstr(&buf, "Accept-Language: "); >> + >> + /* >> + * Convert a list of colon-separated locale values [1][2] to a list of >> + * comma-separated language tags [3] which can be used as a value of >> + * Accept-Language header. >> + * >> + * [1]: http://pubs.opengroup.org/onlinepubs/007908799/xbd/envvar.html >> + * [2]: http://www.gnu.org/software/libc/manual/html_node/Using-gettextized-software.html >> + * [3]: http://tools.ietf.org/html/rfc7231#section-5.3.5 >> + */ >> + for (pos = lang_begin; ; pos++) { >> + if (*pos == ':' || !*pos) { >> + /* Ignore if this character is the first one. */ >> + if (pos == lang_begin) >> + continue; > > By doing this "ignore empty" here, but not doing the same when you > count num_langs, are you potentially miscounting num_langs? Yes, num_langs can be larger than actual number of languages. For example, if LANGUAGE="en:" num_langs is 2. I wanted to make the logic to compute num_langs as simple as possible and thought num_langs does not need to be very accurate because it is used only to compute the precision of q-factor (max_q and decimal_places). Do we need to compute num_langs accurately? > >> + is_codeset_or_modifier = 0; >> + >> + /* Put a q-factor only if it is less than 1.0. */ >> + if (q < max_q) > > ... is it the same thing as "do not do this for the first round, but > do so for all the other round"? Yes, q-factor is q/max_q and q-factor of 1.0 is not necessary because: > if no "q" parameter is present, the default weight is 1. > -- http://tools.ietf.org/html/rfc7231#section-5.3.1 > >> + strbuf_addf(&buf, q_format, q); >> + >> + if (q > 1) > > Hmm, I am puzzled. C this ever be an issue (unless you have > off-by-one error or you add "cap num_langs to MAX_LANGS", that is)? Yes, it may be an issue if the number of languages is larger than the max_q. max_q must not be larger than 1000 but the number of languages may be. But this if-statement will be not necessary if we limit the number of languages to 1,000 or less (by MAX_LANGS you suggest). > >> + q--; > >> + /* NULL pos means this is the last language. */ >> + if (*pos) >> + strbuf_addstr(&buf, ", "); >> + else >> + break; >> + >> + } else if (is_codeset_or_modifier) >> + continue; >> + else if (*pos == '.' || *pos == '@') /* Remove .codeset and @modifier. */ >> + is_codeset_or_modifier = 1; >> + else >> + strbuf_addch(&buf, *pos == '_' ? '-' : *pos); >> + } >> + >> + /* Don't add Accept-Language header if no language is preferred. */ >> + if (q >= max_q) { > > Can q go over max_q, or is it "q may be max_q"? In other words, is > this essentially saying "if we did not find any language in the > preferred languages list"? Yes. q may be max_q if we did not find any language in the preferred languages list. But there is no chance that q goes over max_q. But it will be not necessary if num_langs is computed accurately. > >> + cached_accept_language = &buf; >> + return cached_accept_language; >> + } >> + >> + /* Add '*' with minimum q-factor greater than 0.0. */ >> + strbuf_addstr(&buf, ", *"); >> + strbuf_addf(&buf, q_format, 1); >> + >> + cached_accept_language = &buf; >> + return cached_accept_language; >> +} >> + >> /* http_request() targets */ >> #define HTTP_REQUEST_STRBUF 0 >> #define HTTP_REQUEST_FILE 1 >> @@ -995,6 +1123,7 @@ static int http_request(const char *url, >> struct slot_results results; >> struct curl_slist *headers = NULL; >> struct strbuf buf = STRBUF_INIT; >> + struct strbuf* accept_language; > > As we write in C, not C++, our asterisks stick to the variable, not > the type. Thanks, I'll fix it. [1]: http://tools.ietf.org/html/rfc7231#section-5.3.1 [2]: https://hg.mozilla.org/integration/mozilla-inbound/rev/1418f9ce6f8b [3]: http://stackoverflow.com/questions/686217/maximum-on-http-header-values/8623061#8623061 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html