Re: [PATCH v6 0/1] http: Add Accept-Language header if possible

Eric Sunshine <sunshine@xxxxxxxxxxxxxx> · Mon, 19 Jan 2015 15:21:16 -0500

On Sunday, January 18, 2015, Yi EungJun <semtlenori@xxxxxxxxx> wrote:
> Add an Accept-Language header which indicates the user's preferred
> languages defined by $LANGUAGE, $LC_ALL, $LC_MESSAGES and $LANG.
>
> Examples:
>   LANGUAGE= -> ""
>   LANGUAGE=ko:en -> "Accept-Language: ko, en;q=0.9, *;q=0.1"
>   LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *;q=0.1"
>   LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *;q=0.1"
>
> This gives git servers a chance to display remote error messages in
> the user's preferred language.
>
> Limit the number of languages to 1,000 because q-value must not be
> smaller than 0.001, and limit the length of Accept-Language header to
> 4,000 bytes for some HTTP servers which cannot accept such long header.
>
> Signed-off-by: Yi EungJun <eungjun.yi@xxxxxxxxxxxxx>
> ---
> diff --git a/http.c b/http.c
> index 040f362..349b033 100644
> --- a/http.c
> +++ b/http.c
> @@ -986,6 +993,145 @@ static void extract_content_type(struct strbuf *raw, struct strbuf *type,
>                 strbuf_addstr(charset, "ISO-8859-1");
>  }
>
> +static void write_accept_language(struct strbuf *buf)
> +{
> +       [...]
> +       const int MAX_DECIMAL_PLACES = 3;
> +       const int MAX_LANGUAGE_TAGS = 1000;
> +       const int MAX_ACCEPT_LANGUAGE_HEADER_SIZE = 4000;
> +       struct strbuf *language_tags = NULL;
> +       int num_langs;

Mental note: 'num_langs' is not initialized.

> +       const char *s = get_preferred_languages();
> +
> +       /* Don't add Accept-Language header if no language is preferred. */
> +       if (!s)
> +               return;
> +
> +       /*
> +        * Split the colon-separated string of preferred languages into
> +        * language_tags array.
> +        */
> +       do {
> +               /* increase language_tags array to add new language tag */
> +               REALLOC_ARRAY(language_tags, num_langs + 1);

'num_langs' is used here uninitialized.

> +               strbuf_init(&language_tags[num_langs], 0);
> +
> +               /* collect language tag */
> +               for (; *s && (isalnum(*s) || *s == '_'); s++)
> +                       strbuf_addch(&language_tags[num_langs], *s == '_' ? '-' : *s);
> +
> +               /* skip .codeset, @modifier and any other unnecessary parts */
> +               while (*s && *s != ':')
> +                       s++;
> +
> +               if (language_tags[num_langs].len > 0) {

Mental note: An empty ("") language tag is never allowed in language_tags[].

 > +                       num_langs++;

This is a little bit ugly. At the top of the loop, you allocate space
in the array for a strbuf and initialize it. However, if the language
tag is empty (""), then 'num_langs' is never incremented, so the next
time through the loop, strbuf_init() is invoked on the same block of
memory (assuming the realloc was a no-op since the allocation size did
not change), overwriting whatever was there and possibly leaking
memory. In this particular case, by examining the parser code closely,
we can see that nothing was added to the strbuf, so nothing is being
leaked the next time around, given the current implementation of
strbuf.

However, this is potentially fragile. A change to the implementation
of strbuf in the future (for instance, if strbuf_init() allocates
memory immediately) could result in a leak here. Moreover, this
no-leak situation only holds true if no text at all has been added to
the strbuf after strbuf_init(). If someone changes the parser in the
future to operate a bit differently so that some text is added and
then removed from the strbuf, even though the end result still has
length is 0, then it will start leaking.

One way to make this more robust would be to have a separate strbuf
for collecting the language tag. When you encounter a non-empty tag,
only then grow the array and initialize the new strbuf in the array.
Finally, use strbuf_swap() to swap the collected language tag into the
new array position. Something like this:

    struct strbuf tag = STRBUF_INIT;
    do {
         for (; *s && (isalnum(*s) || *s == '_'); s++)
             strbuf_addch(&tag, *s == '_' ? '-' : *s);

        [...]

        if (tag.len) {
            num_langs++;
            REALLOC_ARRAY(language_tags, num_langs);
            strbuf_init(&language_tags[num_langs], 0);
            strbuf_swap(&tag, &language_tags[num_langs]);

            if (num_langs >= ...)
                break;
        }
    while (...);
    strbuf_release(&tag);

> +                       if (num_langs >= MAX_LANGUAGE_TAGS - 1) /* -1 for '*' */
> +                               break;
> +               }
> +       } while (*s++);
> +
> +       /* write Accept-Language header into buf */
> +       if (num_langs >= 1) {
> +               int i;
> +               int last_buf_len;

Mental note: 'last_buf_len' is not initialized.

> +               int max_q;
> +               int decimal_places;
> +               char q_format[32];
> +
> +               /* add '*' */
> +               REALLOC_ARRAY(language_tags, num_langs + 1);
> +               strbuf_init(&language_tags[num_langs], 0);
> +               strbuf_addstr(&language_tags[num_langs++], "*");
> +
> +               /* compute decimal_places */
> +               for (max_q = 1, decimal_places = 0;
> +                               max_q < num_langs && decimal_places <= MAX_DECIMAL_PLACES;
> +                               decimal_places++, max_q *= 10)
> +                       ;
> +
> +               sprintf(q_format, ";q=0.%%0%dd", decimal_places);
> +
> +               strbuf_addstr(buf, "Accept-Language: ");
> +
> +               for(i = 0; i < num_langs; i++) {
> +                       if (language_tags[i].len == 0)
> +                               continue;

The parsing code does not allow empty tags ("") in language_tags[], so
this conditional is useless, isn't it?

> +
> +                       if (i > 0)
> +                               strbuf_addstr(buf, ", ");
> +
> +                       strbuf_addstr(buf, strbuf_detach(&language_tags[i], NULL));

This leaks the string detached from 'language_tag[i]' since
strbuf_addstr() does not take ownership of it.

> +                       if (i > 0)
> +                               strbuf_addf(buf, q_format, max_q - i);
> +
> +                       if (buf->len > MAX_ACCEPT_LANGUAGE_HEADER_SIZE) {
> +                               strbuf_remove(buf, last_buf_len, buf->len - last_buf_len);

'last_buf_len' is (potentially) used here uninitialized the first time
through loop.

> +                               break;
> +                       }
> +
> +                       last_buf_len = buf->len;
> +               }
> +       }
> +
> +       free(language_tags);

This _seems_ to be okay since strbuf_detach() was invoked for each
strbuf in language_tags[], however, those strings were in fact leaked
(as noted above), so it's not actually correct.

> +}
> +
> +/*
> + * Get an Accept-Language header which indicates user's preferred languages.
> + *
> + * This function always return non-NULL string as strbuf_detach() does.

Repeating from [1]: It's not good form to describe the published API
in terms of an implementation detail (strbuf_detach). Also, it would
be more idiomatic in C to return NULL rather than empty string.

[1]: http://article.gmane.org/gmane.comp.version-control.git/261810/

> + *
> + * Examples:
> + *   LANGUAGE= -> ""
> + *   LANGUAGE=ko:en -> "Accept-Language: ko, en; q=0.9, *; q=0.1"
> + *   LANGUAGE=ko_KR.UTF-8:sr@latin -> "Accept-Language: ko-KR, sr; q=0.9, *; q=0.1"
> + *   LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *; q=0.1"
> + *   LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *; q=0.1"
> + *   LANGUAGE= LANG=C -> ""
> + */
> +static const char *get_accept_language(void)
> +{
> +       if (!cached_accept_language) {
> +               struct strbuf buf = STRBUF_INIT;
> +               write_accept_language(&buf);
> +               cached_accept_language = strbuf_detach(&buf, NULL);
> +       }
> +
> +       return cached_accept_language;
> +}
> +
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html