Re: [PATCH/RFC 06/10] strbuf: introduce strbuf_split_str_without_term()

Karthik Nayak <karthik.188@xxxxxxxxx> · Thu, 3 Dec 2015 23:42:40 +0530

On Wed, Dec 2, 2015 at 1:34 PM, Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote:
> On Wed, Nov 11, 2015 at 2:44 PM, Karthik Nayak <karthik.188@xxxxxxxxx> wrote:
>> The current implementation of 'strbuf_split_buf()' includes the
>> terminator at the end of each strbuf post splitting. Include an option
>
> s/Include an/Add an/
>
>> wherein we can drop the terminator if required. In this context
>
> s/required/desired/
>

will change.

>> introduce a wrapper function 'strbuf_split_str_without_term()' which
>> splits a given string into strbufs without including the terminator.
>>
>> Signed-off-by: Karthik Nayak <Karthik.188@xxxxxxxxx>
>> ---
>> diff --git a/ref-filter.c b/ref-filter.c
>> @@ -892,14 +892,11 @@ static void populate_value(struct ref_array_item *ref)
>>                          * TODO: Implement a function similar to strbuf_split_str()
>>                          * which would omit the separator from the end of each value.
>>                          */
>> -                       s = to_free = strbuf_split_str(valp, ',', 0);
>> +                       s = to_free = strbuf_split_str_without_term(valp, ',', 0);
>>
>>                         align->position = ALIGN_LEFT;
>>
>>                         while (*s) {
>> -                               /*  Strip trailing comma */
>> -                               if (s[1])
>> -                                       strbuf_setlen(s[0], s[0]->len - 1);
>
> I'd prefer to see this ref-filter.c change split out as a separate
> patch so as not to pollute the otherwise single-purpose change
> introduced by this patch (i.e. capability to omit the terminator).
>
> Also, it might make sense to move this patch to the head of the
> series, since it's conceptually distinct from the rest of the patches,
> and could conceivably prove useful on its own, regardless of how the
> rest of the series fares.
>

I guess it makes sense to split this into two separate patches. I'll do that and
push it to the top of the series.

>>                                 if (!strtoul_ui(s[0]->buf, 10, (unsigned int *)&width))
>>                                         ;
>>                                 else if (!strcmp(s[0]->buf, "left"))
>> diff --git a/strbuf.c b/strbuf.c
>> @@ -115,7 +115,7 @@ void strbuf_tolower(struct strbuf *sb)
>>  }
>>
>>  struct strbuf **strbuf_split_buf(const char *str, size_t slen,
>> -                                int terminator, int max)
>> +                                int terminator, int max, int with_term)
>
> "with_term" might undesirably be interpreted as meaning "use this
> particular term". Perhaps a different name, such as "include_term",
> "drop_term", or "omit_term" would be a bit less ambiguous. (I think I
> prefer "omit_term".)
>

True. I too prefer "omit_term" from what you mentioned I'll stick to that.

>>  {
>>         struct strbuf **ret = NULL;
>>         size_t nr = 0, alloc = 0;
>> @@ -123,18 +123,27 @@ struct strbuf **strbuf_split_buf(const char *str, size_t slen,
>>
>>         while (slen) {
>>                 int len = slen;
>> +               int term = with_term;
>
> "term" is not a great variable name, and is easily confused with the
> existing "terminator" input argument. This is really being used as a
> length adjustment, so perhaps a name such as 'term_adjust' or
> 'len_adjust' or something better would be preferable.

'len_adjust' seems better, will change.

>
> Also, since the value of 'with_term' never changes, then 'term' will
> have the same value each time through the loop, thus you could
> (should) hoist the declaration and initialization of 'term' outside of
> the loop.
>

True, will move it out of the loop.

> Due to the way you're using this variable ('term'), you want its value
> always to be 0 or 1 but you don't do anything to ensure that. What if
> the user passes in 42 rather than 0 or 1? That would mess up your
> (below) calculations. Worse, what if the user passes in -42? That
> would be particularly alarming. To turn this into a boolean value (0
> or 1), do this instead:
>
>     int term = !!with_term;

Makes sense, will change.

>
>>                 if (max <= 0 || nr + 1 < max) {
>>                         const char *end = memchr(str, terminator, slen);
>>                         if (end)
>> -                               len = end - str + 1;
>> +                               len = end - str + term;
>> +                       else
>> +                               /*  When no terminator present, we must add the last character */
>> +                               term = 1;
>>                 }
>>                 t = xmalloc(sizeof(struct strbuf));
>>                 strbuf_init(t, len);
>>                 strbuf_add(t, str, len);
>>                 ALLOC_GROW(ret, nr + 2, alloc);
>>                 ret[nr++] = t;
>> -               str += len;
>> -               slen -= len;
>> +               if (!term) {
>> +                       str += len + 1;
>> +                       slen -= len + 1;
>> +               } else {
>> +                       str += len;
>> +                       slen -= len;
>> +               }
>
> This new logic is complex and confusing, thus difficult to review for
> correctness. Rather than messing with 'len' and the existing logic,
> how about instead, just adjusting the amount you store in the strbuf?
> That is, instead of all the above changes, you might be able to get by
> with one little change, something like this (untested):
>
>     - strbuf_add(t, str, len);
>     + strbuf_add(t, str, len - !!end * !!with_term);
>

That seems about right, this should be easier to understand.

>>         }
>>         ALLOC_GROW(ret, nr + 1, alloc); /* In case string was empty */
>>         ret[nr] = NULL;
>> diff --git a/strbuf.h b/strbuf.h
>> @@ -465,19 +465,25 @@ static inline int strbuf_strip_suffix(struct strbuf *sb, const char *suffix)
>>   * For lighter-weight alternatives, see string_list_split() and
>>   * string_list_split_in_place().
>>   */
>> -extern struct strbuf **strbuf_split_buf(const char *, size_t,
>> -                                       int terminator, int max);
>> +extern struct strbuf **strbuf_split_buf(const char *str, size_t slen,
>> +                                       int terminator, int max, int with_term);
>
> You also need to update the comment block above this declaration since
> it still says that each substring includes the terminator. It also
> fails to mention the new 'with_term' argument added by this patch.
>

Will do :)

>> +static inline struct strbuf **strbuf_split_str_without_term(const char *str,
>> +                                                           int terminator, int max)
>
> This is an uncomfortably long function name. Unfortunately, short and
> sweet strbuf_split() is already taken. Perhaps
> strbuf_split_str_drop_term()? strbuf_split_str_omit_term()?
> strbuf_split_str_no_term()? strbuf_split_noterm()?
>

strbuf_split_str_omit_term() would be nice keeping it consistent with the
variable name omit_term.

-- 
Regards,
Karthik Nayak
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html