Re: [PATCH] config: support values longer than 1024 bytes

Erik Faye-Lund <kusmabite@xxxxxxxxx> · Wed, 6 Apr 2011 11:10:42 +0200

On Wed, Apr 6, 2011 at 2:52 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Wed, Apr 06, 2011 at 01:30:03AM +0200, Erik Faye-Lund wrote:
>
>> The rather awkward return statement with strdup("") is because
>> strbuf_detach returns NULL when there's nothing allocated. Even
>> worse, it returns an uninitialized string if the string has been
>> initialized with a non-zero 'hint'.
>
> That seems like two bugs in the strbuf code.
>
> I would expect strbuf_detach to _always_ return an allocated buffer,
> even if it is xstrdup(""). Though the code in strbuf_detach is explicit
> about returning NULL rather than slopbuf, so perhaps there is a case
> where it is useful.

Yeah, this is why I didn't just go for it. Looking through history, it
seems that the reason why it explicitly returns NULL when there's
nothing allocated is that it used to do so "by artifact" before
b315c5c ("strbuf change: be sure ->buf is never ever NULL."), and it
looks like the author didn't want to change the semantics. Fair
enough, changing this takes a lot of code-reading to verify that there
isn't any call-sites that depend on this behavior.

>
> But for the other, one of the invariants of strbuf is that the string is
> always NUL-terminated. So I would expect strbuf_init to properly
> NUL-terminate after growing based on the hint.

I agree. An unterminated yet non-NULL return from strbuf_detach is
just dangerous behavior. Something like this should probably be
applied:

---8<---

diff --git a/strbuf.c b/strbuf.c
index 77444a9..538035a 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -24,14 +24,16 @@ int suffixcmp(const char *str, const char *suffix)
  * buf is non NULL and ->buf is NUL terminated even for a freshly
  * initialized strbuf.
  */
-char strbuf_slopbuf[1];
+char strbuf_slopbuf[1] = { '\0' };

 void strbuf_init(struct strbuf *sb, size_t hint)
 {
 	sb->alloc = sb->len = 0;
 	sb->buf = strbuf_slopbuf;
-	if (hint)
+	if (hint) {
 		strbuf_grow(sb, hint);
+		sb->buf[0] = '\0';
+	}
 }

 void strbuf_release(struct strbuf *sb)
---8<---

>
>>  static char *parse_value(void)
>>  {
>> -     static char value[1024];
>> -     int quote = 0, comment = 0, len = 0, space = 0;
>> +     struct strbuf value = STRBUF_INIT;
>> +     int quote = 0, comment = 0, space = 0;
>
> One thing you could about the strbuf_detach thing is to just use a
> single static strbuf instead of the static array, and just reset its
> length back to zero on each call. So:
>
>  static struct strbuf value = STRBUF_INIT;
>  ...
>  strbuf_reset(&value);
>
>> -                     value[len] = 0;
>> -                     return value;
>> +                     return value.len ?
>> +                         strbuf_detach(&value, NULL) :
>> +                         strdup("");
>
> Then this just becomes:
>
>  return value.buf;
>
>>                       default:
>> +                             strbuf_release(&value);
>>                               return NULL;
>
> And you can drop the release.
>
>> @@ -226,7 +226,9 @@ static int get_value(config_fn_t fn, void *data, char *name, unsigned int len)
>>               if (!value)
>>                       return -1;
>>       }
>> -     return fn(name, value, data);
>> +     ret = fn(name, value, data);
>> +     free(value);
>> +     return ret;
>
> And drop this hunk, since callers no longer need to free.
>

Yeah, and this was what I tried first.

I dropped that approach because I misread the code of strbuf_setlen so
I thought it ended up freeing the buffer (which would have lost the
gain we get from making it static). This is wrong, it does not do
that. But this brings a new issue: leaving potentially huge blocks of
memory (especially since this patch is about long lines) allocated
inside a function can be a bit nasty. But it's probably not a big deal
here, the blocks are probably small enough to "leak" without worries,
unless you have a downright evil config file (which I suspect could
only happen due to disk-corruption, in which case you're going to have
much much bigger problems). And that problem can also be fixed by
giving the buffer file-local scope, and adding a cleanup-function that
gets called when config-parsing is finished. But it's probably not
even worth the hassle.

I guess it's my background from handheld/low-power device development
that makes me worry too much about problems like these :P

In other words: I think you're right, it's a much better approach.
Less allocations, less penalty on the start-up time for every little
git-command.

> I do wonder, though, if we could be reusing the unquote_c_style()
> function in quote.c. They are obviously similar, but I haven't checked
> if there is more going on in the config code.

Hmm, this is an interesting suggestion. It would be a part of a bigger
change though: unquote_c_style requires it's input to be in memory,
while parse_value uses a function called get_next_char to feed the
parser. So we'd either have to read the entire file into memory, or
find some way to read the file line-by-line while handling \n-escaping
correctly.

It also seems like there's differences in what kind of escaping and
normalization the two functions handle; unquote_c_style handles more
escaped character sequences, while parse_value normalize all
non-escaped space-characters ('\t' et. al) into SP. This might not be
such a big problem in reality.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html