On Wed, Apr 6, 2011 at 2:52 AM, Jeff King <peff@xxxxxxxx> wrote: > On Wed, Apr 06, 2011 at 01:30:03AM +0200, Erik Faye-Lund wrote: > >> The rather awkward return statement with strdup("") is because >> strbuf_detach returns NULL when there's nothing allocated. Even >> worse, it returns an uninitialized string if the string has been >> initialized with a non-zero 'hint'. > > That seems like two bugs in the strbuf code. > > I would expect strbuf_detach to _always_ return an allocated buffer, > even if it is xstrdup(""). Though the code in strbuf_detach is explicit > about returning NULL rather than slopbuf, so perhaps there is a case > where it is useful. Yeah, this is why I didn't just go for it. Looking through history, it seems that the reason why it explicitly returns NULL when there's nothing allocated is that it used to do so "by artifact" before b315c5c ("strbuf change: be sure ->buf is never ever NULL."), and it looks like the author didn't want to change the semantics. Fair enough, changing this takes a lot of code-reading to verify that there isn't any call-sites that depend on this behavior. > > But for the other, one of the invariants of strbuf is that the string is > always NUL-terminated. So I would expect strbuf_init to properly > NUL-terminate after growing based on the hint. I agree. An unterminated yet non-NULL return from strbuf_detach is just dangerous behavior. Something like this should probably be applied: ---8<--- diff --git a/strbuf.c b/strbuf.c index 77444a9..538035a 100644 --- a/strbuf.c +++ b/strbuf.c @@ -24,14 +24,16 @@ int suffixcmp(const char *str, const char *suffix) * buf is non NULL and ->buf is NUL terminated even for a freshly * initialized strbuf. */ -char strbuf_slopbuf[1]; +char strbuf_slopbuf[1] = { '\0' }; void strbuf_init(struct strbuf *sb, size_t hint) { sb->alloc = sb->len = 0; sb->buf = strbuf_slopbuf; - if (hint) + if (hint) { strbuf_grow(sb, hint); + sb->buf[0] = '\0'; + } } void strbuf_release(struct strbuf *sb) ---8<--- > >> static char *parse_value(void) >> { >> - static char value[1024]; >> - int quote = 0, comment = 0, len = 0, space = 0; >> + struct strbuf value = STRBUF_INIT; >> + int quote = 0, comment = 0, space = 0; > > One thing you could about the strbuf_detach thing is to just use a > single static strbuf instead of the static array, and just reset its > length back to zero on each call. So: > > static struct strbuf value = STRBUF_INIT; > ... > strbuf_reset(&value); > >> - value[len] = 0; >> - return value; >> + return value.len ? >> + strbuf_detach(&value, NULL) : >> + strdup(""); > > Then this just becomes: > > return value.buf; > >> default: >> + strbuf_release(&value); >> return NULL; > > And you can drop the release. > >> @@ -226,7 +226,9 @@ static int get_value(config_fn_t fn, void *data, char *name, unsigned int len) >> if (!value) >> return -1; >> } >> - return fn(name, value, data); >> + ret = fn(name, value, data); >> + free(value); >> + return ret; > > And drop this hunk, since callers no longer need to free. > Yeah, and this was what I tried first. I dropped that approach because I misread the code of strbuf_setlen so I thought it ended up freeing the buffer (which would have lost the gain we get from making it static). This is wrong, it does not do that. But this brings a new issue: leaving potentially huge blocks of memory (especially since this patch is about long lines) allocated inside a function can be a bit nasty. But it's probably not a big deal here, the blocks are probably small enough to "leak" without worries, unless you have a downright evil config file (which I suspect could only happen due to disk-corruption, in which case you're going to have much much bigger problems). And that problem can also be fixed by giving the buffer file-local scope, and adding a cleanup-function that gets called when config-parsing is finished. But it's probably not even worth the hassle. I guess it's my background from handheld/low-power device development that makes me worry too much about problems like these :P In other words: I think you're right, it's a much better approach. Less allocations, less penalty on the start-up time for every little git-command. > I do wonder, though, if we could be reusing the unquote_c_style() > function in quote.c. They are obviously similar, but I haven't checked > if there is more going on in the config code. Hmm, this is an interesting suggestion. It would be a part of a bigger change though: unquote_c_style requires it's input to be in memory, while parse_value uses a function called get_next_char to feed the parser. So we'd either have to read the entire file into memory, or find some way to read the file line-by-line while handling \n-escaping correctly. It also seems like there's differences in what kind of escaping and normalization the two functions handle; unquote_c_style handles more escaped character sequences, while parse_value normalize all non-escaped space-characters ('\t' et. al) into SP. This might not be such a big problem in reality. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html