On Mon, Jun 13, 2011 at 10:30:07AM -0700, Junio C Hamano wrote: > Jeff King <peff@xxxxxxxx> writes: > > > I am tempted to just call this new one strbuf_split and update all > > callers. There aren't that many. > > Yes, that is indeed tempting, and because we have a new parameter the > compiler will catch any new callers that pop up in a mismerge so that > would be perfectly safe. Should we also change the naming later in the series to remain consistent with strbuf_add. IOW, to end up at: struct strbuf **strbuf_split(const char *buf, int len, int delim, int max); struct strbuf **strbuf_split_str(const char *s, int delim, int max); struct strbuf **strbuf_split_buf(const struct strbuf *, int delim, int max); (though I think consistency would also dictate "splitstr" and "splitbuf" without the extra underscore. Personally I find it a bit unreadable). > > -struct strbuf **strbuf_split(const struct strbuf *sb, int delim) > > +struct strbuf **strbuf_split_max(const struct strbuf *sb, int delim, int max) > > { > > int alloc = 2, pos = 0; > > char *n, *p; > > @@ -114,7 +114,10 @@ struct strbuf **strbuf_split(const struct strbuf *sb, int delim) > > p = n = sb->buf; > > while (n < sb->buf + sb->len) { > > int len; > > - n = memchr(n, delim, sb->len - (n - sb->buf)); > > + if (max <= 0 || pos + 1 < max) > > + n = memchr(n, delim, sb->len - (n - sb->buf)); > > + else > > + n = NULL; > > if (pos + 1 >= alloc) { > > alloc = alloc * 2; > > ret = xrealloc(ret, sizeof(struct strbuf *) * alloc); > > Hmm, even when we know the value of max, we go exponential, and even do so > by hand without using ALLOC_GROW(). Somewhat sad. Thanks for reminding me. I noticed it wasn't using ALLOC_GROW, but decided not to change it because I wanted to introduce an optimization later on not to grow beyond max. But then I forgot. :) The optimization I was going to do was to simply allocate "max" slots at the beginning (if it's defined). You know you can't grow beyond that, and in most splits with a max, the caller is expecting all of them to be filled. But your two-pass patch below is also reasonable. > Also do we currently rely on the bug that strbuf_split() returns (NULL,) > instead of ("", NULL) when given an empty string? If not, perhaps... I assumed that behavior was not a bug (and even had to avoid a segfault with it in a later series, as you saw). But thinking on it more, it really is one; splitting even a single character without delimiter ends up with a non-NULL portion, and I think the empty string should do the same. > strbuf.c | 50 +++++++++++++++++++++++++++++++------------------- > 1 files changed, 31 insertions(+), 19 deletions(-) I think your patch looks reasonable. In theory doing two passes over a very large buffer (e.g., splitting lines from a large commit message) might be slightly less efficient, but I imagine it is drowned out in the noise of malloc'ing strbufs. > + for (pass = 0; pass < 2; pass++) { > + /* First pass counts, second pass allocates and fills */ Maybe it is just me, but I tend not to like writing multi-pass stuff like this as a for-loop, but instead to factor it into a function with an "actually allocate" parameter. I find it makes the code much more obvious. > + if (!count) { > t = xmalloc(sizeof(struct strbuf)); > - strbuf_init(t, len); > - strbuf_add(t, p, len); > - ret[pos] = t; > - ret[++pos] = NULL; > - p = ++n; > + strbuf_init(t, 0); > + ret[0] = t; > } I think my test in 4/10 (which avoids the segfault by checking explicitly for NULL in the caller) should go with this part, and then 4/10 can go away. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html