On Thu, Sep 08, 2016 at 09:49:38AM +0200, Johannes Schindelin wrote:

> > > diff --git a/diff.c b/diff.c
> > > index 534c12e..2c5a360 100644
> > > --- a/diff.c
> > > +++ b/diff.c
> > > @@ -951,7 +951,13 @@ static int find_word_boundaries(mmfile_t *buffer,
> > > regex_t *word_regex,
> > >  {
> > >  	if (word_regex && *begin < buffer->size) {
> > >  		regmatch_t match[1];
> > > -		if (!regexec(word_regex, buffer->ptr + *begin, 1, match,
> > > 0)) {
> > > +		int f = 0;
> > > +#ifdef REG_STARTEND
> > > +		match[0].rm_so = 0;
> > > +		match[0].rm_eo = *end - *begin;
> > > +		f = REG_STARTEND;
> > > +#endif
> > > +		if (!regexec(word_regex, buffer->ptr + *begin, 1, match,
> > > f)) {
>
> Heh. You introduced the same bug I did. Or maybe you just fetched my
> mmap-regexec branch and looked at an intermediate iteration?

I do not think I introduced anything. The quoted text is what you sent.
Which is perhaps why it has your bug. :)

> > But I much prefer this approach to copying the data just to add a NUL.
>
> I think it is not worth the burden. The only regex implementation in
> semi-widespread use that do not support REG_STARTEND seems to be musl.
>
> I'd rather not spend *so much* effort just to support an obscure platform.
> Not when the users of that obscure platform could spend that effort
> themselves. And probably won't, because we only copy data to add a NUL on
> those platforms when regexec() is called on an mmfile_t.

I'm confused about what you think I'm proposing. I was saying I _like_
something like regexec_buf() instead of copying the data. Which seems
like the simpler thing to me (and presumably to you).

Or do you mean using compat/regex to build on re_search() consistently?
I do not think that is all that complex; the question is only whether
people really want to use their own regex libraries.

Between the two options for regexec_buf(), I think you have convinced me
that REG_STARTEND is better than just using compat/regex everywhere. I
do think the fallback for platforms like musl should be "use
compat/regex" and not doing an expensive copy (which in most cases is
not even necessary).

-Peff
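
For readers following along, here is a minimal sketch of what a
regexec_buf()-style wrapper built on REG_STARTEND might look like. The
name regexec_buf_sketch() and its signature are assumptions made for
illustration, not the code under review. The #else branch is only a
stand-in so the sketch compiles on a libc without REG_STARTEND: it does
exactly the copy-to-add-a-NUL that this thread would rather avoid (the
preference stated above is to fall back to compat/regex instead), and it
would stop early at any NUL embedded in the buffer.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: match `preg` against the `size` bytes starting at
 * `buf`, which need not be NUL-terminated. Returns whatever regexec()
 * returns; match offsets in pmatch[] are relative to `buf`.
 */
static int regexec_buf_sketch(const regex_t *preg, const char *buf,
			      size_t size, size_t nmatch,
			      regmatch_t pmatch[], int eflags)
{
	assert(nmatch > 0 && pmatch);
#ifdef REG_STARTEND
	/* With REG_STARTEND, pmatch[0] is read as the range to search. */
	pmatch[0].rm_so = 0;
	pmatch[0].rm_eo = (regoff_t)size;
	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
#else
	/* Stand-in only: copy the bytes so that they can be NUL-terminated. */
	char *copy = malloc(size + 1);
	int ret;

	if (!copy)
		return REG_ESPACE;
	memcpy(copy, buf, size);
	copy[size] = '\0';
	ret = regexec(preg, copy, nmatch, pmatch, eflags);
	free(copy);
	return ret;
#endif
}
```

Because rm_so is always 0 here, the offsets reported on a successful
match are relative to the pointer passed in, so a caller searching from
the middle of a larger buffer has to add its own starting offset back.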