Re: crash on git diff-tree -Ganything <tree> for new files with textconv filter

Jeff King <peff@xxxxxxxx> · Tue, 30 Oct 2012 09:12:22 -0400

On Tue, Oct 30, 2012 at 09:46:01PM +0900, Junio C Hamano wrote:

> (1) sounds attractive for more than one reason. In addition to
> avoidance of this issue, it would bring bug-to-bug compatibility
> across platforms.

Yeah. I mentioned breaking the build for people who would now need to
turn on NO_REGEX. But the only reason to do that is to let people on
glibc systems use the system version of the tools. A much saner approach
would be to just always build with our compat regex, and turn NO_REGEX
into a no-op. We already do the same thing for kwset.

> (4), if we can run grep on streaming data (tweak interface we have for
> checking out a large blob to the working tree), would let us work on
> dataset larger than fit in core. Even though it would be much more
> work, it might turn out to be a better option in the longer run.

Agreed, that would be nice. It's potentially a lot of work, but we could
probably get by with a special streaming version of diff_populate_filespec.

The tricky thing is that we have to run the regex matcher progressively
as we stream data in (since your match might fall in the middle of a
read boundary). Which is certainly going to require switching off of
regular regexec. I don't think glibc regex will handle it either,
though. It looks like pcre can report a partial match at the end of the
string, and you can either continue with the next chunk (if using
pcre_dfa) or append and re-start the pattern match (for regular
pcre_exec).

Which means we'd probably have to make streaming matches an optional
feature, and still do (1) first to fix the correctness issue.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html