Johannes Schindelin <johannes.schindelin@xxxxxx> writes: > This patch series addresses a problem where `git diff` is called using > `-G` or `-S --pickaxe-regex` on new-born files that are configured > without user diff drivers, and that hence get mmap()ed into memory. Good spotting. This has been with us almost forever; I do not think the original pickaxe had it, but I am sure it is broken after the "--pickaxe-regex" enhancement. I am somehow surprised that this is a problem on Windows, though. Wouldn't we be at least running CRLF conversions, and causing diff or grep machinery to work on a NUL-terminated buffer? The convesion code would have to look at mmap'ed memory but I do not think it assumes NUL-termination. Perhaps people on Windows do not usually use straight-through and that is why this was discovered after many years, or something? In any case, that is a digression. > Windows (the bug does not trigger a segmentation fault for me on Linux, > strangely enough, but it is really a problem on Windows). I think it is an issue on all platforms that lets us use mmap(). When the size of a file is multiple of pagesize, the byte past the end of the file can very well fall on an unmapped address. > So at least I have a workaround in place. Ideally, though, we would > NUL-terminate the buffers only when needed, or somehow call regexec() on > ptr/size parameters instead of passing a supposedly NUL-terminated > string to it? I see two reasonable approaches. * We may want to revisit the performance numbers to see if using mmap() to read from the working tree files still buys us much. If not, we should stop borrowing from the working tree using mmap(); instead just slurp in and NUL-terminate it. * We could use <ptr,len> variant of regexp engine as you proposed, which I think is a preferrable solution. Do people know of a widely accepted implementation that we can throw into compat/ as fallback that is compatible with GPLv2?