On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote:

> On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:
>
> > The patch below fixes it, but it's terribly inefficient (it just detects
> > the situation and reallocates). It would be much better to disable the
> > reuse_worktree_file mmap when we populate the filespec, but it is too
> > late to pass an option; we may have already populated from an earlier
> > diffcore stage.
> >
> > I guess if we teach the whole diff code that "-G" (and --pickaxe-regex)
> > is brittle, we can disable the optimization from the beginning based on
> > the diff options. I'll take a look.
>
> Hmm. That is problematic for two reasons.
>
>   1. The whole diff call chain will have to be modified to pass the
>      options around, so they can make it down to the
>      diff_populate_filespec level. Alternatively, we could do some kind
>      of global hack, which is ugly but would work OK in practice.
>
>   2. Reusing a working tree file is only half of the reason a filespec
>      might be mmap'd. It might also be because we are literally diffing
>      the working tree. "-G" was meant to be used to limit log traversal,
>      but it also works to reduce the diff output for something like
>      "git diff HEAD^".
>
> I really wish there were an alternate regexec interface we could use
> that took a pointer/size pair. Bleh.

Thinking on it more, my patch, hacky though it seems, may not be the
worst solution. Here are the options that I see:

  1. Use a regex library that does not require NUL termination. If we
     are bound by the regular regexec interface, this is not feasible.
     But the GNU implementation works on arbitrary-length buffers (you
     just have to use a slightly different interface), and we already
     carry it in compat. It would mean platforms which provide a working
     but non-GNU regexec would have to start defining NO_REGEX.

  2. Figure out a way to get one extra zero byte via mmap. If the
     requested size does not fall on a page boundary, you get extra
     zeroed bytes. Unfortunately, requesting an extra byte does not do
     what we want; you get SIGBUS accessing it.

  3. Copy mmap'd data at point-of-use into a NUL-terminated buffer.
     That way we only incur the cost when we need it.

  4. Avoid mmap-ing in the first place when we are using -G or
     --pickaxe-regex (e.g., by doing a big read()). At first glance,
     this sounds more efficient than loading the data one way and then
     making another copy. But mmap+memcpy, aside from the momentary
     doubled memory requirement, is probably just as fast as or faster
     than calling read() repeatedly.

I am really tempted by (1). Given that (2) does not work, unless
somebody comes up with something clever there, that would make (3) the
next best choice.

-Peff
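To make the page-boundary arithmetic behind option (2) concrete, here is a minimal sketch. It assumes POSIX mmap semantics, where the partial page at the end of a mapped object is zero-filled but touching a page wholly beyond the end raises SIGBUS. The helper names mmap_zero_slack() and mmap_has_free_nul() are hypothetical, not anything in git. The sketch shows why the trick is unreliable: a file whose size is an exact multiple of the page size has no zeroed slack at all.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustration of the page-boundary arithmetic behind option (2).
 * Assumes POSIX mmap semantics: the partial page at the end of a
 * mapped object is zero-filled, so bytes between `size` and the next
 * page boundary read as zero. Touching a page that lies wholly beyond
 * the end of the object raises SIGBUS instead.
 *
 * Both helpers are hypothetical; page_size would normally come from
 * sysconf(_SC_PAGESIZE) and must be nonzero.
 */

/* Number of zero-filled bytes after the data in the last mapped page. */
size_t mmap_zero_slack(size_t size, size_t page_size)
{
	size_t rem = size % page_size;

	return rem ? page_size - rem : 0;
}

/* True if the byte just past the data is guaranteed to read as NUL. */
bool mmap_has_free_nul(size_t size, size_t page_size)
{
	return mmap_zero_slack(size, page_size) > 0;
}
```

For example, with 4096-byte pages a 4095-byte file gets exactly one free zero byte, while a 4096-byte file gets none, and that is precisely the case where the trick breaks down.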
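Option (3), copying mmap'd data at point-of-use into a NUL-terminated buffer, might look roughly like the following sketch. The buffer, which may not be NUL-terminated, is copied into a temporary allocation with one extra byte for the terminator, and only then handed to regexec(). The function names regexec_buf_copy() and pattern_matches_buf() are hypothetical illustrations, not git's actual code. A real version would compile the pattern once and reuse it across filespecs.

```c
#include <regex.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: match a compiled regex against `size` bytes at
 * `buf`, which need not be NUL-terminated (e.g., an mmap'd file). The
 * data is copied into a temporary buffer with room for a trailing NUL,
 * so the cost is paid only at the point of use.
 *
 * Returns 1 on match, 0 on no match, -1 on error.
 * Caveat: regexec() still stops at the first embedded NUL byte.
 */
int regexec_buf_copy(const regex_t *re, const char *buf, size_t size)
{
	char *copy;
	int ret;

	if (size == SIZE_MAX)
		return -1; /* size + 1 would overflow */
	copy = malloc(size + 1);
	if (!copy)
		return -1;
	if (size)
		memcpy(copy, buf, size);
	copy[size] = '\0'; /* the terminator regexec() relies on */

	ret = regexec(re, copy, 0, NULL, 0);
	free(copy);

	if (ret == 0)
		return 1;
	if (ret == REG_NOMATCH)
		return 0;
	return -1;
}

/*
 * Hypothetical convenience wrapper for illustration: compile `pattern`
 * as an extended regex, match it against the buffer, and clean up.
 */
int pattern_matches_buf(const char *pattern, const char *buf, size_t size)
{
	regex_t re;
	int ret;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	ret = regexec_buf_copy(&re, buf, size);
	regfree(&re);
	return ret;
}
```

For example, pattern_matches_buf("^abc$", "abcdef", 3) returns 1, because only the first three bytes are considered. The copy does not help with embedded NUL bytes, though: regexec() still stops at the first one.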