On Wed, Feb 01, 2012 at 05:14:37PM -0500, Jeff King wrote: > > The first time I introduced this behaviour[1], I made it conditional > > on a preference — those who wanted "good" grep could set the > > preference, while those who wanted "fast" grep could not. I think > > that's not a good idea, though if the performance issues are > > show-stoppers, I'd suggest the opposite preference (so speed-freaks > > can disable the checks). > > I've been able to get somewhat better performance by hoisting the > attribute lookup into the parent thread. That means it happens in order > (which lets the attr code's stack optimizations work), and there's no > lock contention. > > I'll post finished patches with numbers in a few minutes. OK, here they are. After playing with some options, I'm satisfied this is a sane way to do it. I don't think it's worth having a config option. There is a measurable slowdown, but it's simply not that big. [1/2]: grep: let grep_buffer callers specify a binary flag [2/2]: grep: respect diff attributes for binary-ness There are a few optimizations I didn't do that you could put on top: 1. When "-a" is given, we can avoid the attribute lookup altogether. 2. When "-I" is given, we can actually check attributes _before_ loading the file or blob into memory. This can help with very large binaries. 3. When "-I" is given but we have no attribute, we can stream the beginning of the file or blob to check for binary-ness, and then avoid loading the whole thing if it turns out to be binary. I think (1) and (2) should be easy. Doing (3) is a little messier, because binary detection happens inside grep_buffer, but we can hoist it out. However, for large files, it might be nice to have a streaming grep interface anyway, and (3) could be part of that. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html