Re: [PATCH] Don't search files with an unset "grep" attribute

Jeff King <peff@xxxxxxxx> · Wed, 1 Feb 2012 18:20:27 -0500

On Wed, Feb 01, 2012 at 05:14:37PM -0500, Jeff King wrote:

> > The first time I introduced this behaviour[1], I made it conditional
> > on a preference — those who wanted "good" grep could set the
> > preference, while those who wanted "fast" grep could not. I think
> > that's not a good idea, though if the performance issues are
> > show-stoppers, I'd suggest the opposite preference (so speed-freaks
> > can disable the checks).
> 
> I've been able to get somewhat better performance by hoisting the
> attribute lookup into the parent thread. That means it happens in order
> (which lets the attr code's stack optimizations work), and there's no
> lock contention.
> 
> I'll post finished patches with numbers in a few minutes.

OK, here they are. After playing with some options, I'm satisfied this
is a sane way to do it. I don't think it's worth having a config option.
There is a measurable slowdown, but it's simply not that big.

  [1/2]: grep: let grep_buffer callers specify a binary flag
  [2/2]: grep: respect diff attributes for binary-ness

There are a few optimizations I didn't do that you could put on top:

  1. When "-a" is given, we can avoid the attribute lookup altogether.

  2. When "-I" is given, we can actually check attributes _before_
     loading the file or blob into memory. This can help with very large
     binaries.

  3. When "-I" is given but we have no attribute, we can stream the
     beginning of the file or blob to check for binary-ness, and then
     avoid loading the whole thing if it turns out to be binary.

I think (1) and (2) should be easy. Doing (3) is a little messier,
because binary detection happens inside grep_buffer, but we can hoist it
out. However, for large files, it might be nice to have a streaming grep
interface anyway, and (3) could be part of that.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html