Re: [PATCH 0/9] respect binary attribute in grep

Pete Wyckoff <pw@xxxxxxxx> · Sat, 4 Feb 2012 14:22:52 -0500

I took a look at this series.  It's nice.  My worry was that the
extra open() of non-existent .gitattributes files in all the
directories would cause performance problems across networked
filesystems like NFS.

My usual (non-public) repository has on the order of:

    100k files
     10k directories

and no files marked as binary.  The grep is disk-bound, and the
string is not expected to match in any file (text or binary):
"time ~/src/git/bin-wrappers/git grep unfindable-string".

With your change, there are 10k new open() calls looking for
.gitattributes in each directory, all of which return ENOENT.
This turns out to have an insignificant impact on performance due
to the much bigger time sink of stat()-ing all the files.
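
For anyone who wants to gauge the ENOENT cost on their own tree,
here is a rough sketch in Python.  It is not what git does
internally; it only mimics the pattern of one open() attempt for
.gitattributes per directory.  The function name
probe_gitattributes is made up for this example.

```python
import os
import time


def probe_gitattributes(root):
    """Try to open .gitattributes in every directory under root.

    Returns (directories_probed, missing_count, elapsed_seconds).
    Hypothetical helper for illustration, not part of git.
    """
    probed = 0
    missing = 0
    start = time.perf_counter()
    for dirpath, dirnames, _filenames in os.walk(root):
        # Do not descend into the repository metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        probed += 1
        try:
            with open(os.path.join(dirpath, ".gitattributes"), "rb"):
                pass
        except FileNotFoundError:
            # This is the ENOENT case discussed above.
            missing += 1
    return probed, missing, time.perf_counter() - start
```

Calling probe_gitattributes(".") from the top of a work tree and
comparing the elapsed time against the full grep run gives a
first-order idea of how much the extra lookups could matter.  On
NFS the result will depend heavily on client-side caching.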

I think this happens to be true because the gitattributes lookups
run in parallel to all the file stat work, as the main thread
dispatches file work while doing its own gitattributes lookups.
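
A toy model of that overlap, again in Python and again only an
assumption about the general shape, not git's actual threading
code: the calling thread hands per-file stat work to a pool, then
does the .gitattributes probe for the directory itself while the
workers are busy.  The names has_gitattributes and
walk_with_overlap are invented for this sketch.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def has_gitattributes(dirpath):
    """Return True if dirpath contains a readable .gitattributes file."""
    try:
        with open(os.path.join(dirpath, ".gitattributes"), "rb"):
            return True
    except FileNotFoundError:
        return False


def walk_with_overlap(root, workers=8):
    """Stat files on worker threads while this thread probes attributes.

    Returns (files_statted, directories_with_attributes).
    Hypothetical helper for illustration, not part of git.
    """
    with_attrs = 0
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for dirpath, _dirnames, filenames in os.walk(root):
            # Dispatch the per-file work to the pool first ...
            for name in filenames:
                path = os.path.join(dirpath, name)
                futures.append(pool.submit(os.lstat, path))
            # ... then do the attribute lookup on this thread while
            # the workers are still busy with the stat calls.
            if has_gitattributes(dirpath):
                with_attrs += 1
        # Collect results; result() re-raises any error from a worker.
        statted = sum(1 for f in futures if f.result() is not None)
    return statted, with_attrs
```

With many files per directory the probe is hidden behind the stat
work.  With few files per directory there is little to hide
behind, which is the case I worry about below.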

It is plausible that deep directory structures with few
grep-able files will suffer with this change.  For example, many
big binary blobs in deep directory hierarchies, but also some
useful files here and there.

One could argue that with the use of .gitattributes to specify
which blobs should not be searched, this series makes that case
faster by not having to read the binary blobs at all.  And I'd be
okay with that.

Just FYI that there may be a performance impact on certain
repositories.

		-- Pete