On Sat, Feb 04, 2012 at 02:22:52PM -0500, Pete Wyckoff wrote: > I took a look at this series. It's nice. My worry was that the > extra open() of non-existent .gitattributes files in all the > directories would cause performance problems across networked > filesystems like NFS. Yeah, I was able to measure a small slow-down on a quick grep even with a warm cache. So it does take some extra effort, but I think the correctness is worth it (and note that the slow down is in the tens of milliseconds if you have a reasonable stat()). If people have big trees on NFS (or some other slow-stat system) where these lookups are actually a problem, I'd rather see a global option to disable .gitattributes lookups for both diff and grep (i.e., a "trust me, I'm not using gitattributes, and don't bother with stat" flag). In practice, though, I think such a thing is not necessary because the stat() is local to the file being examined (e.g., for "foo/bar/baz", we look only at "foo/bar/.gitattributes", "foo/.gitattributes", and ".gitattributes", without having to touch other parts of the tree). Anyway, thanks for doing some performance testing. More data is always good. > It could be plausible that deep directory structures with few > grep-able files will suffer with this change. For example, many > big binary blobs in deep directory hierarchies, but also some > useful files here and there. > > One could argue that with the use of .gitattributes to specify > which blobs should not be searched, this series makes this faster > by not having to to read the binary blobs at all. And I'd be > okay with that. Yes, exactly. I think this will end up being a big win for such cases, because the cost of loading even one large binary file from disk will dwarf all of the stats. But it does depend on people marking their binaries and using "-I". -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html