On Fri, Jul 22, 2016 at 10:44:08AM -0700, Junio C Hamano wrote: > > diff --git a/diff.c b/diff.c > > index 7d03419..b43d3dd 100644 > > --- a/diff.c > > +++ b/diff.c > > @@ -2683,6 +2683,13 @@ static int reuse_worktree_file(const char *name, const unsigned char *sha1, int > > if (!FAST_WORKING_DIRECTORY && !want_file && has_sha1_pack(sha1)) > > return 0; > > > > + /* > > + * Similarly, if we'd have to convert the file contents anyway, that > > + * makes the optimization not worthwhile. > > + */ > > + if (!want_file && would_convert_to_git(name)) > > + return 0; > > The would_convert_to_git() function is not a free operation. It > needs to prime the attribute stack, so it needs to open/read/parse a > few files ($GIT_DIR/info/attributes and .gitattributes at least, and > more if your directory hierarchy is deep) on the filesystem. The > cost is amortized across paths, but we do not even enable the > optimization if we have to pay the cost of reading the index > ourselves. Yeah, I almost commented on that, and its position in the function, but forgot to. The only code path which will trigger this is diff_populate_filespec. After reuse_worktree_file() says "yes, we can reuse", we drop into a conditional that will end in us calling convert_to_git() anyway, which will do the same lookup. We don't need to cache, because the expensive parts of the attribute-lookup are already cached for us by the attribute code. So my initial thought was to put it at the end of reuse_worktree_file(), where it would have the least impact. If we find we cannot reuse the file, then we would skip both this new attr lookup and the one in diff_populate_filespec(). But in practice, I think we'll already have cached those attrs before we even hit this function, because we'll hit the userdiff_find_by_path() code earlier in the diff process (e.g., to see if it's binary, if we need to textconv, etc). Those look for different attributes, but I think the expensive bits (finding, opening, reading attribute files) are cached across all lookups. So I think we actually _can_ think of would_convert_to_git() as basically free. Or as free as other efficient-lookup functions we call like cache_name_pos(). And so I moved it further up in the function, where it lets us avoid doing more out-of-process work (like calling lstat() so we can ce_match_stat() on the result). Possibly it should go after the cache_name_pos() call, though. That's likely to be less expensive than the actual walk of the attr tree. > I suspect that we may be better off disabling this optimization if > we need to always call the helper. The thought "does this tree reuse even speed things up enough to justify its complexity" definitely crossed my mind. It's basically swapping open/mmap for zlib inflating the content. But I do think it helps in the "want_file" case (i.e., when we are writing out a tempfile for an external command via prepare_temp_file()). There it helps us omit writing a tempfile to disk entirely, including any possible conversion. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html