Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> On Sat, Feb 18, 2012 at 5:25 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Jeff King <peff@xxxxxxxx> writes:
>>
>>> That being said, we do have an index extension to store the tree sha1 of
>>> whole directories (i.e., we populate it when we write a whole tree or
>>> subtree into the index from the object db, and it becomes invalidated
>>> when a file becomes modified). This optimization is used by things like
>>> "git commit" to avoid having to recreate the same sub-trees over and
>>> over when creating tree objects from the index. But we could also use it
>>> here to avoid having to even read the sub-tree objects from the object
>>> db.
>>
>> Like b65982b (Optimize "diff-index --cached" using cache-tree, 2009-05-20)
>> perhaps?
>
> This optimizes the case when a cached tree matches entirely. I wonder
> whether it's faster if we switch to tree-tree diff whenever we find
> valid cached trees. If cache-tree is fully valid, "git diff --cached
> foo" would be equivalent to "git diff HEAD foo".

Not necessarily; the cache-tree is valid if it faithfully represents
what is in the index. It does not have any direct relation to HEAD.

> I tried "git diff --raw HEAD HEAD~100" (where HEAD was
> v3.1-rc1-272-g73e0881 on linux-2.6) and "git diff --cached --raw
> HEAD~100" with no cache-tree. The former is a little bit faster than
> the latter (177ms vs 275ms). On gentoo-x86, 70k worktree files, it's
> 4.33s vs 4.45s. But in tree-tree diff we pay high in cold cache case
> for loading trees from "HEAD". So no, probably not worth more code
> changes. Your optimization is good enough.

I'm still wondering about using mincore() to good effect. I tried it
for git-grep, but it ended up slowing things down.
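
As an illustration only (this is not the code from the git-grep
experiment), a page-cache residency check with mincore() might look
roughly like the sketch below. It assumes Linux/glibc, where the
signature is mincore(void *, size_t, unsigned char *); the helper name
resident_fraction() is hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes mincore() in glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the fraction (0.0 to 1.0) of a regular
 * file's pages that are currently resident in memory, or -1.0 on error.
 */
double resident_fraction(const char *path)
{
	struct stat st;
	double result = -1.0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1.0;
	if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
		close(fd);
		return -1.0;
	}
	if (st.st_size == 0) {
		close(fd);
		return 1.0; /* empty file: nothing would need to be faulted in */
	}

	size_t len = (size_t)st.st_size;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t npages = (len + page - 1) / page;

	/* Mapping the file does not read it; pages are only faulted on access. */
	void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	if (map == MAP_FAILED)
		return -1.0;

	/* mincore() fills one byte per page; the low bit means "resident". */
	unsigned char *vec = malloc(npages);
	if (vec && mincore(map, len, vec) == 0) {
		size_t resident = 0;
		for (size_t i = 0; i < npages; i++)
			resident += vec[i] & 1;
		result = (double)resident / (double)npages;
	}

	free(vec);
	munmap(map, len);
	return result;
}
```

A caller could compare the result against some threshold to guess
whether reading a file is likely to hit the disk. Note that the check
itself costs an open(), an mmap() and a mincore() call per file, so it
adds overhead of its own.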

However, it irks me that in some cases a clueful use of one form over
the other can really make a huge performance difference, e.g.,

  git grep stuff
  git grep HEAD stuff

If I am in a big repository that I haven't used in a while, the HEAD
form will be much faster as the worktree search would fault many files.
OTOH if I am in a heavily-used repository (and perhaps just said 'make'
minutes ago) the worktree version will avoid the pack decompression
effort.

Sadly this also has the problem that we must first determine whether
substituting HEAD for the worktree (or vice versa) is valid at all.
For grep perhaps there could be a "just do a fast search somewhere"
option since usually you are looking for something that hasn't changed
in ages.

Ok, that was almost completely beside the point of this thread.

--
Thomas Rast
trast@{inf,student}.ethz.ch