Re: git status: small difference between stating whole repository and small subdirectory

Thomas Rast <trast@xxxxxxxxxxx> · Wed, 22 Feb 2012 14:17:13 +0100

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> On Sat, Feb 18, 2012 at 5:25 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Jeff King <peff@xxxxxxxx> writes:
>>
>>> That being said, we do have an index extension to store the tree sha1 of
>>> whole directories (i.e., we populate it when we write a whole tree or
>>> subtree into the index from the object db, and it becomes invalidated
>>> when a file becomes modified). This optimization is used by things like
>>> "git commit" to avoid having to recreate the same sub-trees over and
>>> over when creating tree objects from the index. But we could also use it
>>> here to avoid having to even read the sub-tree objects from the object
>>> db.
>>
>> Like b65982b (Optimize "diff-index --cached" using cache-tree, 2009-05-20)
>> perhaps?
>
> This optimizes the case when a cached tree matches entirely. I wonder
> whether it's faster if we switch to tree-tree diff whenever we find
> valid cached trees. If cache-tree is fully valid, "git diff --cached
> foo" would be equivalent to "git diff HEAD foo".

Not necessarily; the cache-tree is valid if it faithfully represents
what is in the index.  It does not have any direct relation to HEAD.
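
To make that distinction concrete, here is a toy model in C.  It is only
a sketch: struct toy_cache_tree and both functions are hypothetical
names for illustration, loosely modelled on the convention that a
negative entry count marks a cache-tree node as invalid.  It is not
git's actual code.  The point is that "fully valid" is a property of
the cache-tree alone, while "same as HEAD" needs a second comparison
against HEAD's tree id.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified cache-tree node (not git's real struct). */
struct toy_cache_tree {
    int entry_count;                   /* negative means "invalid" */
    char tree_id[41];                  /* hex tree id, meaningful only when valid */
    int subtree_nr;                    /* number of child directories */
    struct toy_cache_tree **subtrees;  /* child nodes, may be NULL if none */
};

/* True if this node and every node below it is valid, i.e. the cached
 * tree ids faithfully describe what is in the index. */
bool toy_fully_valid(const struct toy_cache_tree *t)
{
    if (!t || t->entry_count < 0)
        return false;
    for (int i = 0; i < t->subtree_nr; i++)
        if (!toy_fully_valid(t->subtrees[i]))
            return false;
    return true;
}

/* Validity says nothing about HEAD.  Only when the (valid) root id also
 * equals HEAD's tree id does the index match HEAD. */
bool toy_index_matches_head(const struct toy_cache_tree *root,
                            const char *head_tree_id)
{
    return toy_fully_valid(root) &&
           strcmp(root->tree_id, head_tree_id) == 0;
}
```

A fully valid cache-tree whose root id differs from HEAD's tree (say,
right after "git add" of a modified file followed by a tree write)
passes toy_fully_valid() but fails toy_index_matches_head().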

> I tried "git diff --raw HEAD HEAD~100" (where HEAD was
> v3.1-rc1-272-g73e0881 on linux-2.6) and "git diff --cached --raw
> HEAD~100" with no cache-tree. The former is a little bit faster than
> the latter (177ms vs 275ms). On gentoo-x86, 70k worktree files, it's
> 4.33s vs 4.45s. But in a tree-tree diff we pay a high price in the
> cold-cache case for loading trees from "HEAD". So no, it is probably
> not worth more code changes. Your optimization is good enough.

I'm still wondering about using mincore() to good effect.  I tried it
for git-grep, but it ended up slowing things down.  However, it irks me
that in some cases a clueful use of one form over the other can really
make a huge performance difference, e.g.,

  git grep stuff
  git grep HEAD stuff

If I am in a big repository that I haven't used in a while, the HEAD
form will be much faster as the worktree search would fault many files.
OTOH if I am in a heavily-used repository (and perhaps just said 'make'
minutes ago) the worktree version will avoid the pack decompression
effort.
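
For illustration, a residency probe along these lines could look like
the sketch below.  It assumes Linux with glibc, where mincore() takes an
unsigned char vector; resident_fraction() is a hypothetical helper name
and not part of git.  It maps a file without touching its pages and
asks the kernel which of them are already in the page cache.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes mincore() under strict -std= modes */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return the fraction (0.0 to 1.0) of a regular file's pages that are
 * resident in the page cache, or -1.0 on any error.  An empty file is
 * treated as fully resident.
 */
double resident_fraction(const char *path)
{
    double result = -1.0;
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1.0;
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1.0;
    }
    if (st.st_size == 0) {
        close(fd);
        return 1.0;
    }

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        close(fd);
        return -1.0;
    }
    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)page - 1) / (size_t)page;

    /* Mapping alone does not fault the pages in. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1.0;

    unsigned char *vec = malloc(npages);
    if (vec && mincore(map, len, vec) == 0) {
        size_t resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set: page is in core */
        result = (double)resident / (double)npages;
    }
    free(vec);
    munmap(map, len);
    return result;
}
```

A caller could sample a handful of worktree files and prefer the HEAD
form when most of them come back near 0.0.  Note that the probe itself
costs an open/mmap/mincore round trip per file, which may be part of
why this kind of check does not automatically pay off.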

Sadly this also has the problem that we must first determine whether
substituting HEAD for the worktree (or vice versa) is valid at all.  For
grep perhaps there could be a "just do a fast search somewhere" option
since usually you are looking for something that hasn't changed in ages.

Ok, that was almost completely beside the point of this thread.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch